Teachings Generator

Generate domain-specific teachings from database schema

The TeachingsGenerator analyzes your database schema and automatically generates domain-specific teachings: structured knowledge items that improve SQL generation accuracy. These teachings include vocabulary, patterns, guardrails, and examples grounded in your actual schema.

Basic Usage

import { TeachingsGenerator } from '@deepagents/text2sql/synthesis';

const generator = new TeachingsGenerator(adapter, {
  context: 'E-commerce database tracking orders and inventory'
});

const teachings = await generator.generate();
// Returns 3-10 high-value teachables (terms, hints, guardrails, etc.)

How It Works

TeachingsGenerator examines your database schema to create relevant, high-impact teachables:

  1. Introspects schema (tables, columns, relationships, constraints)
  2. Analyzes table/column names to infer domain vocabulary
  3. Identifies potential guardrails (large tables, sensitive data)
  4. Detects common patterns and relationships
  5. Generates 3-10 teachables prioritizing:
    • Guardrails for safety and performance
    • Clarifications for ambiguous terms
    • Hints for best practices
    • Domain terminology

The generated teachings are then applied to your Text2SQL instance to guide query generation: automatically when you call text2sql.teach(), or manually via text2sql.instruct() (see Integration with Text2SQL).

Configuration Options

interface TeachingsGeneratorOptions {
  /** Additional domain context to guide generation */
  context?: string;
  /** Model to use for generation */
  model?: AgentModel;
}

Domain Context

Provide business context to improve teaching quality:

// Without context - relies purely on schema
const schemaOnlyTeachings = await new TeachingsGenerator(adapter).generate();

// With context - generates more relevant teachings
const contextualTeachings = await new TeachingsGenerator(adapter, {
  context: `Our database tracks e-commerce orders.
  - Active customers are those who ordered in the last 90 days
  - We exclude test accounts with email ending in @test.com
  - Revenue should exclude cancelled and refunded orders`
}).generate();

Custom Model

Override the default model:

import { groq } from '@ai-sdk/groq';

const teachings = await new TeachingsGenerator(adapter, {
  context: 'Healthcare patient records',
  model: groq('llama-3.3-70b-versatile')
}).generate();

Retry Handling

The generate() method includes automatic retry logic for transient errors:

// Retries up to 3 times (default)
const teachings = await generator.generate();

// Custom retry count: retries up to 5 times
const teachingsWithMoreRetries = await generator.generate(5);

Retryable errors include:

  • Parse errors (malformed output)
  • Schema validation failures
  • "No object generated" errors
  • AI provider errors (rate limits, timeouts)

Non-retryable errors (thrown immediately):

  • Authentication failures
  • Invalid schema format
  • Network connection errors
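
The classification above can be sketched as a small retry loop. This is only an illustration of the behavior described, not the library's implementation: `GenerationError`, its `kind` tags, and `withRetry` are hypothetical names, and the sketch assumes the retry count means retries after the first attempt.

```typescript
// Hypothetical error tags mirroring the two lists above (assumption:
// the real library's error classes are not shown in this document).
type ErrorKind =
  | 'parse' | 'validation' | 'no-object' | 'provider' // retryable
  | 'auth' | 'invalid-schema' | 'network';            // thrown immediately

class GenerationError extends Error {
  readonly kind: ErrorKind;
  constructor(kind: ErrorKind, message: string) {
    super(message);
    this.kind = kind;
  }
}

const RETRYABLE = new Set<ErrorKind>(['parse', 'validation', 'no-object', 'provider']);

// True only for errors tagged with one of the retryable kinds.
function isRetryable(error: unknown): boolean {
  return (
    error instanceof Error &&
    'kind' in error &&
    RETRYABLE.has((error as GenerationError).kind)
  );
}

// Runs the task once, then retries up to maxRetries more times on
// retryable errors. Non-retryable errors are rethrown immediately.
async function withRetry<T>(task: () => Promise<T>, maxRetries = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await task();
    } catch (error) {
      if (!isRetryable(error)) throw error;
      lastError = error;
    }
  }
  throw lastError;
}
```

A wrapper like this would surface the last retryable error once the budget is exhausted, which keeps the failure reason visible to the caller.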

Using with SchemaSynthesizer

Generate training pairs with domain knowledge:

import {
  TeachingsGenerator,
  SchemaSynthesizer,
  PersonaGenerator,
  toPairs
} from '@deepagents/text2sql/synthesis';

// Generate teachings
const teachingsGen = new TeachingsGenerator(adapter, {
  context: 'SaaS subscription and usage tracking database'
});
const teachings = await teachingsGen.generate();

// Generate personas
const personaGen = new PersonaGenerator(adapter, { count: 5 });
const personas = await personaGen.generate();

// Generate pairs with teachings and personas
const pairs = await toPairs(new SchemaSynthesizer(adapter, {
  count: 10,
  complexity: 'medium',
  personas: personas,
  teachings: teachings  // Guides SQL generation
}));

Example Output

For an e-commerce database, generated teachings might include:

[
  {
    type: 'guardrail',
    rule: 'Avoid unbounded scans on orders table - always include date range filter',
    reason: 'Performance - orders table contains millions of rows',
    action: 'Ask user for timeframe if not specified'
  },
  {
    type: 'hint',
    text: 'Exclude test accounts with email ending in @test.com from all metrics'
  },
  {
    type: 'term',
    name: 'active customer',
    definition: 'customer who placed an order in the last 90 days'
  },
  {
    type: 'clarification',
    when: 'user asks for "revenue"',
    ask: 'Do you want gross revenue or net revenue (excluding refunds/cancellations)?',
    reason: 'Revenue can mean different things for different analyses'
  },
  {
    type: 'quirk',
    issue: 'order_total includes shipping and tax',
    workaround: 'Use order_subtotal for product revenue only'
  },
  {
    type: 'example',
    question: 'show top selling products',
    answer: `SELECT p.product_name, COUNT(oi.order_id) as order_count
FROM products p
JOIN order_items oi ON p.product_id = oi.product_id
JOIN orders o ON oi.order_id = o.order_id
WHERE o.status NOT IN ('cancelled', 'refunded')
  AND o.created_at >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY p.product_id, p.product_name
ORDER BY order_count DESC
LIMIT 10`,
    note: 'Excludes cancelled/refunded orders and focuses on recent history'
  }
]
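
When reviewing output like this, it can help to model the shapes explicitly. The union type below is inferred from the example above and is an assumption; the library's exported types may differ, and `summarize` is a hypothetical helper for spot-checking, not part of the package.

```typescript
// Shapes inferred from the example output (assumption, not the library's types).
type Teachable =
  | { type: 'guardrail'; rule: string; reason?: string; action?: string }
  | { type: 'hint'; text: string }
  | { type: 'term'; name: string; definition: string }
  | { type: 'clarification'; when: string; ask: string; reason?: string }
  | { type: 'quirk'; issue: string; workaround: string }
  | { type: 'example'; question: string; answer: string; note?: string };

// Render one teachable as a single line, e.g. for a review log.
function summarize(t: Teachable): string {
  switch (t.type) {
    case 'guardrail':
      return `[guardrail] ${t.rule}`;
    case 'hint':
      return `[hint] ${t.text}`;
    case 'term':
      return `[term] ${t.name}: ${t.definition}`;
    case 'clarification':
      return `[clarification] ${t.when} -> ${t.ask}`;
    case 'quirk':
      return `[quirk] ${t.issue} (workaround: ${t.workaround})`;
    case 'example':
      return `[example] ${t.question}`;
  }
}

const sample: Teachable[] = [
  {
    type: 'term',
    name: 'active customer',
    definition: 'customer who placed an order in the last 90 days'
  },
  {
    type: 'quirk',
    issue: 'order_total includes shipping and tax',
    workaround: 'Use order_subtotal for product revenue only'
  }
];

const summaryLines = sample.map(summarize);
console.log(summaryLines.join('\n'));
```

Because the union is discriminated on `type`, the compiler flags any teachable type the switch forgets to handle.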

Integration with Text2SQL

Use text2sql.teach() for automatic teaching (recommended):

// Automatic: generates and applies teachings
await text2sql.teach(
  'Our database tracks e-commerce orders. Active customers ordered in last 90 days.'
);

// Manual: generate teachings separately
const generator = new TeachingsGenerator(adapter, {
  context: 'E-commerce database'
});
const teachings = await generator.generate();

// Apply manually
text2sql.instruct(...teachings);

The automatic teach() method handles:

  • Schema introspection
  • Teaching generation
  • Application to Text2SQL instance
  • Formatting for system prompt

Full Pipeline Example

Generate a comprehensive training dataset with teachings:

import {
  TeachingsGenerator,
  PersonaGenerator,
  SchemaSynthesizer,
  DepthEvolver,
  BreadthEvolver,
  toPairs
} from '@deepagents/text2sql/synthesis';

// 1. Generate teachings
const teachingsGen = new TeachingsGenerator(adapter, {
  context: `Financial services database.
  - NPL = non-performing loan (90+ days past due)
  - Always exclude test accounts (account_type = 'test')
  - Basis points: 1% = 100 bps`
});
const teachings = await teachingsGen.generate();

// 2. Generate personas
const personaGen = new PersonaGenerator(adapter, { count: 8 });
const personas = await personaGen.generate();

// 3. Generate base pairs
const basePairs = await toPairs(new SchemaSynthesizer(adapter, {
  count: 5,
  complexity: ['low', 'medium', 'hard'],
  personas: personas,
  teachings: teachings
}));
console.log(`Base pairs: ${basePairs.length}`);

// 4. Evolve in depth (make questions harder)
const depthEvolved = await toPairs(
  new DepthEvolver(basePairs, adapter, {
    count: 2  // 2 harder versions per question
  })
);
console.log(`Depth evolved: ${depthEvolved.length}`);

// 5. Evolve in breadth (paraphrase)
const breadthEvolved = await toPairs(
  new BreadthEvolver([...basePairs, ...depthEvolved], {
    count: 3  // 3 paraphrases per question
  })
);
console.log(`Breadth evolved: ${breadthEvolved.length}`);

const allPairs = [...basePairs, ...depthEvolved, ...breadthEvolved];
console.log(`Total dataset: ${allPairs.length} pairs`);
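
To budget for a run, the dataset size can be estimated from the per-question multipliers in the comments above. The helper below is a hypothetical sketch, not a library function. It assumes every evolution succeeds and nothing is deduplicated, so it gives an upper bound. The number of base pairs depends on how SchemaSynthesizer combines count, complexity, and personas, which is not specified here, so it is taken as an input.

```typescript
// Upper-bound estimate of pipeline output (hypothetical helper).
// depthPerPair: harder versions generated per base pair (2 in the pipeline above)
// breadthPerPair: paraphrases per base-or-depth pair (3 in the pipeline above)
function expectedDatasetSize(
  baseCount: number,
  depthPerPair: number,
  breadthPerPair: number
): { base: number; depth: number; breadth: number; total: number } {
  const depth = baseCount * depthPerPair;
  // Breadth evolution runs over base pairs AND depth-evolved pairs.
  const breadth = (baseCount + depth) * breadthPerPair;
  return { base: baseCount, depth, breadth, total: baseCount + depth + breadth };
}

// With 10 base pairs: depth = 20, breadth = (10 + 20) * 3 = 90, total = 120.
const estimate = expectedDatasetSize(10, 2, 3);
console.log(estimate.total);
```

Since breadth evolution multiplies the combined base and depth pairs, the total grows as base × (1 + depth) × (1 + breadth), which is worth keeping in mind for model cost.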

Teaching Types Generated

The generator produces these teachable types (in priority order):

  1. Guardrails - Safety and performance boundaries

    • Large table scan warnings
    • Sensitive data protection
    • Query complexity limits
  2. Hints - Always-apply rules

    • Test account exclusions
    • Default date ranges
    • Preferred join patterns
  3. Clarifications - When to ask users for more info

    • Ambiguous metrics (revenue, active, conversion)
    • Time period specifications
    • Aggregation level preferences
  4. Terms - Domain vocabulary

    • Business acronyms and jargon
    • Domain-specific concepts
    • Canonical values and their meanings
  5. Examples - Common query patterns

    • Frequently asked questions
    • Complex join patterns
    • Aggregation examples
  6. Quirks - Data edge cases and workarounds

    • Column format issues
    • Calculation gotchas
    • Schema anomalies
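
If you post-process teachings yourself, for example after merging several generator runs, the priority order listed above can be applied with a simple sort. This is a sketch under assumptions: `prioritize` is a hypothetical helper, and the cap of 10 mirrors the 3-10 range mentioned earlier rather than any documented option.

```typescript
// Priority order taken from the list above, highest priority first.
const PRIORITY = ['guardrail', 'hint', 'clarification', 'term', 'example', 'quirk'] as const;
type TeachableType = (typeof PRIORITY)[number];

// Sort a copy by priority and keep at most `limit` items.
// Array.prototype.sort is stable in ES2019+, so items of the same
// type keep their original relative order.
function prioritize<T extends { type: TeachableType }>(
  items: readonly T[],
  limit = 10
): T[] {
  return [...items]
    .sort((a, b) => PRIORITY.indexOf(a.type) - PRIORITY.indexOf(b.type))
    .slice(0, limit);
}

const unordered: { type: TeachableType; id: number }[] = [
  { type: 'quirk', id: 1 },
  { type: 'term', id: 2 },
  { type: 'guardrail', id: 3 },
  { type: 'hint', id: 4 },
  { type: 'term', id: 5 }
];

const ordered = prioritize(unordered, 4);
console.log(ordered.map((t) => t.type).join(', '));
```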

Best Practices

  1. Provide rich context - The more domain context you provide, the better the teachings
  2. Review generated teachings - Spot-check output to ensure relevance and accuracy
  3. Combine with manual teachings - Auto-generated + manually curated = comprehensive
  4. Iterate - Regenerate teachings as your schema evolves
  5. Use the retry parameter - Set a higher retry count for production pipelines

Comparison: Manual vs Generated

Aspect     | Manual Teaching                      | Generated Teaching
Precision  | High - you define exact rules        | Medium - inferred from schema
Coverage   | Low - only what you write            | Medium - covers common patterns
Effort     | High - requires domain expertise     | Low - automated
Best For   | Business-critical rules, edge cases  | Bootstrapping, domain vocabulary

Recommendation: Use both. Generate teachings for baseline knowledge, then supplement with manual teachings for critical business rules and edge cases.
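
One way to combine the two sources is to let manual teachings override generated ones when they describe the same thing. The sketch below is an assumption-laden illustration: `mergeTeachings` and `keyOf` are hypothetical helpers, and the choice of identifying field per type (name, rule, text, and so on) is a guess based on the example output shapes.

```typescript
// Loose shape covering the fields seen in the example output (assumption).
interface LooseTeachable {
  type: string;
  name?: string;
  rule?: string;
  text?: string;
  when?: string;
  issue?: string;
  question?: string;
  [other: string]: string | undefined;
}

// Build a comparison key from the type and its identifying field.
function keyOf(t: LooseTeachable): string {
  const id = t.name ?? t.rule ?? t.text ?? t.when ?? t.issue ?? t.question ?? '';
  return `${t.type}:${id.trim().toLowerCase()}`;
}

// Generated teachings go in first; manual ones overwrite matching keys.
function mergeTeachings(
  manual: LooseTeachable[],
  generated: LooseTeachable[]
): LooseTeachable[] {
  const byKey = new Map<string, LooseTeachable>();
  for (const t of [...generated, ...manual]) {
    byKey.set(keyOf(t), t);
  }
  return Array.from(byKey.values());
}

const merged = mergeTeachings(
  [{ type: 'term', name: 'Active Customer', definition: 'ordered in the last 30 days' }],
  [
    { type: 'term', name: 'active customer', definition: 'ordered in the last 90 days' },
    { type: 'hint', text: 'Exclude test accounts from all metrics' }
  ]
);
console.log(merged.length);
```

The merged array could then be passed to text2sql.instruct(...merged) in the same way as the manual example earlier, provided its elements match the types that method expects.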