Teachings Generator
Generate domain-specific teachings from database schema
The TeachingsGenerator analyzes your database schema to automatically generate domain-specific teachings - structured knowledge items that improve SQL generation accuracy. These teachings include vocabulary, patterns, guardrails, and examples grounded in your actual schema.
Basic Usage
import { TeachingsGenerator } from '@deepagents/text2sql/synthesis';
const generator = new TeachingsGenerator(adapter, {
  context: 'E-commerce database tracking orders and inventory'
});
const teachings = await generator.generate();
// Returns 3-10 high-value teachables (terms, hints, guardrails, etc.)
How It Works
TeachingsGenerator examines your database schema to create relevant, high-impact teachables:
- Introspects schema (tables, columns, relationships, constraints)
- Analyzes table/column names to infer domain vocabulary
- Identifies potential guardrails (large tables, sensitive data)
- Detects common patterns and relationships
- Generates 3-10 teachables prioritizing:
  - Guardrails for safety and performance
  - Clarifications for ambiguous terms
  - Hints for best practices
  - Domain terminology
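generate() returns the teachings as an array. text2sql.teach() adds them to your Text2SQL instance automatically; if you call generate() yourself, apply them with text2sql.instruct() (see Integration with Text2SQL).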
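To make the guardrail-detection step more concrete, here is a minimal heuristic sketch. It is not the generator's implementation, which relies on a model rather than fixed rules. The `TableInfo` shape, the `rowCount` field, the row threshold, and the sensitive-column pattern are all assumptions made for illustration.

```typescript
// Hypothetical schema metadata; the real adapter's introspection format may differ.
interface TableInfo {
  name: string;
  columns: string[];
  rowCount?: number; // assumed to be available from introspection
}

interface GuardrailCandidate {
  type: 'guardrail';
  rule: string;
  reason: string;
}

// Assumed heuristics: a row-count threshold and a name pattern for sensitive columns.
const LARGE_TABLE_ROWS = 1_000_000;
const SENSITIVE_COLUMN = /(ssn|password|email|phone|card)/i;

function suggestGuardrails(tables: TableInfo[]): GuardrailCandidate[] {
  const found: GuardrailCandidate[] = [];
  for (const table of tables) {
    // Large tables get a "no unbounded scan" candidate.
    if ((table.rowCount ?? 0) >= LARGE_TABLE_ROWS) {
      found.push({
        type: 'guardrail',
        rule: `Avoid unbounded scans on ${table.name} - always include a filter`,
        reason: `Performance - ${table.name} has ${table.rowCount} rows`,
      });
    }
    // Columns whose names look sensitive get a protection candidate.
    const sensitive = table.columns.filter((c) => SENSITIVE_COLUMN.test(c));
    if (sensitive.length > 0) {
      found.push({
        type: 'guardrail',
        rule: `Do not select ${sensitive.join(', ')} from ${table.name} unless explicitly requested`,
        reason: 'Sensitive data',
      });
    }
  }
  return found;
}

const candidates = suggestGuardrails([
  { name: 'orders', columns: ['order_id', 'created_at'], rowCount: 5_000_000 },
  { name: 'customers', columns: ['customer_id', 'email'], rowCount: 20_000 },
  { name: 'products', columns: ['product_id', 'product_name'] },
]);
```

With this sample input the sketch yields one performance candidate for orders and one sensitive-data candidate for customers. Name-based matching is crude, which is one reason to review generated teachings before relying on them.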
Configuration Options
interface TeachingsGeneratorOptions {
  /** Additional domain context to guide generation */
  context?: string;
  /** Model to use for generation */
  model?: AgentModel;
}
Domain Context
Provide business context to improve teaching quality:
// Without context - relies purely on schema
const schemaOnlyTeachings = await new TeachingsGenerator(adapter).generate();
// With context - generates more relevant teachings
const teachings = await new TeachingsGenerator(adapter, {
  context: `Our database tracks e-commerce orders.
- Active customers are those who ordered in the last 90 days
- We exclude test accounts with email ending in @test.com
- Revenue should exclude cancelled and refunded orders`
}).generate();
Custom Model
Override the default model:
import { groq } from '@ai-sdk/groq';
const teachings = await new TeachingsGenerator(adapter, {
  context: 'Healthcare patient records',
  model: groq('llama-3.3-70b-versatile')
}).generate();
Retry Handling
The generate() method includes automatic retry logic for transient errors:
// Retries up to 3 times (default)
const teachings = await generator.generate();
// Custom retry count
const teachingsWithMoreRetries = await generator.generate(5);
Retryable errors include:
- Parse errors (malformed output)
- Schema validation failures
- "No object generated" errors
- AI provider errors (rate limits, timeouts)
Non-retryable errors (thrown immediately):
- Authentication failures
- Invalid schema format
- Network connection errors
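The classification above can be pictured as a small retry wrapper. This is a sketch under stated assumptions, not the library's code: `ErrorKind`, `classifiedError`, `isRetryable`, and `withRetry` are hypothetical names, and the count is treated here as the maximum number of attempts, which may not match how generate() counts retries.

```typescript
// Hypothetical error categories mirroring the two lists above.
type ErrorKind =
  | 'parse' | 'validation' | 'no-object' | 'provider' // retried
  | 'auth' | 'schema-format' | 'network';             // thrown immediately

const RETRYABLE: Record<ErrorKind, boolean> = {
  parse: true,
  validation: true,
  'no-object': true,
  provider: true,
  auth: false,
  'schema-format': false,
  network: false,
};

interface ClassifiedError extends Error {
  kind: ErrorKind;
}

// Helper that tags a plain Error with a category (illustrative only).
function classifiedError(kind: ErrorKind, message: string): ClassifiedError {
  return Object.assign(new Error(message), { kind });
}

// Untagged errors are treated as non-retryable.
function isRetryable(error: unknown): boolean {
  if (typeof error !== 'object' || error === null) return false;
  const kind = (error as { kind?: ErrorKind }).kind;
  return kind !== undefined && RETRYABLE[kind] === true;
}

// Runs the task up to maxAttempts times, rethrowing non-retryable errors at once.
async function withRetry<T>(task: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      if (!isRetryable(error)) throw error;
      lastError = error;
    }
  }
  throw lastError;
}
```

Keeping the categories in a lookup table makes the retry policy explicit and easy to audit, which matters when a pipeline runs unattended.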
Using with SchemaSynthesizer
Generate training pairs with domain knowledge:
import {
  TeachingsGenerator,
  SchemaSynthesizer,
  PersonaGenerator,
  toPairs
} from '@deepagents/text2sql/synthesis';
// Generate teachings
const teachingsGen = new TeachingsGenerator(adapter, {
  context: 'SaaS subscription and usage tracking database'
});
const teachings = await teachingsGen.generate();
// Generate personas
const personaGen = new PersonaGenerator(adapter, { count: 5 });
const personas = await personaGen.generate();
// Generate pairs with teachings and personas
const pairs = await toPairs(new SchemaSynthesizer(adapter, {
  count: 10,
  complexity: 'medium',
  personas: personas,
  teachings: teachings // Guides SQL generation
}));
Example Output
For an e-commerce database, generated teachings might include:
[
  {
    type: 'guardrail',
    rule: 'Avoid unbounded scans on orders table - always include date range filter',
    reason: 'Performance - orders table contains millions of rows',
    action: 'Ask user for timeframe if not specified'
  },
  {
    type: 'hint',
    text: 'Exclude test accounts with email ending in @test.com from all metrics'
  },
  {
    type: 'term',
    name: 'active customer',
    definition: 'customer who placed an order in the last 90 days'
  },
  {
    type: 'clarification',
    when: 'user asks for "revenue"',
    ask: 'Do you want gross revenue or net revenue (excluding refunds/cancellations)?',
    reason: 'Revenue can mean different things for different analyses'
  },
  {
    type: 'quirk',
    issue: 'order_total includes shipping and tax',
    workaround: 'Use order_subtotal for product revenue only'
  },
  {
    type: 'example',
    question: 'show top selling products',
    answer: `SELECT p.product_name, COUNT(oi.order_id) as order_count
FROM products p
JOIN order_items oi ON p.product_id = oi.product_id
JOIN orders o ON oi.order_id = o.order_id
WHERE o.status NOT IN ('cancelled', 'refunded')
AND o.created_at >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY p.product_id, p.product_name
ORDER BY order_count DESC
LIMIT 10`,
    note: 'Excludes cancelled/refunded orders and focuses on recent history'
  }
]
Integration with Text2SQL
Use text2sql.teach() for automatic teaching (recommended):
// Automatic: generates and applies teachings
await text2sql.teach(
  'Our database tracks e-commerce orders. Active customers ordered in last 90 days.'
);
// Manual: generate teachings separately
const generator = new TeachingsGenerator(adapter, {
  context: 'E-commerce database'
});
const teachings = await generator.generate();
// Apply manually
text2sql.instruct(...teachings);
The automatic teach() method handles:
- Schema introspection
- Teaching generation
- Application to Text2SQL instance
- Formatting for system prompt
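The last step, formatting for the system prompt, happens inside teach(). As a rough illustration of what such formatting could involve, the sketch below renders three of the teachable shapes as plain text. `renderTeachings` and the output layout are hypothetical; the field names follow the Example Output section, and the library's actual prompt format is not described in this document.

```typescript
// Shapes follow the Example Output section; only three types are covered here.
type Teachable =
  | { type: 'guardrail'; rule: string; reason?: string; action?: string }
  | { type: 'hint'; text: string }
  | { type: 'term'; name: string; definition: string };

// Renders a single teachable as one bullet line.
function renderOne(t: Teachable): string {
  switch (t.type) {
    case 'guardrail':
      return `- GUARDRAIL: ${t.rule}${t.reason ? ` (${t.reason})` : ''}`;
    case 'hint':
      return `- HINT: ${t.text}`;
    case 'term':
      return `- TERM: "${t.name}" means ${t.definition}`;
  }
}

// Hypothetical renderer: a heading followed by one line per teachable.
function renderTeachings(teachings: Teachable[]): string {
  return ['Domain knowledge:', ...teachings.map(renderOne)].join('\n');
}

const promptSection = renderTeachings([
  { type: 'guardrail', rule: 'Always filter orders by date', reason: 'Performance' },
  { type: 'hint', text: 'Exclude test accounts' },
  { type: 'term', name: 'active customer', definition: 'customer who placed an order in the last 90 days' },
]);
```

The discriminated union lets the compiler check that every covered type has a rendering branch, so adding a new shape without handling it is caught at build time.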
Full Pipeline Example
Generate a comprehensive training dataset with teachings:
import {
  TeachingsGenerator,
  PersonaGenerator,
  SchemaSynthesizer,
  DepthEvolver,
  BreadthEvolver,
  toPairs
} from '@deepagents/text2sql/synthesis';
// 1. Generate teachings
const teachingsGen = new TeachingsGenerator(adapter, {
  context: `Financial services database.
- NPL = non-performing loan (90+ days past due)
- Always exclude test accounts (account_type = 'test')
- Basis points: 1% = 100 bps`
});
const teachings = await teachingsGen.generate();
// 2. Generate personas
const personaGen = new PersonaGenerator(adapter, { count: 8 });
const personas = await personaGen.generate();
// 3. Generate base pairs
const basePairs = await toPairs(new SchemaSynthesizer(adapter, {
  count: 5,
  complexity: ['low', 'medium', 'hard'],
  personas: personas,
  teachings: teachings
}));
console.log(`Base pairs: ${basePairs.length}`);
// 4. Evolve in depth (make questions harder)
const depthEvolved = await toPairs(
  new DepthEvolver(basePairs, adapter, {
    count: 2 // 2 harder versions per question
  })
);
console.log(`Depth evolved: ${depthEvolved.length}`);
// 5. Evolve in breadth (paraphrase)
const breadthEvolved = await toPairs(
  new BreadthEvolver([...basePairs, ...depthEvolved], {
    count: 3 // 3 paraphrases per question
  })
);
console.log(`Breadth evolved: ${breadthEvolved.length}`);
const allPairs = [...basePairs, ...depthEvolved, ...breadthEvolved];
console.log(`Total dataset: ${allPairs.length} pairs`);
Teaching Types Generated
The generator produces these teachable types (in priority order):
- Guardrails - Safety and performance boundaries
  - Large table scan warnings
  - Sensitive data protection
  - Query complexity limits
- Hints - Always-apply rules
  - Test account exclusions
  - Default date ranges
  - Preferred join patterns
- Clarifications - When to ask users for more info
  - Ambiguous metrics (revenue, active, conversion)
  - Time period specifications
  - Aggregation level preferences
- Terms - Domain vocabulary
  - Business acronyms and jargon
  - Domain-specific concepts
  - Canonical values and their meanings
- Examples - Common query patterns
  - Frequently asked questions
  - Complex join patterns
  - Aggregation examples
- Quirks - Data edge cases and workarounds
  - Column format issues
  - Calculation gotchas
  - Schema anomalies
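If you post-process the generator's output, the priority order listed above can be encoded as a simple ranking. This is a minimal sketch, assuming each teachable carries a `type` string as in the Example Output section; `PRIORITY` and `byPriority` are illustrative names, not library exports.

```typescript
// Priority order taken from the list above; a lower number means higher priority.
const PRIORITY = {
  guardrail: 0,
  hint: 1,
  clarification: 2,
  term: 3,
  example: 4,
  quirk: 5,
} as const;

type TeachableType = keyof typeof PRIORITY;

// Minimal shape: a type tag plus whatever fields that type carries.
interface Teachable {
  type: TeachableType;
  [field: string]: unknown;
}

// Returns a new array sorted by priority; the input array is not mutated.
function byPriority(teachings: Teachable[]): Teachable[] {
  return [...teachings].sort((a, b) => PRIORITY[a.type] - PRIORITY[b.type]);
}

const sorted = byPriority([
  { type: 'term', name: 'active customer' },
  { type: 'quirk', issue: 'order_total includes shipping and tax' },
  { type: 'guardrail', rule: 'Always include a date range filter on orders' },
  { type: 'hint', text: 'Exclude test accounts' },
]);
```

Sorting this way is useful when you need to truncate a long list of teachings, since the safety-related items stay at the front.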
Best Practices
- Provide rich context - The more domain context you provide, the better the teachings
- Review generated teachings - Spot-check output to ensure relevance and accuracy
- Combine with manual teachings - Auto-generated teachings plus manually curated ones give the most complete coverage
- Iterate - Regenerate teachings as your schema evolves
- Use the retry parameter - Set a higher retry count for production pipelines
Related Documentation
- Teach the System - Comprehensive guide to all teachable types
- Persona Generator - Generate user personas for diverse questions
- From Schema - Using teachings with SchemaSynthesizer
Comparison: Manual vs Generated
| Aspect | Manual Teaching | Generated Teaching |
|---|---|---|
| Precision | High - you define exact rules | Medium - inferred from schema |
| Coverage | Low - only what you write | Medium - covers common patterns |
| Effort | High - requires domain expertise | Low - automated |
| Best For | Business-critical rules, edge cases | Bootstrapping, domain vocabulary |
Recommendation: Use both. Generate teachings for baseline knowledge, then supplement with manual teachings for critical business rules and edge cases.
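One way to combine the two sources is to let manual teachings take precedence over generated ones that describe the same item. The sketch below is an assumption about how you might do this in your own code: `mergeTeachings` and `keyOf` are hypothetical helpers, and nothing in this document says the library deduplicates teachings itself.

```typescript
// Minimal shape: a type tag plus arbitrary fields, as in the Example Output section.
interface Teachable {
  type: string;
  [field: string]: unknown;
}

// Assumed identity: terms are keyed by lowercased name, everything else by its JSON form.
function keyOf(t: Teachable): string {
  return t.type === 'term'
    ? `term:${String(t['name']).toLowerCase()}`
    : JSON.stringify(t);
}

// Manual teachings are visited first, so they win over generated ones with the same key.
function mergeTeachings(manual: Teachable[], generated: Teachable[]): Teachable[] {
  const seen = new Set<string>();
  const merged: Teachable[] = [];
  for (const t of [...manual, ...generated]) {
    const key = keyOf(t);
    if (!seen.has(key)) {
      seen.add(key);
      merged.push(t);
    }
  }
  return merged;
}

const combined = mergeTeachings(
  [{ type: 'term', name: 'Active Customer', definition: 'ordered in the last 30 days' }],
  [
    { type: 'term', name: 'active customer', definition: 'customer who placed an order in the last 90 days' },
    { type: 'hint', text: 'Exclude test accounts with email ending in @test.com' },
  ],
);
```

Here the manual 30-day definition replaces the generated 90-day one, and the generated hint is kept. The merged array could then be applied with text2sql.instruct(...combined), as in the manual flow shown under Integration with Text2SQL.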