News Aggregator
Aggregate content from multiple RSS feeds with AI-powered digests
Build a system that aggregates news from multiple RSS feeds and generates AI-powered daily digests. This recipe demonstrates multi-source ingestion with TTL-based refresh and structured output generation.
What You'll Build
- Multi-feed RSS aggregation
- Automatic refresh based on TTL
- AI-generated daily digests with trending topics
- Semantic search across all news sources
Prerequisites
npm install @deepagents/retrieval @deepagents/agent @ai-sdk/groq ai zod
You'll also need a Groq API key; the @ai-sdk/groq provider reads it from the GROQ_API_KEY environment variable by default.
Complete Implementation
import { groq } from '@ai-sdk/groq';
import z from 'zod';
import { agent, generate, instructions } from '@deepagents/agent';
import {
  fastembed,
  ingest,
  nodeSQLite,
  similaritySearch,
} from '@deepagents/retrieval';
import { rss } from '@deepagents/retrieval/connectors';
// Configure news feeds
const feeds = [
  { name: 'Hacker News', url: 'https://news.ycombinator.com/rss' },
  { name: 'TechCrunch', url: 'https://techcrunch.com/feed/' },
  { name: 'The Verge', url: 'https://www.theverge.com/rss/index.xml' },
  {
    name: 'Ars Technica',
    url: 'https://feeds.arstechnica.com/arstechnica/technology-lab',
  },
];
// Set up retrieval infrastructure
const store = nodeSQLite('./news.db', 384);
const embedder = fastembed();
// Index all feeds with TTL-based refresh
async function updateFeeds() {
  console.log('Updating news feeds...\n');
  for (const feed of feeds) {
    console.log(`Indexing ${feed.name}...`);
    await ingest({
      connector: {
        ...rss(feed.url, {
          maxItems: 30,
          fetchFullArticles: true,
        }),
        ingestWhen: 'expired',
        expiresAfter: 60 * 60 * 1000, // 1 hour TTL
      },
      store,
      embedder,
    });
  }
  console.log('\nAll feeds indexed!\n');
}
// Search across all feeds
async function searchNews(query: string, limit = 10) {
  const results = await similaritySearch(query, {
    connector: rss(feeds[0].url), // Any connector works for search
    store,
    embedder,
  });
  // Trim to the requested number of results
  return results.slice(0, limit);
}
// Define the digest schema
const DigestSchema = z.object({
  summary: z.string().describe('2-3 sentence overview of top stories'),
  topStories: z
    .array(
      z.object({
        title: z.string(),
        source: z.string(),
        keyPoints: z.array(z.string()).describe('2-3 key points'),
      }),
    )
    .describe('Top 5 most important stories'),
  trends: z.array(z.string()).describe('3-5 emerging trends'),
  categories: z
    .record(z.array(z.string()))
    .describe('Stories grouped by category'),
});
// Create the digest agent
const digestAgent = agent({
  name: 'DigestAgent',
  model: groq('openai/gpt-oss-20b'),
  output: DigestSchema,
  prompt: instructions({
    purpose: [
      'Create a concise, informative daily tech news digest.',
      'Identify the most important stories and emerging trends.',
      'Group stories by category for easy scanning.',
    ],
    routine: [
      'Analyze the provided news articles',
      'Identify the top 5 most significant stories',
      'Extract key points from each story',
      'Identify emerging trends across stories',
      'Categorize stories (AI, Startups, Security, etc.)',
    ],
  }),
});
// Generate the daily digest
async function generateDigest() {
  console.log('Generating daily digest...\n');
  // Search across multiple topics to get diverse content
  const topics = [
    'AI',
    'startups',
    'cybersecurity',
    'developer tools',
    'tech industry',
  ];
  const allResults: any[] = [];
  for (const topic of topics) {
    const results = await searchNews(topic, 5);
    allResults.push(...results);
  }
  // Deduplicate by document_id
  const unique = [
    ...new Map(allResults.map((r) => [r.document_id, r])).values(),
  ];
  const top = unique.sort((a, b) => b.similarity - a.similarity).slice(0, 20);
  // Format content for the agent
  const context = top
    .map(
      (r, i) => `
Article ${i + 1}:
${r.content}
---`,
    )
    .join('\n');
  // Generate structured digest
  const { output: digest } = await generate(
    digestAgent,
    `Create a daily tech news digest from these articles:\n\n${context}`,
    {},
  );
  return digest;
}
// Format digest for display
function formatDigest(digest: z.infer<typeof DigestSchema>) {
  let output = '';
  output += '# Daily Tech Digest\n\n';
  output += `## Overview\n${digest.summary}\n\n`;
  output += '## Top Stories\n';
  for (const story of digest.topStories) {
    output += `\n### ${story.title}\n`;
    output += `*Source: ${story.source}*\n`;
    for (const point of story.keyPoints) {
      output += `- ${point}\n`;
    }
  }
  output += '\n## Emerging Trends\n';
  for (const trend of digest.trends) {
    output += `- ${trend}\n`;
  }
  output += '\n## By Category\n';
  for (const [category, stories] of Object.entries(digest.categories)) {
    output += `\n### ${category}\n`;
    for (const story of stories) {
      output += `- ${story}\n`;
    }
  }
  return output;
}
// Run the aggregator
async function main() {
  // Update feeds (respects TTL)
  await updateFeeds();
  // Generate and display digest
  const digest = await generateDigest();
  console.log(formatDigest(digest));
  // Example: search for specific topic
  console.log('\n--- Search Results: "AI regulation" ---\n');
  const searchResults = await searchNews('AI regulation', 3);
  for (const result of searchResults) {
    console.log(
      `[${result.similarity.toFixed(2)}] ${result.content.slice(0, 200)}...\n`,
    );
  }
}
await main();
How It Works
1. Feed Configuration
Define your news sources with URLs:
const feeds = [
  { name: 'Hacker News', url: 'https://news.ycombinator.com/rss' },
  { name: 'TechCrunch', url: 'https://techcrunch.com/feed/' },
];
2. TTL-Based Refresh
Configure automatic refresh with expiresAfter:
await ingest({
  connector: {
    ...rss(feed.url, { fetchFullArticles: true }),
    ingestWhen: 'expired',
    expiresAfter: 60 * 60 * 1000, // Re-fetch after 1 hour
  },
  store,
  embedder,
});
3. Structured Output
Use Zod schemas for consistent digest format:
const DigestSchema = z.object({
  summary: z.string(),
  topStories: z.array(z.object({...})),
  trends: z.array(z.string()),
});
Customization Options
Different Refresh Intervals
Set different TTLs for different feeds:
const feedConfigs = [
  { url: 'https://breaking-news.com/rss', ttl: 15 * 60 * 1000 }, // 15 min
  { url: 'https://weekly-digest.com/rss', ttl: 24 * 60 * 60 * 1000 }, // 24 hours
];
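Each entry's ttl then maps straight onto expiresAfter. A minimal sketch, reusing the rss, ingest, store, and embedder setup from the complete implementation above:
for (const { url, ttl } of feedConfigs) {
  await ingest({
    connector: {
      ...rss(url),
      ingestWhen: 'expired',
      expiresAfter: ttl, // per-feed refresh interval
    },
    store,
    embedder,
  });
}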
Category-Specific Feeds
Organize feeds by category:
const feedsByCategory = {
  tech: ['https://techcrunch.com/feed/'],
  security: ['https://krebsonsecurity.com/feed/'],
  ai: ['https://openai.com/blog/rss.xml'],
};
for (const [category, urls] of Object.entries(feedsByCategory)) {
  for (const url of urls) {
    await ingest({
      connector: {
        ...rss(url),
        ingestWhen: 'expired',
        expiresAfter: 60 * 60 * 1000,
      },
      store,
      embedder,
    });
  }
}
Scheduled Updates
Run on a schedule using cron or node-schedule:
import { scheduleJob } from 'node-schedule';
// Update feeds every hour
scheduleJob('0 * * * *', async () => {
  await updateFeeds();
  const digest = await generateDigest();
  await sendDigestEmail(digest); // sendDigestEmail is your own delivery function
});
Alert System
Get notified when specific topics appear:
async function checkAlerts(topics: string[]) {
  for (const topic of topics) {
    const results = await searchNews(topic, 3);
    if (results.length > 0 && results[0].similarity > 0.8) {
      console.log(`Alert: High-relevance content for "${topic}"`);
      // Send notification
    }
  }
}
await checkAlerts(['security breach', 'data leak', 'major acquisition']);
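The // Send notification comment is a placeholder; one option is posting to a chat webhook. A sketch, assuming a Slack-style incoming webhook whose URL lives in a SLACK_WEBHOOK_URL environment variable (both the variable and the endpoint are stand-ins for whatever channel you use):
async function notify(topic: string, snippet: string) {
  // POST a short alert message to the webhook (Node 18+ global fetch)
  await fetch(process.env.SLACK_WEBHOOK_URL!, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text: `News alert for "${topic}": ${snippet}` }),
  });
}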
Production Tips
- Respect rate limits: add a short delay between feed fetches (see the sketch after this list)
- Handle failures gracefully: skip a failed feed instead of crashing the whole pipeline (also sketched below)
- Store raw content: keep the original articles for later reference
- Deduplicate: the same story often appears in multiple feeds
- Time filtering: favor recent articles when building daily digests
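A version of updateFeeds that applies the first two tips. A sketch under the same setup as the complete implementation; the 2-second delay is an arbitrary choice, not a value the feeds require:
async function updateFeedsSafely() {
  for (const feed of feeds) {
    try {
      await ingest({
        connector: {
          ...rss(feed.url, { maxItems: 30, fetchFullArticles: true }),
          ingestWhen: 'expired',
          expiresAfter: 60 * 60 * 1000, // 1 hour TTL
        },
        store,
        embedder,
      });
      console.log(`Indexed ${feed.name}`);
    } catch (error) {
      // A failed feed is skipped; the rest of the pipeline keeps running
      console.error(`Failed to index ${feed.name}:`, error);
    }
    // Crude rate limiting between fetches
    await new Promise((resolve) => setTimeout(resolve, 2000));
  }
}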
Next Steps
- Research Assistant - Academic content
- RSS Connector - Connector details
- Ingestion Modes - TTL configuration