Retrieval
A RAG (Retrieval-Augmented Generation) system for semantic search over any data source
@deepagents/retrieval is a complete RAG (Retrieval-Augmented Generation) system that enables semantic search over various data sources. Ingest content from local files, GitHub repositories, RSS feeds, PDFs, and more—then query with natural language.
Installation
npm install @deepagents/retrieval
How It Works
The retrieval system follows a simple pipeline:
Data Sources → Connectors → Chunking → Embeddings → Vector Store → Search
- Connectors pull content from data sources (files, GitHub, RSS, PDFs)
- Splitters break content into chunks for embedding
- Embedders convert text into vector representations
- Stores persist embeddings and enable similarity search
- Search finds relevant chunks based on semantic similarity
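The five stages can be modeled without any dependencies. The TypeScript below is a toy version of the flow, not the package's implementation. The word-window `chunk` function, the hashed bag-of-words `toyEmbed` function, and the in-memory array used as a vector store are all hypothetical stand-ins, chosen only so the example runs on its own. A real embedding model captures meaning rather than shared words, so it would also match paraphrases that this toy misses.

```typescript
// Hypothetical toy pipeline: chunk -> embed -> store -> search.
// None of these helpers come from @deepagents/retrieval.

type Doc = { id: string; content: string };
type StoredChunk = { docId: string; content: string; vector: number[] };

const DIM = 64; // toy size; BGE-small-en-v1.5 produces 384 dimensions

// Splitter stand-in: consecutive windows of `size` words.
function chunk(text: string, size = 50): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += size) {
    chunks.push(words.slice(i, i + size).join(" "));
  }
  return chunks;
}

// Embedder stand-in: hash each lowercase word into one of DIM buckets,
// then L2-normalize so a dot product equals cosine similarity.
function toyEmbed(text: string): number[] {
  const v = new Array<number>(DIM).fill(0);
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    let h = 0;
    for (const ch of word) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
    v[h % DIM] += 1;
  }
  const norm = Math.hypot(...v) || 1;
  return v.map((x) => x / norm);
}

function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Ingestion: split every document and store each chunk with its vector.
function ingestDocs(docs: Doc[]): StoredChunk[] {
  const store: StoredChunk[] = [];
  for (const doc of docs) {
    for (const content of chunk(doc.content)) {
      store.push({ docId: doc.id, content, vector: toyEmbed(content) });
    }
  }
  return store;
}

// Search: embed the query and rank chunks by cosine similarity.
function search(store: StoredChunk[], query: string, topK = 3) {
  const q = toyEmbed(query);
  return store
    .map((c) => ({ docId: c.docId, content: c.content, similarity: dot(q, c.vector) }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK);
}

const store = ingestDocs([
  { id: "auth.md", content: "Users log in with a token" },
  { id: "deploy.md", content: "Ship the app with Docker" },
]);
console.log(search(store, "how do users log in", 1)[0].docId); // auth.md
```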
Architecture Overview
┌─────────────────────────────────────────────────────────────┐
│                    @deepagents/retrieval                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────┐     ┌──────────────┐     ┌─────────────┐   │
│  │ Connectors  │ ──▶ │  Ingestion   │ ──▶ │    Store    │   │
│  │             │     │   Pipeline   │     │             │   │
│  │ • local     │     │              │     │ • SQLite    │   │
│  │ • github    │     │  ┌────────┐  │     │ • vec0      │   │
│  │ • rss       │     │  │Splitter│  │     │             │   │
│  │ • pdf       │     │  └───┬────┘  │     └──────┬──────┘   │
│  │ • linear    │     │      │       │            │          │
│  │ • repo      │     │  ┌───▼────┐  │     ┌──────▼──────┐   │
│  └─────────────┘     │  │Embedder│  │     │   Search    │   │
│                      │  └────────┘  │     │             │   │
│                      └──────────────┘     └─────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Quick Example
import { similaritySearch, fastembed, nodeSQLite } from '@deepagents/retrieval';
import { local } from '@deepagents/retrieval/connectors';
// Create a vector store (384 dimensions for BGE-small model)
const store = nodeSQLite('./knowledge.db', 384);
// Search your markdown documentation
const results = await similaritySearch('How do I authenticate users?', {
  connector: local('docs/**/*.md'),
  store,
  embedder: fastembed(),
});
// Results include content, similarity score, and metadata
for (const result of results) {
  console.log(`[${result.similarity.toFixed(2)}] ${result.content.slice(0, 100)}...`);
}
Key Concepts
Connectors
Connectors define where your data comes from. Each connector implements a simple interface that yields documents:
type Connector = {
  sourceId: string; // Unique identifier for caching
  sources: () => AsyncGenerator<...>; // Yields documents
  ingestWhen?: 'never' | 'contentChanged' | 'expired';
  expiresAfter?: number; // TTL in milliseconds
};
Available connectors:
- local - Local filesystem with glob patterns
- github - GitHub files, releases, and repositories
- rss - RSS feeds with article extraction
- pdf - PDF text extraction
- linear - Linear workspace issues
- repo - Repository code with language filtering
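Because a connector is just an object with a `sourceId` and an async generator, a custom source mostly comes down to writing that generator. The sketch below is hypothetical. The `{ id, content }` document shape stands in for the element type elided as `...` in the `Connector` type above, and `inMemory` and `collect` are illustrative helpers, not package exports. Check the package's own `Connector` type before adapting it.

```typescript
// Hypothetical document shape; the real element type yielded by
// `sources()` is elided in the package's Connector type.
type SketchDocument = { id: string; content: string };

type SketchConnector = {
  sourceId: string;
  sources: () => AsyncGenerator<SketchDocument>;
  ingestWhen?: "never" | "contentChanged" | "expired";
  expiresAfter?: number; // TTL in milliseconds
};

// Illustrative connector that serves documents from an in-memory map.
function inMemory(name: string, files: Record<string, string>): SketchConnector {
  return {
    sourceId: `memory:${name}`, // stable ID so cached embeddings can be reused
    ingestWhen: "contentChanged",
    async *sources() {
      for (const [id, content] of Object.entries(files)) {
        yield { id, content };
      }
    },
  };
}

// Drain an async generator into an array, e.g. to inspect a connector.
async function collect<T>(gen: AsyncGenerator<T>): Promise<T[]> {
  const out: T[] = [];
  for await (const item of gen) out.push(item);
  return out;
}

const notes = inMemory("notes", { "a.md": "# Alpha", "b.md": "# Beta" });
// Logs the IDs a.md and b.md, in insertion order.
collect(notes.sources()).then((docs) => console.log(docs.map((d) => d.id)));
```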
Embedders
Embedders convert text into vector representations. The package includes FastEmbed with multiple model options:
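import { fastembed } from '@deepagents/retrieval';
// EmbeddingModel is assumed to be the enum exported by the fastembed package
import { EmbeddingModel } from 'fastembed';
// Default: BGE-small-en-v1.5 (384 dimensions)
const embedder = fastembed();
// Or specify a different model (all-MiniLM-L6-v2 also produces 384 dimensions)
const miniLmEmbedder = fastembed({
  model: EmbeddingModel.AllMiniLML6V2,
  batchSize: 32,
});
Stores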
Stores persist embeddings and provide similarity search. The SQLite store uses the vec0 extension for efficient vector operations:
import { nodeSQLite } from '@deepagents/retrieval';
// Create store with dimension matching your embedding model
const store = nodeSQLite('./vectors.db', 384);
Ingestion Modes
Control when content gets re-indexed:
- contentChanged (default) - Re-ingest when content hash changes
- never - Ingest once, never update
- expired - Re-ingest when TTL expires
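The three modes amount to a per-source decision made before re-indexing. The sketch below models that decision as described above; it is not the package's code. `shouldIngest` and the `IngestRecord` shape are hypothetical. 32-bit FNV-1a stands in for whatever content hash the package actually uses, and a source with no previous record is assumed to always be ingested.

```typescript
type IngestWhen = "never" | "contentChanged" | "expired";

// Hypothetical record of a source's last successful ingestion.
type IngestRecord = { contentHash: string; ingestedAt: number };

// 32-bit FNV-1a over UTF-16 code units: a stand-in content hash only.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

function shouldIngest(
  mode: IngestWhen,
  content: string,
  previous: IngestRecord | undefined,
  now: number,
  expiresAfter = Infinity, // assumption: without a TTL, nothing expires
): boolean {
  if (previous === undefined) return true; // never indexed before
  if (mode === "never") return false;
  if (mode === "contentChanged") return fnv1a(content) !== previous.contentHash;
  return now - previous.ingestedAt >= expiresAfter; // mode === "expired"
}

const DAY = 24 * 60 * 60 * 1000;
const t0 = Date.UTC(2024, 0, 1);
const last: IngestRecord = { contentHash: fnv1a("# Guide"), ingestedAt: t0 };

console.log(shouldIngest("contentChanged", "# Guide", last, t0 + 1000)); // false
console.log(shouldIngest("contentChanged", "# Guide v2", last, t0 + 1000)); // true
console.log(shouldIngest("expired", "# Guide", last, t0 + 2 * DAY, DAY)); // true
console.log(shouldIngest("never", "# Guide v2", last, t0 + 2 * DAY)); // false
```

In practice you only choose the mode (and, for `expired`, a TTL in milliseconds) on the connector: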
import { local } from '@deepagents/retrieval/connectors';
const connector = local('**/*.md', {
  ingestWhen: 'expired',
  expiresAfter: 24 * 60 * 60 * 1000, // 24 hours
});
Exports
Main Package
import {
  // Core functions
  similaritySearch,
  ingest,
  // Embedders
  fastembed,
  // Stores
  nodeSQLite,
  SQLiteStore,
  // Utilities
  cid,
  // Types
  type Store,
  type Embedder,
  type Chunk,
  type Corpus,
} from '@deepagents/retrieval';
Connectors Subpath
import {
  local,
  github,
  rss,
  pdf,
  pdfFile,
  linear,
  repo,
} from '@deepagents/retrieval/connectors';
Next Steps
- Getting Started - Build your first RAG pipeline
- Local Connector - Index local files
- Embedders - Configure embedding models
- Recipes - Real-world examples