Deep Agents

Repo Connector

Index repository source code with language-aware filtering

The repo connector indexes source code from local repositories and automatically filters out common non-essential files such as dependencies, build output, and lock files. It is optimized for code search and understanding.

Import

import { repo } from '@deepagents/retrieval/connectors';

Basic Usage

import { similaritySearch, fastembed, nodeSQLite } from '@deepagents/retrieval';
import { repo } from '@deepagents/retrieval/connectors';

const store = nodeSQLite('./code.db', 384);

const results = await similaritySearch('authentication middleware', {
  connector: repo('./src', ['.ts', '.tsx'], 'contentChanged'),
  store,
  embedder: fastembed(),
});

Parameters

repo(
  dir: string,                                    // Repository directory
  extensions: string[],                           // File extensions to include
  ingestWhen: 'never' | 'contentChanged' | 'expired'  // Ingestion mode
)

Parameter     Description
dir           Path to the repository root
extensions    Array of file extensions (with or without dots)
ingestWhen    Controls re-indexing behavior
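
Because extensions may be given with or without a leading dot, it can help to see what such normalization might look like. The sketch below is illustrative only: normalizeExtensions and hasAllowedExtension are hypothetical helpers, not exports of @deepagents/retrieval, and the lowercasing step is an assumption rather than documented behavior.

```typescript
// Hypothetical helper (not part of @deepagents/retrieval): accepts
// extensions written as "ts" or ".ts" and returns them in dotted form.
function normalizeExtensions(extensions: string[]): string[] {
  const normalized = extensions
    .map((ext) => ext.trim().toLowerCase()) // lowercasing is an assumption
    .filter((ext) => ext.length > 0)
    .map((ext) => (ext.startsWith(".") ? ext : `.${ext}`));
  // Drop duplicates while keeping first-seen order.
  return Array.from(new Set(normalized));
}

// Hypothetical helper: true when the path ends with one of the extensions.
function hasAllowedExtension(filePath: string, extensions: string[]): boolean {
  const lower = filePath.toLowerCase();
  return normalizeExtensions(extensions).some((ext) => lower.endsWith(ext));
}
```

Under this reading, repo('./src', ['ts', 'tsx'], 'contentChanged') and repo('./src', ['.ts', '.tsx'], 'contentChanged') would select the same files.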

File Filtering

The connector automatically excludes:

Common Directories

  • node_modules/, .pnpm/, .npm/, .yarn/, vendor/
  • .git/, .svn/, .hg/
  • dist/, build/, out/, target/, bin/, obj/
  • .next/, .vercel/, .turbo/, .vite/
  • coverage/, .nyc_output/, jest-cache/, .pytest_cache/
  • .venv/, venv/
  • .idea/, .vscode/, .fleet/

Files

  • .env, .env.*
  • Lock files (*.lock, package-lock.json, yarn.lock, pnpm-lock.yaml)

Size Limit

Files larger than 3KB are skipped to focus on meaningful code units.
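
The directory, file, and size rules above can be pictured as a single predicate. This is a minimal sketch, not the connector's implementation: shouldSkip is a hypothetical function, the directory set is a small subset of the list above, and treating "3KB" as 3 * 1024 bytes is an assumption.

```typescript
// Illustrative subset of the excluded directories listed above.
const EXCLUDED_DIRS = new Set([
  "node_modules", ".git", "dist", "build", "coverage", ".venv", ".vscode",
]);
const LOCK_FILES = new Set(["package-lock.json", "yarn.lock", "pnpm-lock.yaml"]);
// Assumption: "3KB" is read as 3 * 1024 bytes.
const MAX_FILE_SIZE_BYTES = 3 * 1024;

// Hypothetical predicate: true when a file would be left out of the index.
function shouldSkip(relativePath: string, sizeInBytes: number): boolean {
  const segments = relativePath.split(/[\\/]/).filter(Boolean);
  const fileName = segments[segments.length - 1] ?? "";
  const dirs = segments.slice(0, -1);

  // Any excluded directory anywhere in the path rules the file out.
  if (dirs.some((dir) => EXCLUDED_DIRS.has(dir))) return true;
  // .env and .env.* files.
  if (fileName === ".env" || fileName.startsWith(".env.")) return true;
  // Lock files, by suffix or by well-known name.
  if (fileName.endsWith(".lock") || LOCK_FILES.has(fileName)) return true;
  // Size limit.
  return sizeInBytes > MAX_FILE_SIZE_BYTES;
}
```

One consequence of the size rule worth keeping in mind: a large module with many functions would be skipped entirely under this reading, so results favor small, focused files.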

Gitignore Support

The connector respects .gitignore patterns from the repository root.
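
As a rough illustration of what reading a .gitignore involves, the sketch below splits the file into rules, dropping blank lines and comments and flagging negated patterns. parseGitignore is a hypothetical helper; it does not perform pattern matching and makes no claim about how the connector itself parses the file.

```typescript
interface IgnoreRule {
  pattern: string;  // the pattern text without a leading "!"
  negated: boolean; // true when the line re-includes matching paths
}

// Hypothetical helper: turns .gitignore text into a list of rules.
// Simplified: escaped characters such as "\#" or "\ " are not handled.
function parseGitignore(text: string): IgnoreRule[] {
  const rules: IgnoreRule[] = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trimEnd(); // trailing whitespace is ignored
    if (line === "" || line.startsWith("#")) continue; // blank or comment
    const negated = line.startsWith("!");
    rules.push({ pattern: negated ? line.slice(1) : line, negated });
  }
  return rules;
}
```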

Real-World Examples

Index and search a TypeScript project:

import { similaritySearch, fastembed, nodeSQLite } from '@deepagents/retrieval';
import { repo } from '@deepagents/retrieval/connectors';

async function searchCode(query: string) {
  const store = nodeSQLite('./ts-code.db', 384);

  const results = await similaritySearch(query, {
    connector: repo('./src', ['.ts', '.tsx'], 'contentChanged'),
    store,
    embedder: fastembed(),
  });

  return results.map(r => ({
    file: r.document_id,
    content: r.content.slice(0, 200),
    similarity: r.similarity,
  }));
}

// Find authentication-related code
const results = await searchCode('JWT token validation middleware');

Multi-Language Project

Index a project with multiple languages:

import { ingest, similaritySearch, fastembed, nodeSQLite } from '@deepagents/retrieval';
import { repo } from '@deepagents/retrieval/connectors';

const store = nodeSQLite('./fullstack.db', 384);
const embedder = fastembed();

// Index frontend (TypeScript/React)
await ingest({
  connector: repo('./frontend/src', ['.ts', '.tsx', '.css'], 'contentChanged'),
  store,
  embedder,
});

// Index backend (Python)
await ingest({
  connector: repo('./backend', ['.py'], 'contentChanged'),
  store,
  embedder,
});

// Index infrastructure (Go)
await ingest({
  connector: repo('./services', ['.go'], 'contentChanged'),
  store,
  embedder,
});

// Search across all code
async function searchFullstack(query: string) {
  const results = await similaritySearch(query, {
    connector: repo('./frontend/src', ['.ts'], 'contentChanged'),
    store,
    embedder,
  });

  return results;
}

const results = await searchFullstack('error handling retry logic');

Index multiple packages in a monorepo:

import { ingest, similaritySearch, fastembed, nodeSQLite } from '@deepagents/retrieval';
import { repo } from '@deepagents/retrieval/connectors';

const store = nodeSQLite('./monorepo.db', 384);
const embedder = fastembed();

// Index each package
const packages = ['api', 'web', 'shared', 'cli'];

for (const pkg of packages) {
  await ingest({
    connector: repo(`./packages/${pkg}/src`, ['.ts'], 'contentChanged'),
    store,
    embedder,
  });
  console.log(`Indexed: ${pkg}`);
}

// Search across all packages
async function searchMonorepo(query: string) {
  const results = await similaritySearch(query, {
    connector: repo('./packages/api/src', ['.ts'], 'contentChanged'),
    store,
    embedder,
  });

  return results;
}

const results = await searchMonorepo('database connection pool');

Code Review Context

Build context for code review:

import { similaritySearch, fastembed, nodeSQLite } from '@deepagents/retrieval';
import { repo } from '@deepagents/retrieval/connectors';

async function getRelatedCode(changedFile: string, content: string) {
  const store = nodeSQLite('./review.db', 384);

  // Search for related code based on the changes
  const results = await similaritySearch(content, {
    connector: repo('./src', ['.ts', '.tsx'], 'contentChanged'),
    store,
    embedder: fastembed(),
  });

  // Filter out the file being reviewed
  return results
    .filter(r => r.document_id !== changedFile)
    .slice(0, 5);
}

// Find code related to a PR change
const relatedCode = await getRelatedCode(
  'src/auth/login.ts',
  'async function validateCredentials(email: string, password: string)'
);

Finding All Git Repos

The connector exports a utility to find all Git repositories:

import { findAllGitRepos } from '@deepagents/retrieval/connectors';

// Find all git repos under home directory
for await (const repoPath of findAllGitRepos('/Users/dev')) {
  console.log(`Found repo: ${repoPath}`);
}

This skips common non-project directories like node_modules, Library, Downloads, etc.
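
To make the traversal idea concrete, here is a minimal synchronous sketch of a repository finder built on Node's fs module. It is not the library's code: findGitReposSketch and its skip list are hypothetical, it uses a plain generator rather than the async iteration shown above, and stopping at the first .git found (without descending into nested repositories) is an assumption.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Illustrative subset of directory names to skip, not the library's list.
const SKIPPED_DIRS = new Set(["node_modules", "Library", "Downloads"]);

// Lists a directory, returning an empty list when it cannot be read.
function listDir(dir: string) {
  try {
    return fs.readdirSync(dir, { withFileTypes: true });
  } catch {
    return [];
  }
}

// Hypothetical sketch: yields every directory under `root` that contains
// a `.git` entry.
function* findGitReposSketch(root: string): Generator<string> {
  const entries = listDir(root);

  // A `.git` entry (directory or file) marks a repository root.
  if (entries.some((entry) => entry.name === ".git")) {
    yield root;
    return; // assumption: nested repositories are not searched for
  }

  for (const entry of entries) {
    // Symlinks are not followed: isDirectory() is false for them here.
    if (entry.isDirectory() && !SKIPPED_DIRS.has(entry.name)) {
      yield* findGitReposSketch(path.join(root, entry.name));
    }
  }
}
```

Skipping directories such as node_modules before descending keeps the walk from visiting large dependency trees at all.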

Source ID

The connector generates a source ID based on the directory:

repo('./src', ['.ts'], 'contentChanged')
// sourceId: "repo:./src"

Metadata

Each indexed file includes repository metadata:

{
  repo: './src', // The directory parameter
}
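
Taken together, the source ID and metadata shown above can be modeled as a small record derived from the directory argument. The sketch below is an assumption-based illustration: describeRepoSource and RepoSourceInfo are hypothetical names, and the ID format simply mirrors the "repo:./src" example.

```typescript
interface RepoSourceInfo {
  sourceId: string;           // e.g. "repo:./src"
  metadata: { repo: string }; // the directory parameter, unchanged
}

// Hypothetical helper: mirrors the source ID and metadata shown above.
function describeRepoSource(dir: string): RepoSourceInfo {
  return {
    sourceId: `repo:${dir}`,
    metadata: { repo: dir },
  };
}
```

If the ID is built from the directory string exactly as written, as the example suggests, then './src' and 'src' would yield different source IDs, so it is safest to pass the same spelling of the path every time.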

Helper Functions

collectFiles

Get all files matching extensions in a directory:

import { collectFiles } from '@deepagents/retrieval/connectors';

const files = await collectFiles('./src', ['.ts', '.tsx']);

for await (const file of files) {
  console.log(file);
}

ignorePatterns

Get the full list of ignore patterns:

import { ignorePatterns } from '@deepagents/retrieval/connectors';

const patterns = await ignorePatterns('./my-project');
console.log(patterns);
// ['node_modules/**', '.git/**', 'dist/**', ...]
