
Codebase-Derived Pattern Libraries as Agent Context

A library of proven implementations mined from your own repositories, indexed by intent and served to an agent as retrievable context rather than generic examples.

The Idea

A codebase-derived pattern library extracts reusable implementations already proven in your own repositories, indexes them, and lets an agent retrieve them by intent during a task. Without one, an agent writing code in your repository defaults to general training-data patterns rather than your team's reviewed ones. Instead of "how does anyone paginate an API," the agent asks "how do we paginate an API" and gets your team's actual, reviewed implementation.

This is a retrieval problem with a sharper corpus. Techniques such as logical retrieval over an inverted index and RAG component prioritization for software engineering tune how relevant context is selected. A pattern library instead tunes what is in the corpus, narrowing it to vetted in-house code rather than the open web.

Why a Private Corpus Wins

  • Higher signal. Your merged code already encodes your conventions, error handling, and domain constraints, so a retrieved in-house example typically needs less correction than a generic one synthesized from training data.
  • Consistency. Reusing an existing implementation keeps new code aligned with established patterns instead of introducing a third way to do the same thing.
  • Privacy. A library built and stored locally keeps proprietary code out of third-party retrieval services.

How the Library Gets Built

Turning a repository into a searchable pattern library is an extraction pipeline, not a manual catalogue. Pattern Vault is one concrete implementation: it parses source with tree-sitter to walk the AST, uses an LLM to classify and label the extracted snippets, and stores them in a local SQLite database with full-text search. The AST step bounds extraction to real syntactic units (functions, classes, blocks) rather than arbitrary text spans; the LLM step attaches the intent labels that make later intent-based search possible.
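
The pipeline above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Pattern Vault's implementation: Python's standard-library ast module stands in for tree-sitter, the hypothetical label_intent function stands in for the LLM classification step (it only reuses the name and docstring), and the index uses SQLite's FTS5 extension, which is assumed to be compiled into the local sqlite3 build. The table layout and all function names are invented for the example.

```python
import ast
import sqlite3

# Toy "repository": two helpers we want to turn into indexed patterns.
SOURCE = '''
def paginate(fetch_page, page_size=50):
    """Yield items from a paged API until a page comes back empty."""
    page = 0
    while True:
        items = fetch_page(page, page_size)
        if not items:
            return
        yield from items
        page += 1

def with_retry(call, attempts=3):
    """Retry a callable a fixed number of times."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise
'''


def extract_units(source):
    """Return (name, code, docstring) for each function or class in the tree."""
    tree = ast.parse(source)
    units = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            code = ast.get_source_segment(source, node)
            units.append((node.name, code, ast.get_docstring(node) or ""))
    return units


def label_intent(name, docstring):
    """Hypothetical stand-in for the LLM step: derive a searchable intent label."""
    return f"{name.replace('_', ' ')} {docstring}"


def build_index(units):
    """Store the extracted units in an in-memory SQLite full-text index."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE patterns USING fts5(name, intent, code)")
    for name, code, docstring in units:
        db.execute(
            "INSERT INTO patterns (name, intent, code) VALUES (?, ?, ?)",
            (name, label_intent(name, docstring), code),
        )
    return db


def search(db, query):
    """Return the names of patterns matching every term in the query."""
    rows = db.execute(
        "SELECT name FROM patterns WHERE patterns MATCH ? ORDER BY rank", (query,)
    )
    return [row[0] for row in rows]


db = build_index(extract_units(SOURCE))
print(search(db, "paged API"))
```

Because extraction works on syntax nodes, every stored snippet is a whole function or class, never a fragment cut mid-statement. Keyword matching over docstrings is a much weaker signal than an LLM-assigned label; it is used here only to keep the sketch self-contained.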

Serving Patterns to the Agent

A library only changes agent behavior if the agent can reach it mid-task. The Model Context Protocol is the natural transport: expose the library as an MCP server, and an agent in Claude Code or Cursor queries it by intent the same way it calls any other tool. Pattern Vault ships an MCP server for exactly this, alongside a CLI and a web dashboard for browsing the index directly.
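
To make the transport concrete, the sketch below shows roughly what one MCP tool call looks like on the wire: a JSON-RPC 2.0 request with method tools/call, answered by a result carrying text content. It is a simplified illustration, not a complete MCP server (there is no initialization handshake, tool listing, or transport), and it does not reflect Pattern Vault's actual tool names. The find_pattern tool, its intent argument, and the in-memory PATTERNS list are all hypothetical.

```python
import json

# Hypothetical stand-in for the pattern index: keyword sets mapped to snippets.
PATTERNS = [
    ({"pagination", "paginated", "paginate"},
     "def paginate(fetch_page, page_size=50): ..."),
    ({"retry", "retries", "backoff"},
     "def with_retry(call, attempts=3): ..."),
]


def find_pattern(intent):
    """Hypothetical tool: return snippets whose keywords appear in the intent."""
    words = set(intent.lower().split())
    hits = [code for keywords, code in PATTERNS if keywords & words]
    return "\n\n".join(hits) or "no matching pattern"


def error_response(request_id, code, message):
    """Build a JSON-RPC 2.0 error response."""
    error = {"code": code, "message": message}
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "error": error})


def handle_request(raw):
    """Answer a single JSON-RPC 2.0 message; only 'tools/call' is supported."""
    request = json.loads(raw)
    request_id = request.get("id")
    if request.get("method") != "tools/call":
        return error_response(request_id, -32601, "Method not found")
    params = request.get("params", {})
    if params.get("name") != "find_pattern":
        return error_response(request_id, -32602, f"Unknown tool: {params.get('name')}")
    text = find_pattern(params.get("arguments", {}).get("intent", ""))
    result = {"content": [{"type": "text", "text": text}], "isError": False}
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "result": result})


request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "find_pattern",
        "arguments": {"intent": "Find how this codebase handles paginated API responses"},
    },
}
print(handle_request(json.dumps(request)))
```

In practice an MCP SDK would handle the protocol plumbing, and the tool body would query the real index instead of a hard-coded list. The point of the sketch is only that, from the agent's side, a pattern lookup is an ordinary tool call with a natural-language argument.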

Example

A developer points the tool at a service repository. The extraction pipeline walks the AST, classifies each unit, and builds a local index of labeled patterns: retry wrappers, pagination helpers, auth middleware, and so on. Mid-task, the agent issues an intent query rather than a keyword grep:

Find how this codebase handles paginated API responses

The MCP server returns the team's actual pagination helper and its call sites. The agent adapts that implementation instead of inventing a new one, so the new code matches existing conventions on the first pass.

Trade-offs

  • Staleness. The index reflects the codebase at extraction time. A library that is not rebuilt drifts from the current code and can surface deprecated patterns.
  • Pattern lock-in. Retrieving an existing implementation propagates whatever is already there, including suboptimal patterns. The library amplifies the codebase's habits, good and bad.
  • Maintenance cost. Extraction, classification, and re-indexing are recurring work; the library earns its keep only when reuse is frequent enough to offset that cost.
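
One way to keep staleness visible is to record a content fingerprint for each file at extraction time and compare it against the current source before trusting a retrieved pattern. The sketch below assumes that approach. It is not something the tools above are known to do, and the fingerprint and stale_files helpers are hypothetical names.

```python
import hashlib


def fingerprint(source: str) -> str:
    """Return a SHA-256 digest of a file's text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def stale_files(indexed: dict[str, str], current: dict[str, str]) -> list[str]:
    """List indexed paths whose source has changed or disappeared.

    indexed maps path -> fingerprint recorded at extraction time.
    current maps path -> source text as it exists now.
    """
    stale = []
    for path, digest in indexed.items():
        source = current.get(path)
        if source is None or fingerprint(source) != digest:
            stale.append(path)
    return sorted(stale)


# Fingerprints captured when the index was built (toy data).
indexed = {
    "api/pagination.py": fingerprint("def paginate(): ...\n"),
    "net/retry.py": fingerprint("def with_retry(): ...\n"),
}
# The repository as it looks now: retry.py has been edited since extraction.
current = {
    "api/pagination.py": "def paginate(): ...\n",
    "net/retry.py": "def with_retry(attempts=5): ...\n",
}
print(stale_files(indexed, current))
```

A check like this only flags files that need re-extraction. It does not address pattern lock-in, since a pattern can be perfectly up to date and still be one the team would rather not propagate.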

Key Takeaways

  • A codebase-derived pattern library narrows the retrieval corpus to vetted in-house code, raising signal over generic public examples
  • The build pipeline is AST parsing for structure plus LLM classification for intent labels
  • MCP is the transport that puts the library in front of an agent mid-task
  • The main risks are index staleness and amplifying suboptimal existing patterns
  • Pattern Vault implements this end-to-end: AST extraction, LLM classification, local SQLite search, and an MCP server for Claude Code and Cursor