Structured Domain Retrieval: Knowledge Graphs and Case-Based Reasoning
A knowledge graph of package-function hierarchies plus coverage-driven case selection retrieves domain context that flat similarity search misses.
The problem with flat retrieval
Standard RAG retrieves context by embedding similarity — it vectorizes the query and returns the closest chunks. This fails for domain-specific code generation because API knowledge is hierarchical. A function belongs to a module, which belongs to a package, with specific parameter types and return conventions. Embedding distance does not encode these relationships. Graph-structured retrieval captures relational context better in knowledge-intensive tasks (Edge et al., "From Local to Global", 2024).
DomAgent reports the size of the gap: a 7B model with flat retrieval scored about 40% pass@1 on truck software tasks; with structured KG retrieval plus case-based reasoning it scored 96.6% (DomAgent, 2025).
Two retrieval paths
Structured domain retrieval works through two complementary paths: understanding what exists (top-down) and seeing how it is used (bottom-up).
graph TD
Q[Task Query] --> KG[Knowledge Graph Path]
Q --> CB[Case-Based Path]
KG --> PC[Package Classification]
PC --> FS[Function Similarity Ranking]
FS --> R[Refinement]
CB --> CL[Cluster Lookup]
CL --> CS[Coverage-Driven Selection]
CS --> R
R --> CTX[Composed Context]
CTX --> LLM[Code Generation]
Top-down: knowledge graph retrieval
Build a knowledge graph from your domain's API surface — packages, modules, classes, and functions as entities with containment and dependency edges. At retrieval time:
- Package classification: an LLM decides which packages are relevant to the current task.
- Function ranking: cosine similarity between task and function embeddings within the selected packages.
- Top-T selection: the highest-ranked functions and their documentation are returned.
The agent receives package location, parameter types, and sibling relationships, not just an isolated signature.
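The three steps above can be sketched as follows. This is a minimal illustration, not DomAgent's implementation: the `classify_packages` callable stands in for the LLM classification step, the embeddings are toy three-dimensional vectors, and the `can_signals.powertrain` package and `read_rpm` function are hypothetical names added for contrast.

```python
import math
from typing import Callable, Iterable


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_functions(
    task_vec: list[float],
    functions: list[dict],
    classify_packages: Callable[[list[str]], Iterable[str]],
    top_t: int = 3,
) -> list[dict]:
    # Step 1: package classification (stand-in for the LLM decision).
    all_packages = sorted({f["package"] for f in functions})
    selected = set(classify_packages(all_packages))
    # Step 2: rank only the functions inside the selected packages.
    candidates = [f for f in functions if f["package"] in selected]
    ranked = sorted(
        candidates,
        key=lambda f: cosine(task_vec, f["embedding"]),
        reverse=True,
    )
    # Step 3: top-T selection.
    return ranked[:top_t]


# Toy index; embeddings and the powertrain entry are illustrative assumptions.
functions = [
    {"package": "can_signals.body", "function": "set_headlight_mode",
     "embedding": [0.9, 0.1, 0.0]},
    {"package": "can_signals.body", "function": "set_indicator_mode",
     "embedding": [0.6, 0.4, 0.0]},
    {"package": "can_signals.powertrain", "function": "read_rpm",
     "embedding": [0.95, 0.05, 0.0]},
]

top = retrieve_functions(
    [1.0, 0.0, 0.0], functions, lambda pkgs: ["can_signals.body"], top_t=2
)
print([f["function"] for f in top])
```

In this toy data `read_rpm` has the highest raw similarity to the task vector, yet it is never returned because its package was not selected. That is the behavior flat similarity search cannot reproduce.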
Bottom-up: case-based reasoning
Working code examples show how API functions are actually used. The idea is coverage-driven selection: cluster the functions, then select a minimal representative set.
- Cluster functions by semantic similarity within each package, using K-means.
- Select cases one at a time, adding a case if it covers a new package or cluster.
- Stop at coverage thresholds, typically 90% of packages and 90% of clusters.
DomAgent found that 30% of coverage-selected cases matched the performance of 80% randomly selected cases on the benchmarks tested (DS-1000 and a truck CAN signal domain); generalizability to other domains has not been established (DomAgent, 2025).
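A greedy version of the selection loop might look like the sketch below. It assumes cluster labels have already been computed (for example by K-means over function embeddings within each package) and that each case records the packages and clusters it exercises. The field names, the `body/0`-style cluster identifiers, and the iteration order are illustrative assumptions, not DomAgent's published procedure.

```python
def select_cases(
    cases: list[dict],
    all_packages: set[str],
    all_clusters: set[str],
    pkg_threshold: float = 0.9,
    cluster_threshold: float = 0.9,
) -> list[str]:
    """Greedily pick cases until both coverage thresholds are met."""
    selected: list[str] = []
    covered_pkgs: set[str] = set()
    covered_clusters: set[str] = set()

    for case in cases:
        # Stop once both package and cluster coverage reach their thresholds.
        pkgs_done = len(covered_pkgs) >= pkg_threshold * len(all_packages)
        clusters_done = len(covered_clusters) >= cluster_threshold * len(all_clusters)
        if pkgs_done and clusters_done:
            break
        # Keep the case only if it adds a package or cluster not yet covered.
        adds_package = case["packages"] - covered_pkgs
        adds_cluster = case["clusters"] - covered_clusters
        if adds_package or adds_cluster:
            selected.append(case["id"])
            covered_pkgs |= case["packages"]
            covered_clusters |= case["clusters"]
    return selected


# Toy case base: cluster ids are namespaced by package ("body/0", "core/0").
cases = [
    {"id": "c1", "packages": {"body"}, "clusters": {"body/0"}},
    {"id": "c2", "packages": {"body"}, "clusters": {"body/0"}},  # redundant
    {"id": "c3", "packages": {"core"}, "clusters": {"core/0"}},
    {"id": "c4", "packages": {"body"}, "clusters": {"body/1"}},
    {"id": "c5", "packages": {"core"}, "clusters": {"core/0"}},  # after threshold
]
print(select_cases(cases, {"body", "core"}, {"body/0", "body/1", "core/0"}))
# prints ['c1', 'c3', 'c4']
```

The redundant case `c2` is skipped because it covers nothing new, and `c5` is never examined because coverage is already complete when the loop reaches it.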
Refinement gate
The LLM reviews the retrieved items against the task and removes entries that look similar but are not functionally relevant. This is the structured equivalent of observation masking.
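One way to structure the gate is to number the candidates, ask the model which to keep, and filter on its reply. The sketch below assumes a generic `ask_llm` callable that takes a prompt and returns text; the prompt wording, the reply format, and the `body.doors.lock_all` entry are hypothetical.

```python
import re
from typing import Callable


def build_prompt(task: str, items: list[str]) -> str:
    """Number each candidate so the model can refer to it by index."""
    lines = [f"Task: {task}", "Candidates:"]
    lines += [f"{i}. {item}" for i, item in enumerate(items)]
    lines.append(
        "Reply with the numbers of the functionally relevant candidates, "
        "comma-separated."
    )
    return "\n".join(lines)


def refine(task: str, items: list[str], ask_llm: Callable[[str], str]) -> list[str]:
    reply = ask_llm(build_prompt(task, items))
    # Parse every integer in the reply; out-of-range indices are ignored.
    keep = {int(n) for n in re.findall(r"\d+", reply)}
    return [item for i, item in enumerate(items) if i in keep]


# Stand-in for a real model call, so the sketch runs offline.
def fake_llm(prompt: str) -> str:
    return "0, 1"


retrieved = [
    "can_signals.body.lighting.set_headlight_mode",
    "case: Toggle hazard lights via CAN bus",
    "can_signals.body.doors.lock_all",  # hypothetical, looks similar but irrelevant
]
print(refine("activate high-beam headlights on bus 1", retrieved, fake_llm))
```

Asking for indices rather than free text keeps parsing simple and makes it impossible for the model to introduce entries that were never retrieved.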
When to use this
Structured domain retrieval pays off when you have:
- a well-defined API surface: SDKs, internal libraries, or frameworks with package-function hierarchies
- a large API surface: hundreds of functions across dozens of packages
- repetitive tasks: the same API patterns recur, so case curation is worthwhile
- high accuracy requirements: regulated or safety-critical domains where 40% pass@1 is unacceptable
Skip it when the API fits in a system prompt, when tasks are exploratory, or when the team cannot maintain the knowledge graph (see When this backfires below).
Construction
For the knowledge graph: parse the API docs or source for packages, classes, functions, params, and return types. Build containment and dependency edges. Embed each function from its name, description, and signature. Store it in a graph DB, JSON index, or MCP server.
For the case base: collect working examples from tests, docs, or production. Embed them, cluster them by similarity, and select via coverage thresholds (90% package, 90% cluster). Store each case with metadata linking it to the KG entities it exercises.
Then expose both paths as tools, following the retrieval-augmented agent workflow pattern:
# Agent tool descriptions (startup context)
- search_domain_kg: Query the domain knowledge graph for relevant functions
- search_case_base: Retrieve representative code examples for a task
The agent starts lean — only tool descriptions preloaded — then calls search_domain_kg and search_case_base on demand and generates code grounded in both.
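A minimal sketch of the two tools as plain Python callables behind a dispatch table is shown here. The in-memory indexes, the JSON string return type, the `dispatch` helper, and the keyword matching are all assumptions made for illustration. A real deployment would use embedding-based retrieval and whatever tool-calling interface the agent framework provides.

```python
import json

# Tiny in-memory stand-ins for the KG index and the case base.
KG_INDEX = [{
    "package": "can_signals.body",
    "module": "lighting",
    "function": "set_headlight_mode",
    "params": [{"name": "mode", "type": "HeadlightMode"},
               {"name": "bus_id", "type": "int"}],
    "returns": "SignalResult",
    "depends_on": ["can_signals.core.send_frame"],
}]
CASE_BASE = [{
    "title": "Toggle hazard lights via CAN bus",
    "entities": ["can_signals.body.lighting.set_indicator_mode",
                 "can_signals.core.send_frame"],
}]


def _matches(query: str, text: str) -> bool:
    # Placeholder for semantic retrieval: any query term appearing in the text.
    return any(term in text.lower() for term in query.lower().split())


def search_domain_kg(query: str) -> str:
    """Query the domain knowledge graph for relevant functions."""
    return json.dumps([e for e in KG_INDEX if _matches(query, e["function"])])


def search_case_base(query: str) -> str:
    """Retrieve representative code examples for a task."""
    return json.dumps([c for c in CASE_BASE if _matches(query, c["title"])])


TOOLS = {"search_domain_kg": search_domain_kg, "search_case_base": search_case_base}


def dispatch(name: str, arguments: dict) -> str:
    """Route a tool call emitted by the agent to the matching function."""
    return TOOLS[name](**arguments)


print(dispatch("search_domain_kg", {"query": "headlight high beam"}))
```

Only the two docstrings need to be in the startup context; the index contents stay out of the prompt until a tool call returns them. Keyword matching will not find semantically related cases, such as a hazard-light example for a headlight query, which is one reason to swap in embedding search.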
Example
A vehicle diagnostics agent generates code against a CAN signal SDK with 400+ functions across 30 packages.
Knowledge graph entry (stored in a JSON index or graph DB):
{
  "package": "can_signals.body",
  "module": "lighting",
  "function": "set_headlight_mode",
  "params": [{"name": "mode", "type": "HeadlightMode"}, {"name": "bus_id", "type": "int"}],
  "returns": "SignalResult",
  "depends_on": ["can_signals.core.send_frame"]
}
Case base entry (coverage-selected working example):
# Case: Toggle hazard lights via CAN bus
from can_signals.body.lighting import set_indicator_mode, IndicatorMode
from can_signals.core import send_frame
result = set_indicator_mode(IndicatorMode.HAZARD, bus_id=0)
send_frame(result.frame, timeout_ms=100)
Agent tool call sequence:
- The task arrives: "Write a function to activate high-beam headlights on bus 1".
- The agent calls search_domain_kg("headlight high beam"), which returns set_headlight_mode with its package path, params, and dependency on send_frame.
- The agent calls search_case_base("headlight"), which returns the hazard light case showing the set_* to send_frame pattern.
- The refinement gate keeps both for direct relevance, and would discard an unrelated body.doors result.
- The agent generates code grounded in the KG signature and the case pattern.
Key Takeaways
- Knowledge graphs preserve package-function structure that vector similarity loses.
- Coverage-driven case selection produces a small set that, on the benchmarks DomAgent tested, matched the performance of a much larger random collection.
- A refinement gate removes superficially similar but irrelevant context before generation.
- Expose KG and case base as on-demand tools rather than preloading into the context window.
When this backfires
Structured domain retrieval adds significant upfront cost and ongoing maintenance. Assess three failure conditions before committing:
- API churn outpaces graph updates: when the API surface changes faster than the KG and case base can be refreshed, the agent retrieves stale signatures and outdated examples. Fast-moving internal SDKs or pre-release frameworks are high-risk.
- The API surface is too small: below roughly 100 functions, parsing, embedding, and indexing the API costs more engineering time than curating few-shot examples in the system prompt. Measure actual retrieval failures before building graph infrastructure.
- Case base diversity is too thin: coverage-driven selection depends on enough working examples to form meaningful clusters. Projects with thin test suites or sparse documentation produce a case base that mimics the gaps of flat retrieval.
Graph retrieval is not universally better than flat vector search, even once built. GraphRAG-Bench finds graph-structured retrieval often underperforms vanilla RAG, with benefits showing up only under specific conditions (Xiang et al., "When to use Graphs in RAG", 2025). Treat the DomAgent gains as evidence for well-defined, hierarchical API domains, not a blanket win, and measure against a vector-RAG baseline before committing.
Related
- Retrieval-Augmented Agent Workflows — simpler baseline this page extends
- Schema-Guided Graph Retrieval — typed graph retrieval using a shared domain schema across construction, decomposition, and retrieval
- Repository Map Pattern — AST + graph importance for code context
- Semantic Context Loading — LSP-based structured code navigation
- Context Hub — on-demand API docs without hierarchical structure
- Domain-Specific Agent Challenges — human factors of domain-specific agents
- Repository-Level Retrieval for Code Generation — cross-file dependency and AST retrieval for code generation
- Observation Masking — refinement gate for intermediate tool results