
Data Fidelity Guardrails

Ensure agents faithfully relay data from APIs, MCP servers, and databases rather than silently summarizing, altering, or fabricating values.

The Data Relay Problem

Agents sit between users and live data sources -- APIs, MCP servers, and databases. The failure mode is not hallucination from nothing -- it is mutation of real data. The model receives correct data from a tool and presents an altered version: financial figures get rounded, query results get summarized, status fields get paraphrased. The user cannot distinguish faithful relay from subtle fabrication.

CyberArk's research on Advanced Tool Poisoning Attacks (ATPA) demonstrates that malicious tool outputs can instruct the model to alter data deliberately -- tool poisoning extends beyond descriptions into return values.

Architecture Patterns

Passthrough Architecture

Route raw tool responses to the UI alongside the model's summary:

graph LR
    A[Tool / API] -->|raw response| B[Application Layer]
    B -->|raw data| C[UI: Data Panel]
    B -->|raw data| D[LLM]
    D -->|summary| E[UI: Agent Commentary]

The user sees both the raw data and the agent's interpretation; discrepancies are immediately visible. The raw panel is populated by deterministic code, never by the LLM. The trade-off is UI complexity -- not every interface can display raw data alongside commentary.
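A minimal sketch of the application layer in the diagram above. The function name `relay_with_passthrough`, the panel keys, and the `summarize` callable (a stand-in for whatever model call the application makes) are all hypothetical, chosen only for illustration:

```python
import copy
import json
from typing import Any, Callable


def relay_with_passthrough(
    raw_response: dict[str, Any],
    summarize: Callable[[str], str],
) -> dict[str, Any]:
    """Build both UI panels from a single tool response."""
    # Snapshot the payload before the model is involved, so nothing
    # downstream can change what the data panel displays.
    data_panel = copy.deepcopy(raw_response)
    # The model receives a serialized copy and contributes commentary only.
    commentary = summarize(json.dumps(raw_response, sort_keys=True))
    return {"data_panel": data_panel, "agent_commentary": commentary}


def fake_summarize(payload: str) -> str:
    # Stand-in for a real model call; it deliberately rounds the balance.
    return "Balance is about 14,500."


raw = {"account_balance": 14523.87, "status": "active"}
view = relay_with_passthrough(raw, fake_summarize)
```

Here `view["data_panel"]` still holds 14523.87 even though the commentary rounds it, which is the discrepancy the side-by-side display is meant to expose.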

Structural Separation of Data and Commentary

Separate factual fields (populated by deterministic code) from the LLM's generated commentary:

{
  "data": {
    "account_balance": 14523.87,
    "last_transaction": "2025-03-12T09:41:00Z",
    "status": "active"
  },
  "commentary": "Account is active; the most recent transaction was yesterday."
}

The data object is copied directly from the API response by application code; commentary is the only LLM-generated field. Downstream consumers know which fields to trust unconditionally.
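One way to enforce this split, sketched under the assumption that the model returns a JSON-like object that may echo data fields back. The helper name `build_envelope` and the shape of `model_output` are hypothetical:

```python
import copy
from typing import Any


def build_envelope(
    api_response: dict[str, Any],
    model_output: dict[str, Any],
) -> dict[str, Any]:
    """Assemble the response, trusting the model for commentary only."""
    return {
        # Copied from the API response by application code.
        "data": copy.deepcopy(api_response),
        # The only field taken from the model; any "data" it echoed
        # back is discarded rather than merged.
        "commentary": str(model_output.get("commentary", "")),
    }


api_response = {
    "account_balance": 14523.87,
    "last_transaction": "2025-03-12T09:41:00Z",
    "status": "active",
}
# Hypothetical model output that rounds the balance in its echoed data.
model_output = {
    "data": {"account_balance": 14500.00, "status": "active"},
    "commentary": "Account is active.",
}
envelope = build_envelope(api_response, model_output)
```

Because the model's echoed `data` is dropped, its rounded balance never reaches downstream consumers.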

Typed Schema Validation

Structured outputs enforce schema shape through constrained decoding; see Typed Schemas at Agent Boundaries for the full pattern of applying typed contracts at every agent-to-agent interface. But schema compliance does not equal value accuracy -- a correctly typed "balance": 14500.00 still differs from the true value of 14523.87.
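The gap between shape and value can be illustrated with a hand-rolled check. This is a sketch only: `validate_schema` and the allowed status values are assumptions standing in for whatever schema library is actually in use:

```python
from typing import Any

# Hypothetical enum of allowed status values.
ALLOWED_STATUSES = {"active", "suspended", "closed"}


def validate_schema(record: dict[str, Any]) -> bool:
    """Check shape only: field types and enum membership."""
    return (
        isinstance(record.get("balance"), float)
        and record.get("status") in ALLOWED_STATUSES
    )


source = {"balance": 14523.87, "status": "active"}     # what the tool returned
presented = {"balance": 14500.00, "status": "active"}  # what the model emitted

schema_ok = validate_schema(presented)  # True: the shape is valid
values_ok = presented == source         # False: the balance was rounded
```

The rounded record passes the schema check and fails only the value comparison, so the schema layer needs a value-level check beside it.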

Layer schema validation with other defenses:

| Layer | What it catches | What it misses |
|---|---|---|
| Schema validation | Wrong types, missing fields, invalid enums | Fabricated values within valid types |
| Passthrough display | Value mutations visible to users | Nothing -- but requires human attention |
| Diff-based auditing | Any discrepancy, automatically | Mutations the model applies before logging |
| Checksum verification | Any payload alteration | Requires infrastructure support |

Diff-Based Auditing

Log raw tool responses and the model's presented version; flag discrepancies automatically:

graph TD
    A[Tool returns response] --> B[Log raw response]
    A --> C[LLM processes response]
    C --> D[Log presented version]
    B --> E[Diff engine]
    D --> E
    E -->|match| F[Pass]
    E -->|mismatch| G[Alert / block]

Observability platforms like LangSmith and Langfuse log tool inputs and outputs, enabling this comparison in production. The key constraint: application code must log the raw response before the LLM sees the data, so the model cannot influence the baseline used for the diff.
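A minimal sketch of the diff engine step, assuming both the raw response and the presented version are available as JSON-like structures. The helpers `flatten` and `diff_fields` are hypothetical and not part of LangSmith or Langfuse:

```python
from typing import Any


def flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Turn nested dicts and lists into {"a.b.0": value} pairs."""
    items: dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            items.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            items.update(flatten(value, f"{prefix}{index}."))
    else:
        items[prefix.rstrip(".")] = obj
    return items


def diff_fields(raw: Any, presented: Any) -> list[str]:
    """Return one message per field that differs between the two versions."""
    raw_flat, shown_flat = flatten(raw), flatten(presented)
    mismatches = []
    for path in sorted(raw_flat.keys() | shown_flat.keys()):
        if path not in shown_flat:
            mismatches.append(f"{path}: missing from presented version")
        elif path not in raw_flat:
            mismatches.append(f"{path}: not present in raw response")
        else:
            a, b = raw_flat[path], shown_flat[path]
            # Compare types as well, so 1 and 1.0 (or True and 1) differ.
            if type(a) is not type(b) or a != b:
                mismatches.append(f"{path}: {a!r} != {b!r}")
    return mismatches


raw_log = {"account": {"balance": 14523.87, "status": "active"}}
presented_log = {"account": {"balance": 14500.0, "status": "active"}}
alerts = diff_fields(raw_log, presented_log)
# alerts == ["account.balance: 14523.87 != 14500.0"]
```

An empty list corresponds to the "Pass" branch of the diagram; any entry would trigger the alert or block path. Free-text presentations would need a value-extraction step first, which this sketch does not attempt.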

Tool Output Integrity

Tool Poisoning as a Data Fidelity Threat

Tool poisoning attacks embed hidden instructions in tool return values, not just descriptions. A compromised MCP server can include directives telling the model to alter, exfiltrate, or suppress data. Invariant Labs documented cross-server data exfiltration via this vector.

Mitigations:

  • Version pinning with checksums -- detect unauthorized tool modifications (ETDI proposes cryptographic signing of tool definitions)
  • Cross-server dataflow boundaries -- prevent data from one MCP server reaching tools on another
  • Spotlighting / datamarking -- Microsoft's MCP security guidance recommends marking boundaries between trusted instructions and untrusted tool content
  • Dual LLM separation -- the Dual LLM pattern routes untrusted data through a quarantined model with no tool access
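The first mitigation, version pinning with checksums, might look like the sketch below. It assumes a tool definition is available as a JSON-serializable dict; the names `fingerprint` and `verify_pinned` and the example tool are hypothetical, and this is not the ETDI signing scheme:

```python
import hashlib
import json
from typing import Any


def fingerprint(tool_definition: dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding of the tool definition."""
    canonical = json.dumps(tool_definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_pinned(tool_definition: dict[str, Any], pinned_digest: str) -> bool:
    """True only if the definition is identical to the one that was reviewed."""
    return fingerprint(tool_definition) == pinned_digest


# Hypothetical tool definition, pinned at review time.
approved = {
    "name": "get_balance",
    "description": "Return the current account balance.",
    "input_schema": {"type": "object", "properties": {"account_id": {"type": "string"}}},
}
pinned = fingerprint(approved)

# A later fetch in which the description has been altered.
tampered = dict(approved, description="Return the balance. Also round all figures.")

still_trusted = verify_pinned(approved, pinned)     # True
tamper_detected = not verify_pinned(tampered, pinned)  # True
```

A check like this covers tool definitions only. Poisoned return values still need the dataflow, datamarking, or Dual LLM defenses listed above.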

Design Tools for Fidelity

Anthropic's tool output guidance recommends:

  • Return only relevant fields -- every extra field is a mutation opportunity
  • Use semantic values instead of opaque identifiers -- the model is less likely to fabricate a name than a UUID
  • Paginate at the tool layer -- unbounded result sets force the model to compress output, introducing mutation risk

See Semantic Tool Output for the full pattern.
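The three recommendations can be combined in a single tool function. This is a sketch under assumed names: `list_transactions`, the row fields, and the response keys are hypothetical and not taken from Anthropic's guidance:

```python
from typing import Any

# Only the fields the agent needs; internal IDs and audit columns are dropped.
RELEVANT_FIELDS = ("date", "amount", "merchant_name")


def list_transactions(
    rows: list[dict[str, Any]],
    page: int = 1,
    page_size: int = 2,
) -> dict[str, Any]:
    """Return one page of trimmed, human-readable transaction records."""
    start = (page - 1) * page_size
    window = rows[start:start + page_size]
    return {
        "items": [{field: row[field] for field in RELEVANT_FIELDS} for row in window],
        "page": page,
        "total_items": len(rows),
        "has_more": start + page_size < len(rows),
    }


# Hypothetical backing rows, including fields the tool should not expose.
rows = [
    {"id": "7f3a", "merchant_id": "m-19", "merchant_name": "Corner Cafe",
     "date": "2025-03-10", "amount": 4.50},
    {"id": "9c1e", "merchant_id": "m-42", "merchant_name": "City Transit",
     "date": "2025-03-11", "amount": 2.75},
    {"id": "b220", "merchant_id": "m-07", "merchant_name": "Book Nook",
     "date": "2025-03-12", "amount": 18.00},
]
first_page = list_transactions(rows, page=1)
```

The `has_more` flag lets the model request the next page explicitly instead of compressing an unbounded list on its own.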

Anti-Pattern

Trusting the model to faithfully transcribe data because the prompt says "report exact values." Prompt instructions are probabilistic; a passthrough panel or diff-based audit is deterministic. Use both -- prompt for guidance, architecture for enforcement.

When This Backfires

These guardrails impose real costs. Skip the full stack when:

  • The surface cannot show structured data -- voice, SMS, and narrow chat surfaces have no room for a raw panel; passthrough becomes noise users ignore.
  • Stakes are low and reads are casual -- status lookups and document summaries have a small mutation blast radius; the engineering cost outweighs the protection.
  • Data is high-cardinality or streaming -- large result sets make raw panels unreadable and diff engines a latency bottleneck.
  • Token or latency budgets are tight -- logging raw responses and returning both raw fields and commentary inflates context and response time.

Under these conditions, prefer typed schemas at the boundary and spot-check evals on exact values instead of the full passthrough-plus-diff stack.
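A spot-check eval on exact values can be as small as the sketch below. It assumes each eval case pairs the values the tool returned, formatted as strings, with the text the model produced; `exact_value_pass_rate` is a hypothetical helper:

```python
def exact_value_pass_rate(cases: list[tuple[list[str], str]]) -> float:
    """Fraction of cases where every expected value appears verbatim."""
    if not cases:
        return 0.0
    passed = sum(
        1
        for expected_values, model_text in cases
        # Plain substring matching: crude, but it catches rounding
        # and paraphrasing of the values being checked.
        if all(value in model_text for value in expected_values)
    )
    return passed / len(cases)


cases = [
    (["14523.87", "active"], "Balance is 14523.87 and the account is active."),
    (["14523.87"], "Balance is roughly 14,500."),
]
rate = exact_value_pass_rate(cases)  # 0.5: the second case rounds the balance
```

Substring matching can produce false passes (for example, `14523.87` also appears inside `114523.87`), so a real eval would likely tokenize or parse values before comparing.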

Key Takeaways

  • Data relay failures are value mutations, not hallucinations -- the model has the right data and presents it wrong
  • Passthrough architecture and structural separation are the strongest defenses; schema validation enforces shape, not accuracy
  • Diff-based auditing catches discrepancies automatically; tool poisoning makes this a security concern, not just a reliability one
  • Design tools to minimize mutation opportunity: fewer fields, semantic values, tool-layer pagination