Pooled-Evidence Factuality Checks for MCP Agents (Cross-Source Conflation)

When an MCP agent draws on multiple sources, a pooled-evidence factuality verifier passes claims that are supported somewhere in the evidence but attributed to the wrong source.

When This Matters Most

The failure mode appears only when three conditions hold together (Alvarez et al., 2026, arxiv:2606.18037):

Multi-source MCP traces. The agent routes a single answer through two or more tools or sources — search plus an API, a database plus a formulary, two clinical guidelines. A single-source agent has nothing to conflate.
Stable tool and source IDs in the trace. Captured MCP traces expose tool IDs, source IDs, and raw outputs. Free-text tool returns that mash several URLs into one snippet cannot be routed deterministically, so the technique does not apply to them.
High-stakes domain. The original evaluation is medical (arxiv:2606.18037); analogous risk lives in clinical decision support, legal research, and regulated finance, anywhere a wrong attribution is itself the safety failure rather than a citation-polish issue.

For low-stakes, single-source agents, the cost of per-source claim routing is not worth paying.
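
As a rough illustration, the three conditions can be written as a gate that decides whether per-source routing is worth running at all. This is a minimal sketch under stated assumptions: the `ToolCall` fields, the `source_aware_applicable` helper, and the example source IDs are hypothetical names chosen for this example, not part of MCP or of the cited paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolCall:
    """One captured tool call. Field names are hypothetical, not an MCP schema."""
    tool_id: str
    source_id: Optional[str]  # None models a free-text return with no stable source
    output: str


def source_aware_applicable(trace: list[ToolCall], high_stakes: bool) -> bool:
    """Return True only when all three conditions from the list above hold."""
    if not high_stakes:
        return False
    # Every call needs a stable source ID, or claims cannot be routed.
    if any(call.source_id is None for call in trace):
        return False
    # Fewer than two distinct sources leaves nothing to conflate.
    return len({call.source_id for call in trace}) >= 2


trace = [
    ToolCall("clinical_record_tool", "record:123", "patient on 10 mg daily"),
    ToolCall("formulary_tool", "formulary:drug-x", "starting dose 5 mg, titrate to 10 mg"),
]
print(source_aware_applicable(trace, high_stakes=True))  # True
```

Treating a single missing source ID as disqualifying is a deliberately strict choice for the sketch; a real deployment might instead route the claims it can and fall back to pooled checks for the rest.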

The Pattern

Most factuality verifiers, including lightweight NLI-based RAG checkers in production today, ask one question: is this claim supported anywhere in the pooled evidence? (Sansford et al., 2024, arxiv:2411.01022) An MCP agent that emits a citation is making two claims, not one: the factual claim itself and the claim that source X supports it. Pooled-evidence verifiers collapse the two checks into one, so a claim with accurate content but a wrong source ID passes.

Alvarez et al. name this cross-source conflation: a claim "may be supported somewhere while being attributed to the wrong source" (arxiv:2606.18037). On 50 controlled clinical conflation probes against source-blind baselines, every injected attribution swap was retained: the source-blind verifiers could not distinguish the swap from a correct answer.
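
The effect of an attribution swap on a source-blind check can be reproduced with a toy probe. The sketch below is illustrative only: `entails` is a substring match standing in for a real NLI model, and the guideline names and texts are invented for the example rather than taken from the paper's probes.

```python
def entails(premise: str, hypothesis: str) -> bool:
    """Toy stand-in for an NLI model: case-insensitive substring match."""
    return hypothesis.lower() in premise.lower()


def pooled_supported(claim: str, evidence_by_source: dict[str, str]) -> bool:
    """Source-blind check: is the claim supported anywhere in the pool?"""
    return any(entails(text, claim) for text in evidence_by_source.values())


# Hypothetical evidence captured from two sources.
evidence = {
    "guideline_a": "Drug X is contraindicated in pregnancy.",
    "guideline_b": "Drug X requires renal dose adjustment.",
}

claim = "drug x requires renal dose adjustment"
correct = {"claim": claim, "cited": "guideline_b"}
swapped = {"claim": claim, "cited": "guideline_a"}  # injected attribution swap

# The cited source never enters the check, so both answers get the same verdict.
verdicts = [pooled_supported(c["claim"], evidence) for c in (correct, swapped)]
print(verdicts)  # [True, True]
```

The `cited` field is carried along but never read, which is the whole problem: nothing in the pooled check can tell the two answers apart.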

Why It Fails

Source attribution is "an independent axis for factuality verification" (arxiv:2606.18037). Two distinct failures are in play, and a pooled-evidence verifier can see only one of them:

| Failure | What pooled NLI sees | What the agent did |
| --- | --- | --- |
| Unsupported claim | Fails | Fabricated content with no source backing |
| Cross-source conflation | Passes | Real content; cited the wrong source |

A serverName-style allowlist of sources buys nothing here: the source IDs in the trace are valid, and it is the mapping from claim to source that is wrong. Independent corroboration: across 14 LLMs, inline citations from deep-research agents fail link-accessibility, topical-relevance, and factual-accuracy checks at high rates (Onweller et al., 2026, arxiv:2605.06635), and citation accuracy in popular generative search engines sits near 74% (VeriCite, arxiv:2510.11394).

Why Source-Aware Verification Works

The corrected approach routes each atomic claim to its declared source's evidence, not the pooled set, and runs NLI against that source alone. The stated attribution must match the routed source, or the claim is blocked regardless of what other sources would have supported. Per-source routing decouples support (does this source contain evidence for the claim?) from attribution (is the cited source the one that contains the evidence?): two distinct failures, two distinct checks. On a 40-trace held-out split of medical MCP-agent traces, this reaches block F1 0.802 and source accuracy 0.858 over 260 source-eligible claims, outperforming source-blind baselines (Alvarez et al., 2026, arxiv:2606.18037).

graph TD
    A["Agent answer + cited sources"] --> B["Decompose into atomic claims"]
    B --> C{"Per-source routing"}
    C -->|"Source A"| D["NLI against Source A only"]
    C -->|"Source B"| E["NLI against Source B only"]
    D --> F["Cited source matches routed source?"]
    E --> F
    F -->|"Match + supported"| G["Allow"]
    F -->|"Mismatch or unsupported"| H["Block"]
    style G fill:#1a7f37,color:#fff
    style H fill:#b60205,color:#fff
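
The flow in the diagram can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: `entails` is a substring stub where a real system would call an NLI model with a tuned threshold, claims are assumed to be already decomposed, and the `Claim` type, `verify` function, and guideline names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    text: str
    cited_source: str


def entails(premise: str, hypothesis: str) -> bool:
    """Toy stand-in for an NLI model: case-insensitive substring match."""
    return hypothesis.lower() in premise.lower()


def verify(claim: Claim, evidence_by_source: dict[str, str]) -> tuple[str, str]:
    """Route the claim to its cited source and check support there only."""
    cited_text = evidence_by_source.get(claim.cited_source)
    if cited_text is None:
        return "BLOCK", "cited source not in trace"
    if entails(cited_text, claim.text):
        return "ALLOW", "supported by cited source"
    # The claim is blocked either way; this second pass only labels the reason,
    # separating cross-source conflation from a plain unsupported claim.
    others = sorted(
        sid for sid, text in evidence_by_source.items()
        if sid != claim.cited_source and entails(text, claim.text)
    )
    if others:
        return "BLOCK", "misattributed: supported by " + ", ".join(others)
    return "BLOCK", "unsupported"


evidence = {
    "guideline_a": "Drug X is contraindicated in pregnancy.",
    "guideline_b": "Drug X requires renal dose adjustment.",
}
print(verify(Claim("drug x requires renal dose adjustment", "guideline_a"), evidence))
# ('BLOCK', 'misattributed: supported by guideline_b')
```

Support from another source never rescues the claim; it only changes the recorded reason, which keeps the allow decision tied to the cited source alone.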

When This Backfires

Source-aware verification justifies its overhead only under the conditions named above. Outside them the trade-offs flip:

Semantically close sources defeat exact ownership. On a harder multi-source benchmark, source-plus-relation accuracy drops to 0.229 (arxiv:2606.18037): two near-overlapping oncology guidelines look interchangeable to NLI, and the verifier inherits NLI's threshold sensitivity.
Repair-and-reverify can mask the upstream problem. Repair "resolves all blocked answers, often via conservative fallback" (arxiv:2606.18037). A verifier that always blocks and then falls back reduces the answer rate without fixing why the agent keeps misattributing.
Free-text tool returns break routing. Web-search snippets that combine multiple URLs into one block have no stable source ID to route to; the technique reduces to standard pooled NLI.
Single-source agents waste the overhead. No conflation surface means per-source NLI buys nothing over a pooled fact-checker.
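
One way to keep repair from hiding a recurring misattribution is to tally blocked claims by the pair of cited source and actually supporting source. The sketch below is a suggestion rather than something the paper describes, and the log format and the `misattribution_hotspots` helper are hypothetical.

```python
from collections import Counter


def misattribution_hotspots(block_log, min_count=2):
    """Count (cited, supporting) source pairs among blocked claims."""
    pairs = Counter(
        (entry["cited"], entry["supported_by"])
        for entry in block_log
        if entry["supported_by"] is not None  # skip plain unsupported claims
    )
    # Keep only pairs that recur often enough to suggest a systematic swap.
    return [(pair, n) for pair, n in pairs.most_common() if n >= min_count]


block_log = [
    {"cited": "formulary_tool", "supported_by": "clinical_record_tool"},
    {"cited": "formulary_tool", "supported_by": "clinical_record_tool"},
    {"cited": "guideline_a", "supported_by": None},
]
print(misattribution_hotspots(block_log))
# [(('formulary_tool', 'clinical_record_tool'), 2)]
```

A pair that keeps recurring points at the agent's prompt or tool wiring, which a conservative fallback alone would never surface.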

Example

Before — pooled-evidence NLI passes a cross-source conflation:

Agent answer:
  "The recommended starting dose is 10 mg daily [formulary_tool]."

Pooled evidence:
  - clinical_record_tool: patient on 10 mg daily
  - formulary_tool: starting dose 5 mg, titrate to 10 mg

Pooled NLI verdict: SUPPORTED  ← passes; 10 mg appears somewhere

The claim's content finds a match (10 mg shows up in the pooled evidence), but the cited source does not support it (the formulary says to start at 5 mg). A source-blind verifier cannot see the swap.

After — source-aware verifier routes per claim:

Claim: "starting dose is 10 mg daily"
Cited source: formulary_tool

Route NLI to formulary_tool only:
  formulary_tool says: "starting dose 5 mg, titrate to 10 mg"
  NLI verdict: NOT SUPPORTED for "starting dose is 10 mg"

Attribution check: formulary_tool ≠ source that supports the claim
Verdict: BLOCK

The agent's answer is then revised via retrieval-augmented repair — re-route to clinical_record_tool for the patient's current dose, or correct the formulary quote — and re-verified before release (arxiv:2606.18037).
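
The repair step can be sketched as a loop that re-verifies each candidate revision against its own cited source and releases only one that passes. This is a minimal illustration under assumptions: the candidate revisions are supplied by hand here, whereas the paper's repair is retrieval-augmented, and `entails`, `repair_and_reverify`, and the fallback wording are hypothetical.

```python
FALLBACK = "Unable to verify this statement against the cited source."


def entails(premise: str, hypothesis: str) -> bool:
    """Toy stand-in for an NLI model: non-empty, case-insensitive substring match."""
    return bool(hypothesis.strip()) and hypothesis.lower() in premise.lower()


def supported_by_cited(claim: str, source_id: str, evidence: dict[str, str]) -> bool:
    """Check the claim against its cited source only."""
    return entails(evidence.get(source_id, ""), claim)


def repair_and_reverify(original, candidates, evidence):
    """Release the first (claim, source) pair that passes, else a conservative fallback."""
    for claim, source_id in [original, *candidates]:
        if supported_by_cited(claim, source_id, evidence):
            return claim, source_id
    return FALLBACK, None


evidence = {
    "clinical_record_tool": "patient on 10 mg daily",
    "formulary_tool": "starting dose 5 mg, titrate to 10 mg",
}
original = ("starting dose is 10 mg daily", "formulary_tool")
candidates = [("starting dose 5 mg", "formulary_tool")]  # corrected formulary quote

print(repair_and_reverify(original, candidates, evidence))
# ('starting dose 5 mg', 'formulary_tool')
```

With no passing candidate the function returns the fallback with no source attached, which mirrors the conservative-fallback behavior noted under "When This Backfires".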

Key Takeaways

Pooled-evidence factuality verifiers cannot detect cross-source conflation — they ask "supported anywhere?" not "supported by the cited source?"
The failure matters for multi-source MCP agents in high-stakes domains where wrong attribution is itself the safety failure.
Source-aware verification routes each atomic claim to its declared source's evidence, then checks both support and attribution; on medical MCP traces this reaches block F1 0.802 and detects all 50 injected attribution swaps in controlled probes.
The technique partially fails on semantically close sources (source-plus-relation accuracy drops to 0.229) and inherits NLI threshold sensitivity — it is not a complete solution.