Skip to content

Trust Without Verify

Accepting agent output as correct because it looks polished — without independent verification.

The Pattern

Agent output is seductively plausible. Well-formatted prose with inline citations looks authoritative. Code that compiles looks correct. None of these surface signals correlate reliably with correctness.

The mistake is using output quality as a proxy for accuracy. An agent can produce hallucinated URLs, fabricated statistics, and plausible-but-wrong claims — all in grammatically perfect prose.

Symptoms

  • Citing a source without fetching it to confirm the claim
  • Running code that "looks right" without executing tests
  • Accepting a technical explanation because it sounds authoritative
  • Skipping diff review because the agent's summary is clear

Why It Happens

Fluency, formatting, and confidence are mistaken for correctness. Agents are trained to produce coherent responses — a separate objective from accuracy. Users systematically overestimate LLM accuracy when shown default explanations, and longer explanations further inflate user confidence without improving answer accuracy (Steyvers et al., 2025, Nature Machine Intelligence). A follow-up study found the same persuasion paradox in human-AI teams: fluent explanations raised confidence without improving accuracy beyond the AI prediction alone (Cohen et al., 2026).

Agents are most dangerous when almost right. A fully wrong answer is easy to catch; a mostly correct answer with one subtle error propagates undetected.

Fix

Verify independently — not by re-reading the output, but by checking against external ground truth:

  • Fetch cited URLs. Confirm the source exists and says what the agent claims.
  • Run the code. Compile, execute tests, check edge cases. "Compiles" and "correct" are different properties.
  • Cross-reference claims. Look up assertions in official documentation, not in the agent's summary of it — unchecked, a hallucinated summary becomes a trusted premise (Context Poisoning).
  • Review the diff. Diffs are easier to verify than full artifacts.

If something can be checked programmatically, check it automatically. Linters, type checkers, and test suites are verification, not overhead.

When This Backfires

Constant verification has a cost. Over-verifying introduces its own failure modes:

  • Verification theater: Running tests that don't cover the actual change, then treating a passing test suite as ground truth. The motion of verification without the substance.
  • Alert fatigue: Automated checks that fire too often train reviewers to dismiss failures — the cry-wolf failure mode. When every warning is noise, real errors get approved.
  • Bottleneck on low-stakes output: Applying the same scrutiny to a one-off throwaway script as to production auth code destroys the productivity benefit of AI assistance. Reserve manual verification for high-stakes, irreversible, or security-critical outputs; automate the rest with incremental verification.

The fix is calibrated verification, not universal paranoia.

Progressive Trust

Verify all output from new configurations. Spot-check proven ones. Always verify high-stakes outputs (security, production config, financial data) regardless of track record.

Example

Before — merged without testing

A developer asks the agent to add email validation. The agent produces a clean function with a comment citing "RFC 5322 compliance":

def is_valid_email(email: str) -> bool:
    """Validate email address (RFC 5322 compliant)."""
    pattern = r"^[a-zA-Z0-9._%\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$"
    return bool(re.match(pattern, email))

The developer merges it. The regex silently rejects valid + tagged addresses like user+tag@example.com — a subtle bug hidden behind authoritative presentation.

After — tested before merging

The developer runs the function against edge-case emails before merging:

assert is_valid_email("user+tag@example.com")  # FAILS

The test catches the missing + in the character class. The developer asks the agent to fix it, and the corrected regex is verified before merge.

Key Takeaways

  • "Looks right" is not a verification method
  • Output fluency is independent of correctness; agents are most dangerous when almost right
  • Automate verification; make it the default, not the exception
Feedback