Three Knowledge Tiers: Sourced, Unverified, Hallucinated¶

Classify agent knowledge into three tiers — sourced, unverified, and hallucinated — to preserve useful training knowledge while maintaining accuracy standards.

The Problem with Binary Accuracy Rules¶

Most anti-hallucination guidelines operate on a binary: a claim either has a citation or it is rejected. This conflates two different categories of unsourced knowledge:

Knowledge from training that is likely accurate but cannot be traced to a specific URL
Knowledge the model fabricated — plausible-sounding but incorrect

Treating both as "hallucination" and discarding them loses real signal. Treating both as acceptable loses accuracy.

The Three Tiers¶

Tier 1 — Sourced: The claim links to a primary source — documentation, a repository, a published blog post. Include as fact.

Tier 2 — Unverified: The agent has this knowledge from training and believes it is correct but cannot produce a source URL. Mark inline with [unverified] and collect in a dedicated section at the end of the document.

Tier 3 — Hallucinated: The claim is fabricated — plausible-sounding but the agent has reason to doubt it. Reject silently or flag explicitly depending on context.

The [unverified] marker creates a human decision point for the grey zone. The agent flags; the human decides.

How to Apply the Tiers¶

Agents follow three rules:

If you can cite it, cite it.
If you believe it but cannot cite it, write it with [unverified] inline and add the claim to an Unverified Claims section at the bottom of the document.
If you fabricated it or have strong reason to doubt it, omit it.

Collecting unverified claims into a dedicated section makes the audit surface visible — an editor scans one section to decide what needs research instead of hunting through prose.

Anti-Patterns¶

Silent inclusion: The agent uses training knowledge as fact without sourcing it. Readers cannot distinguish sourced from unsourced claims. Hallucination surveys consistently categorize this extrinsic hallucination type — outputs unverifiable against any source — as a primary failure mode in agent-generated content.

Silent omission: The agent discards all unsourced knowledge. Correct-but-uncitable information — conventions, tradeoffs, operational patterns — disappears from the output. The document is accurate but thinner than it should be.

Hedging instead of marking: The agent writes "the model might prefer..." or "this could possibly..." instead of [unverified]. Hedges are invisible to editors and do not surface the claim for review.

Why It Works¶

Binary sourced/rejected rules fail because model training knowledge is not uniform — it spans claims the model has seen confirmed across many sources, claims encountered once, and fabrications. Collapsing them into a single "unsourced = rejected" rule discards the first category unnecessarily. Research on LLM knowledge awareness shows models often hold accurate information they cannot trace to a specific document; silent omission throws that signal away.

The second mechanism is audit-surface concentration. Inline hedges like "the model might prefer..." scatter uncertainty throughout the document, forcing an editor to re-read the entire output to find everything requiring verification. The [unverified] tag plus a dedicated collection section converts that scattered uncertainty into a single bounded list — the editor processes one section, not the full document. This mirrors established code-review practice, where linting violations are aggregated into a report rather than surfaced one-by-one during reading.

Example¶

An agent writing a technical summary applies the three tiers inline. The passage below shows Tier 1 (cited), Tier 2 (marked [unverified]), and the resulting Unverified Claims section that an editor audits separately.

## Summary

Claude 3.5 Sonnet achieves a 49% solve rate on SWE-bench Verified
([source](https://www.anthropic.com/news/claude-3-5-sonnet)), making it
the top-performing publicly available model on that benchmark as of June 2024.

The model uses a 200k token context window, which allows it to process
entire large codebases in a single pass [unverified].

Constitutional AI training reduces refusal rates on benign requests
compared to RLHF-only baselines [unverified].

---

## Unverified Claims

- The model uses a 200k token context window, allowing entire large codebases
  in a single pass. [needs citation — check Anthropic docs]
- Constitutional AI training reduces refusal rates on benign requests compared
  to RLHF-only baselines. [needs citation — may be from research paper]

The editor can process the Unverified Claims section in one pass — verifying, citing, or removing each claim — rather than re-reading the full document to find unsourced statements.

When This Backfires¶

The three-tier pattern adds value only when the unverified claims section is actually reviewed:

Unactioned review backlog: If the section is never processed before publication, it ships with the document and exposes unvalidated assertions to readers. The pattern requires an active triage step — it does not self-enforce.
Tagging discipline erodes under pressure: Agents operating under token or time constraints skip [unverified] tagging, collapsing back to silent inclusion.
Tag volume overwhelms the reviewer: Agents that lack calibration mark everything uncertain. A document with 15 unverified claims becomes noise rather than signal; the human stops reading the section.
Tier 2 and Tier 3 are hard to distinguish: An agent that cannot accurately introspect on its own confidence classifies hallucinated claims as unverified rather than rejected, producing a review list that is systematically optimistic.
False confidence from the process itself: Stakeholders may treat the existence of an "Unverified Claims" section as evidence of rigor even when individual entries are never researched.
Low-stakes contexts invert the cost/benefit: For internal drafts or brainstorming outputs, the overhead of tagging and reviewing exceeds the benefit. The pattern is most valuable where accuracy matters more than throughput.

Key Takeaways¶

Binary sourced/rejected rules conflate unverified knowledge with hallucination — the distinction matters.
Mark unverified claims inline with [unverified] rather than omitting or silently including them.
Collect unverified claims in a dedicated section so the audit surface is visible.
Human-in-the-loop for Tier 2: the agent flags, the human decides.