Assertion Density: Stats and Quotes Over Vague Claims

Replace vague qualifiers with specific numbers, dates, sample sizes, and attributed quotes. In the Princeton GEO study (Aggarwal et al., KDD 2024), adding quotations was the highest-impact single rewrite technique for AI citation rates, improving source visibility by up to 41%.

Why Specificity Gets Cited

AI answer engines use retrieval-augmented generation: they match queries against indexed content and generate answers from retrieved passages. Specific claims improve retrieval in two ways:

Token-level matching: a query for "how much does X improve Y" matches a passage that states a number more precisely than one that says "significantly" or "substantially"
Attribution confidence: quotes attributed to named, credentialed sources and statistics with dates are easier to cite verbatim than generalities

The Evidence

The Princeton GEO study (Aggarwal et al., KDD 2024) tested 9 optimization techniques against a 10,000-query benchmark (GEO-bench) across 25 domains, measuring source visibility using Position-Adjusted Word Count (PAWC — word count weighted by exponential decay based on citation position):

Technique	PAWC Improvement
Quotation Addition	+41%
Statistics Addition	+30%
Cite Sources	+30%
Fluency Optimization	+15–30%
Keyword Stuffing	-10%

Caveats: All three top techniques add content rather than modify it, and PAWC rewards length, which gives content-addition techniques a structural advantage. The study permitted fabricated statistics, which limits real-world applicability (see Sandbox SEO's critique of the methodology). The directional finding, specific over vague, is robust; treat the exact percentages as upper bounds.
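
To make the PAWC definition concrete, here is a minimal Python sketch. It assumes the metric is each cited sentence's word count, weighted by exp(-position / number of sentences) and normalized by the answer's total word count. The Sentence type, the 0-based positions, and the equal split of credit when a sentence cites several sources are assumptions made for illustration; check the paper before relying on exact values.

```python
import math
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Sentence:
    """One sentence of a generated answer (hypothetical structure)."""
    text: str
    cited_sources: Set[int]  # ids of the sources this sentence cites


def position_adjusted_word_count(answer: List[Sentence], source_id: int) -> float:
    """Share of the answer credited to one source, discounted by position.

    Assumed form: sum(words * exp(-pos / n)) / total_words, where pos is
    the 0-based index of a sentence citing the source and n is the number
    of sentences in the answer.
    """
    total_words = sum(len(s.text.split()) for s in answer)
    if total_words == 0:
        return 0.0
    n = len(answer)
    weighted = 0.0
    for pos, sentence in enumerate(answer):
        if source_id in sentence.cited_sources:
            words = len(sentence.text.split())
            # Assumption: credit is split equally among co-cited sources.
            share = words / len(sentence.cited_sources)
            weighted += share * math.exp(-pos / n)
    return weighted / total_words


answer = [
    Sentence("Quotation addition raised visibility the most", {1}),
    Sentence("Keyword stuffing lowered it", {2}),
]
print(position_adjusted_word_count(answer, 1))
print(position_adjusted_word_count(answer, 2))
```

Under these assumptions, source 1 scores higher for two reasons: its sentence is longer and it appears earlier. The first reason is the length advantage described in the caveats.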

What Counts

Strong assertions (retrieval-friendly):

Specific numbers with units: "reduces latency by 23ms at p99"
Named sources with credentials: "according to Martin Fowler, author of Refactoring"
Dated research: "a 2024 Stanford study of 1,200 developers found..."
Sample sizes: "across 10,000 queries in 25 domains"
Bounded ranges: "8–12 citations per 1,500 words"
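
As a rough way to check a draft for the strong signals listed above, the following Python sketch counts tokens that contain a digit and "according to" attributions, then reports them per 100 words. The function name, the two patterns, and the per-100-words figure are hypothetical heuristics for illustration; they are not a metric from the GEO study and will miss attributions phrased in other ways.

```python
import re

# Any whitespace-delimited token containing a digit: "23ms", "2024", "1,200".
DIGIT_TOKEN_RE = re.compile(r"\S*\d\S*")
# One common attribution marker; other phrasings are not detected.
ATTRIBUTION_RE = re.compile(r"\baccording to\b", re.IGNORECASE)


def specificity_signals(text: str) -> dict:
    """Count crude specificity signals in a passage (heuristic only)."""
    words = text.split()
    numeric = len(DIGIT_TOKEN_RE.findall(text))
    attributed = len(ATTRIBUTION_RE.findall(text))
    per_100 = 100.0 * (numeric + attributed) / len(words) if words else 0.0
    return {
        "numeric": numeric,
        "attributed": attributed,
        "per_100_words": per_100,
    }


specific = "A 2024 Stanford study of 1,200 developers found lower latency."
vague = "Research shows it is much faster."
print(specificity_signals(specific))
print(specificity_signals(vague))
```

A count like this only shows where specifics are present. It cannot tell whether a number is sourced or accurate.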

Weak assertions (retrieval-unfriendly):

Vague quantifiers: "many", "often", "most", "significantly"
Unattributed authority: "experts say", "research shows", "it is widely known"
Relative comparisons without anchors: "much faster", "far more accurate"
Undated generalizations: "historically", "in recent years"
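
The weak phrases above can be flagged mechanically before a manual rewrite. The Python sketch below uses only the phrases quoted in this list; the function name and output shape are hypothetical. A simple phrase match like this will also flag legitimate uses (for example, "most" inside a sourced statistic), so treat hits as prompts for review rather than errors.

```python
import re

# Phrases copied from the weak-assertion list above; extend as needed.
WEAK_PHRASES = [
    "many", "often", "most", "significantly",
    "experts say", "research shows", "it is widely known",
    "much faster", "far more accurate",
    "historically", "in recent years",
]

# Whole-word, case-insensitive match on any listed phrase.
WEAK_RE = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in WEAK_PHRASES) + r")\b",
    re.IGNORECASE,
)


def find_weak_assertions(text: str) -> list:
    """Return (line_number, phrase) pairs, with 1-based line numbers."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in WEAK_RE.finditer(line):
            hits.append((lineno, match.group(1).lower()))
    return hits


draft = (
    "Most developers use AI tools.\n"
    "Latency drops by 23ms at p99.\n"
    "Experts say it is much faster."
)
for lineno, phrase in find_weak_assertions(draft):
    print(f"line {lineno}: {phrase}")
```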

Rewrite Guide

Find vague qualifiers and replace them with specifics. If no source exists for a claim, weaken it to a factually supportable form or remove it. Do not invent statistics or use hedge tags.

Before	After
"Context priming significantly improves output quality."	"Context priming reduces rework — agents that read relevant files before implementing produce output that matches existing conventions, because the retrieved context constrains generation to existing patterns."
"Most developers use AI coding assistants."	"75% of developers surveyed by GitHub in 2024 reported using AI coding tools at least weekly."
"Keyword stuffing is counterproductive."	"Keyword stuffing reduced source visibility by 10% in the Princeton GEO benchmark (Aggarwal et al., KDD 2024)."
"Large context windows help with complex tasks."	"Claude 3.5 Sonnet supports a 200K-token context window, sufficient to load an entire mid-size codebase before implementing."

Unsourceable Claims

If a claim cannot be backed by a real source, rewrite it in a weaker, factually supportable form or remove it entirely. Hedge tags produce a false-confidence signal without adding retrieval value: in the GEO study, PAWC rewarded length and attributed specificity, not vague generalities.

Limits

Fabrication risk: manufactured statistics are detectable; only add specifics you can source
Structural prerequisites: if the page buries answers (see Answer-First Writing), assertion density won't compensate for a retrieval miss at the section level
Diminishing returns: past a threshold, additional citations add length without citability

Recency and Assertion Density

Content freshness and assertion density are independent citation signals — improving one does not substitute for the other. See Measuring GEO Performance for tracking both.