# Schema and Structured Data for GEO
Structured data lifts AI citation rates by pre-packaging content in the Q&A and step formats engines reuse — studies report 2.7x–3.2x FAQPage gains.
Independent studies report FAQPage citation improvements of 2.7x to 3.2x (Frase.io; DEV Community) because structured data reduces extraction effort during indexing. Schema's primary value has shifted from SEO to AI citation — ChatGPT, Perplexity, Gemini, and Claude process it at indexing time. This site auto-injects Article, FAQPage, HowTo, DefinedTerm, and BreadcrumbList schemas via hooks/structured_data.py, plus a site-wide DefinedTermSet on the concepts glossary.
## What Changed: Google vs. AI Search
| Channel | FAQPage / HowTo Rich Results | Schema Citation Value |
|---|---|---|
| Google Search (classic) | FAQ limited to well-known government/health sites; HowTo rich results deprecated (Aug–Sep 2023) | Low for most dev docs |
| Google AI Overviews | Processed at index time | High — 3.2x appearance lift (Frase.io) |
| ChatGPT | Not rendered live; indexed content used | High — favours Q&A format |
| Perplexity | Indexed schema aids entity disambiguation | High — citation footnotes |
| Gemini | Renders JavaScript; processes schema | High |
Key nuance: chatbots don't read JSON-LD on live fetch — benefit accrues at indexing and training.
## The Three Schema Types
### FAQPage
Structures Q&A blocks for direct AI extraction. Keep answers 40–80 words — standalone, citable length.
```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What is an agent harness?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "An agent harness is scaffolding that surrounds an AI agent loop — managing context, tool calls, error recovery, and output formatting. It separates infrastructure concerns from reasoning logic."
    }
  }]
}
```
The hook detects `## FAQ` or `## Frequently Asked Questions` followed by `**Question**` / answer pairs and emits this schema automatically.
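The FAQ detection rule can be approximated in a few lines of Python. This is a minimal sketch, not the code in `hooks/structured_data.py`: `extract_faq` and `build_faq_schema` are hypothetical names, and the sketch assumes it reads markdown source in which each answer is written as plain paragraphs directly under its bold question.

```python
from __future__ import annotations

import json
import re

# Assumed source shape: a "## FAQ" or "## Frequently Asked Questions" heading,
# then **bold question** lines, each followed by a plain-paragraph answer.
FAQ_HEADING = re.compile(
    r"^##[ \t]+(FAQ|Frequently Asked Questions)[ \t]*$", re.MULTILINE
)
QUESTION = re.compile(r"^\*\*(.+?)\*\*$")


def extract_faq(markdown: str) -> list[tuple[str, str]]:
    """Return (question, answer) pairs found under the FAQ heading."""
    match = FAQ_HEADING.search(markdown)
    if not match:
        return []
    pairs: list[tuple[str, str]] = []
    question: str | None = None
    answer: list[str] = []
    for line in markdown[match.end():].splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):  # any following heading closes the FAQ section
            break
        found = QUESTION.match(stripped)
        if found:
            if question and answer:
                pairs.append((question, " ".join(answer)))
            question, answer = found.group(1), []
        elif question and stripped:
            answer.append(stripped)
    if question and answer:
        pairs.append((question, " ".join(answer)))
    return pairs


def build_faq_schema(pairs: list[tuple[str, str]]) -> dict:
    """Map extracted pairs onto the FAQPage shape shown above."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": q,
                "acceptedAnswer": {"@type": "Answer", "text": a},
            }
            for q, a in pairs
        ],
    }


if __name__ == "__main__":
    page = "## FAQ\n\n**What is an agent harness?**\n\nScaffolding around an agent loop.\n"
    print(json.dumps(build_faq_schema(extract_faq(page)), indent=2))
```

Ending the section at the next heading keeps unrelated bold lines further down the page out of the schema.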
### HowTo
Converts numbered step lists into extractable blocks — each step is a quotable unit.
```json
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to configure prompt caching",
  "step": [
    { "@type": "HowToStep", "position": 1, "text": "Enable the caching header in your API request." },
    { "@type": "HowToStep", "position": 2, "text": "Place stable content at the top of the context window." }
  ]
}
```
Auto-detection triggers on ordered lists (`<ol>`) with 3+ items, restricted to `patterns/` and `techniques/` paths. Extend `_HOWTO_PATHS` in `hooks/structured_data.py` to widen coverage.
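A sketch of the ordered-list rule, assuming the hook inspects rendered HTML together with the page's source path (for example `patterns/foo.md`). The `_HOWTO_PATHS` values below are assumed from the prose, and `OrderedListCollector`, `howto_schema`, and the choice of the first qualifying list are illustrative, not the site's implementation.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Assumed values mirroring the prose; the real tuple is in hooks/structured_data.py.
_HOWTO_PATHS = ("patterns/", "techniques/")
MIN_STEPS = 3


class OrderedListCollector(HTMLParser):
    """Collect the text of each direct <li> of every top-level <ol>."""

    def __init__(self) -> None:
        super().__init__()
        self.lists: list[list[str]] = []
        self._depth = 0  # nesting depth across <ol> and <ul>
        self._top_is_ol = False  # is the outermost open list an <ol>?
        self._item: list[str] | None = None

    def handle_starttag(self, tag, attrs):
        if tag in ("ol", "ul"):
            self._depth += 1
            if self._depth == 1 and tag == "ol":
                self._top_is_ol = True
                self.lists.append([])
        elif tag == "li" and self._depth == 1 and self._top_is_ol:
            self._item = []

    def handle_endtag(self, tag):
        if tag == "li" and self._depth == 1 and self._item is not None:
            # Collapse whitespace so each step reads as one sentence.
            self.lists[-1].append(" ".join("".join(self._item).split()))
            self._item = None
        elif tag in ("ol", "ul") and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self._top_is_ol = False

    def handle_data(self, data):
        if self._item is not None:  # nested list text folds into its parent step
            self._item.append(data)


def howto_schema(html: str, src_path: str, name: str) -> dict | None:
    """Build HowTo JSON-LD from the first <ol> with MIN_STEPS+ items, if any."""
    if not src_path.startswith(_HOWTO_PATHS):
        return None
    collector = OrderedListCollector()
    collector.feed(html)
    collector.close()
    for steps in collector.lists:
        if len(steps) >= MIN_STEPS:
            return {
                "@context": "https://schema.org",
                "@type": "HowTo",
                "name": name,
                "step": [
                    {"@type": "HowToStep", "position": i, "text": text}
                    for i, text in enumerate(steps, start=1)
                ],
            }
    return None
```

Tracking `<ul>` depth as well as `<ol>` depth stops a bulleted sub-list inside a step from being mistaken for extra steps.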
### DefinedTerm
Machine-readable definitions for named concepts — useful where terms like "agent" are ambiguous across tools. Every coined-concept leaf page emits one DefinedTerm, anchored to a single DefinedTermSet on /concepts/:
```json
{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "name": "Harness Engineering",
  "description": "The discipline of designing agent environments so agents reliably produce correct results.",
  "alternateName": ["agent environment design", "environment design for agents"],
  "url": "https://agentpatterns.ai/agent-design/harness-engineering/",
  "inDefinedTermSet": "https://agentpatterns.ai/concepts/"
}
```
The hook maps frontmatter to schema fields automatically:
| DefinedTerm field | Source |
|---|---|
| `name` | `term:` frontmatter (the clean moniker), falling back to `title:` (SEO-shaped) |
| `description` | `description:` frontmatter (the page's one-line definition) |
| `alternateName` | `aliases:` frontmatter, the page's strongest vocabulary-ownership signal |
| `url` / `inDefinedTermSet` | Canonical page URL / the `/concepts/` glossary |
DefinedTerm emission is gated to coined-concept sections (`_DEFINED_TERM_PATHS`) plus explicit exceptions in `_DEFINED_TERM_ALLOW`. `articles/`, `tools/`, and `training/` are excluded because they describe terms rather than coin them.
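The field mapping and section gate might be sketched as follows. The contents of `_DEFINED_TERM_PATHS` and `_DEFINED_TERM_ALLOW`, the example allow-listed path, and the function names are hypothetical; only the field mapping follows the table above.

```python
from __future__ import annotations

# Hypothetical values: the real _DEFINED_TERM_PATHS and _DEFINED_TERM_ALLOW
# live in hooks/structured_data.py. articles/, tools/ and training/ are
# simply absent from the tuple, so they never qualify.
_DEFINED_TERM_PATHS = ("agent-design/",)
_DEFINED_TERM_ALLOW = frozenset({"geo/example-coined-term.md"})
GLOSSARY_URL = "https://agentpatterns.ai/concepts/"


def is_coined_concept(src_path: str) -> bool:
    """Section gate plus explicit per-page exceptions."""
    return src_path in _DEFINED_TERM_ALLOW or src_path.startswith(_DEFINED_TERM_PATHS)


def defined_term(meta: dict, src_path: str, page_url: str) -> dict | None:
    """Map frontmatter onto DefinedTerm fields following the mapping table."""
    if not is_coined_concept(src_path):
        return None
    schema = {
        "@context": "https://schema.org",
        "@type": "DefinedTerm",
        # term: is the clean moniker; fall back to the SEO-shaped title:.
        "name": meta.get("term") or meta["title"],
        "description": meta.get("description", ""),
        "url": page_url,
        "inDefinedTermSet": GLOSSARY_URL,
    }
    aliases = meta.get("aliases") or []
    if aliases:  # omit alternateName entirely rather than emit an empty list
        schema["alternateName"] = list(aliases)
    return schema
```

Dropping `alternateName` when `aliases:` is empty is a choice made for this sketch; an empty array would also be valid JSON-LD but carries no signal.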
## How This Site Generates Schema
The hook runs at `on_post_page` and injects JSON-LD into every page's `<head>`:
```mermaid
graph LR
    A[MkDocs builds page] --> B[on_post_page hook fires]
    B --> C[Organization schema]
    B --> D{Is homepage?}
    D -- yes --> E[WebSite schema]
    B --> F[Article schema]
    B --> G[BreadcrumbList schema]
    B --> H{FAQ heading detected?}
    H -- yes --> I[FAQPage schema]
    B --> J{Ordered list 3+ steps and patterns/ or techniques/ path?}
    J -- yes --> K[HowTo schema]
    B --> M{Coined-concept section?}
    M -- yes --> N[DefinedTerm schema]
    C & E & F & G & I & K & N --> L[Inject before head close]
```
No per-page configuration is needed: add an FAQ section and FAQPage schema appears. A coined-concept page emits a DefinedTerm with no configuration at all, and a cleaner one if you set `term:`.
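The injection step itself is small. The sketch below is illustrative rather than the site's hook: it builds only an Article schema, the helper names are hypothetical, and it relies on MkDocs passing `page` and `config` to `on_post_page` as keyword arguments alongside the rendered HTML.

```python
from __future__ import annotations

import json


def render_json_ld(schemas: list[dict]) -> str:
    """Serialize each schema into its own application/ld+json script tag."""
    tags = []
    for schema in schemas:
        # Escape "</" so no string value can close the <script> early;
        # "\/" is a legal JSON escape for "/", so the payload stays valid.
        payload = json.dumps(schema, ensure_ascii=False).replace("</", "<\\/")
        tags.append(f'<script type="application/ld+json">{payload}</script>')
    return "\n".join(tags)


def inject_before_head_close(html: str, schemas: list[dict]) -> str:
    """Insert the JSON-LD right before the first </head>; no-op without one."""
    idx = html.find("</head>")
    if idx == -1 or not schemas:
        return html
    return html[:idx] + render_json_ld(schemas) + "\n" + html[idx:]


def on_post_page(output: str, *, page, config) -> str:
    """MkDocs hook: receives the rendered HTML and returns the modified HTML."""
    # Only an Article schema in this sketch; the flowchart above also lists
    # Organization, WebSite, BreadcrumbList, FAQPage, HowTo and DefinedTerm.
    schemas = [
        {"@context": "https://schema.org", "@type": "Article", "headline": page.title},
    ]
    return inject_before_head_close(output, schemas)
```

This sketch emits one `<script>` tag per schema; a single `@graph` array is an equally valid JSON-LD layout.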
## Writing for Schema Auto-Detection
To trigger each schema type, write in the shape the hook detects:
- FAQPage: a `## FAQ` (or `## Frequently Asked Questions`) heading with `**Question?**` lines followed by paragraph answers. Keep answers 40–80 words and standalone.
- HowTo: an ordered list of 3+ steps under `patterns/` or `techniques/`. Write each step as a self-contained sentence, since it is extracted as a standalone `HowToStep.text`.
- DefinedTerm: nothing in the body. Set `term:` to the bare moniker when `title:` is a headline (otherwise `name` falls back to the SEO title), keep `description:` a one-line definition, and list every searchable alternate name under `aliases:`.
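Because these rules are mechanical, they can be checked before a commit. A minimal lint sketch, assuming frontmatter has already been parsed into a dict and FAQ answers extracted as strings; `lint_page` and its messages are hypothetical, not part of the site's tooling.

```python
from __future__ import annotations

MIN_WORDS, MAX_WORDS = 40, 80


def lint_page(meta: dict, faq_answers: list[str], coined: bool) -> list[str]:
    """Return warnings for pages that miss the auto-detection guidance."""
    warnings = []
    for i, answer in enumerate(faq_answers, start=1):
        words = len(answer.split())
        if not MIN_WORDS <= words <= MAX_WORDS:
            warnings.append(f"FAQ answer {i}: {words} words, aim for {MIN_WORDS}-{MAX_WORDS}")
    if coined:  # frontmatter checks only matter where a DefinedTerm is emitted
        if not meta.get("term"):
            warnings.append("term: missing, so name falls back to the SEO title")
        description = str(meta.get("description", "")).strip()
        if not description or "\n" in description:
            warnings.append("description: should be a one-line definition")
        if not meta.get("aliases"):
            warnings.append("aliases: empty, so nothing maps to alternateName")
    return warnings
```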
## When This Backfires
Schema lifts citation rates in aggregate, but fails under specific conditions:
- Stale schema after edits: if body text drifts from the auto-generated schema (e.g., FAQ answers edited outside the `**Question**` / answer format), engines see contradictory signals and may deprioritize the page.
- Thin or low-authority domains: lift is relative to baseline authority. Schema accelerates the existing topical-authority signal; it doesn't manufacture credibility.
- Wrong type for content shape: HowTo on conceptual explanations, or FAQPage on unrelated Q&A, causes a schema–content mismatch that validators flag and engines may penalise.
- Indexing pipeline changes: the benefit accrues at indexing time, so if a provider downweights structured data, pages relying on the lift lose it with no on-page change.
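The stale-schema case can be caught with a post-build comparison between a page's FAQPage JSON-LD and the rendered text of its FAQ section. This is a sketch under assumptions: `faq_drift` is a hypothetical helper, you supply the parsed schema and the FAQ section's plain text, and treating any line ending in `?` as a question is a deliberately crude heuristic.

```python
from __future__ import annotations

import re


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting noise is ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def faq_drift(faq_schema: dict, faq_text: str) -> dict[str, list[str]]:
    """Compare emitted FAQPage JSON-LD with the rendered FAQ section text."""
    body = _normalize(faq_text)
    entities = faq_schema.get("mainEntity", [])
    schema_questions = {_normalize(e["name"]) for e in entities}
    # Schema answers the visible text no longer contains.
    missing_answers = [
        e["name"]
        for e in entities
        if _normalize(e["acceptedAnswer"]["text"]) not in body
    ]
    # Question-shaped lines the schema does not cover (e.g. reformatted ones).
    uncovered = [
        line.strip()
        for line in faq_text.splitlines()
        if line.strip().endswith("?") and _normalize(line) not in schema_questions
    ]
    return {"missing_answers": missing_answers, "uncovered_questions": uncovered}
```

Both lists should be empty on a healthy page; a non-empty `uncovered_questions` usually means a question was rewritten into a shape the detector no longer recognises.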
## Testing Schema
| Tool | Purpose | URL |
|---|---|---|
| Google Rich Results Test | Validates Google-supported rich results (Article, BreadcrumbList) | https://search.google.com/test/rich-results |
| Schema Markup Validator | Validates all schema.org types without Google restrictions | https://validator.schema.org/ |
| Google Search Console | Monitors rich result impressions and errors post-deployment | https://search.google.com/search-console |
Run locally:
```bash
mkdocs build --strict
# Paste a built page's <head> into the Schema Markup Validator
```
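Rather than pasting by hand, a short script can pull every JSON-LD block out of a built page and confirm it parses. This is a hypothetical helper (`check_jsonld.py` is an assumed filename, and `site/` is the MkDocs default output directory); it only checks JSON syntax and reports each `@type`, so schema.org conformance still needs the validators listed above.

```python
from __future__ import annotations

import json
import sys
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the bodies of <script type="application/ld+json"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._buffer: list[str] | None = None

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def json_ld_types(html: str) -> list:
    """Parse every JSON-LD block (json.loads raises on invalid JSON) and list @type."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [json.loads(block).get("@type") for block in parser.blocks]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_jsonld.py site/agent-design/harness-engineering/index.html
    with open(sys.argv[1], encoding="utf-8") as fh:
        for schema_type in json_ld_types(fh.read()):
            print(schema_type)
```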
## Sources
- FAQPage Structured Data (Google Search Central): spec and eligibility
- DefinedTerm (Schema.org): official spec
- DefinedTermSet for Industry Terminology (DEV): fragment `@id` and TermSet linking
- Schema.org Is Your Secret Weapon for AI Citations (DEV): FAQPage +45%, HowTo +38%
- FAQ Schema for AI Search, GEO and AEO (Frase.io): 3.2x AI Overview lift
- Schema Markup and AI in 2025 (Searchviu): JSON-LD ignored on live fetch; benefits at indexing
- Structured Data in MkDocs: MkDocs Material approach
- Structured Data for SEO and GEO (Digidop): GPT-4 accuracy 16% → 54%
## Key Takeaways
- Schema's payoff is now AI citation, not Google rich results — benefit accrues at indexing time, not on live fetch.
- This site auto-injects FAQPage, HowTo, and DefinedTerm schema from page structure and frontmatter — no per-page config.
- Match the schema type to content shape; mismatched types (HowTo on prose, stale FAQ answers) can deprioritize a page.
## Related
- GEO for Technical Docs — schema type selection checklist and per-format GEO priorities
- How AI Engines Cite — citation mechanisms schema markup targets
- Answer-First Writing — content structure that complements schema auto-detection
- SEO vs GEO — how structured data signals differ between traditional SEO and AI citation optimization
- llms.txt — complementary machine-readable format for AI discoverability
- AI Crawler Policy — controlling which crawlers index your structured data
- Measuring GEO Performance — tracking schema citation lift
- What Is GEO — foundational concepts behind generative engine optimization