Schema and Structured Data for GEO¶

Structured data lifts AI citation rates by pre-packaging content in the Q&A and step formats engines reuse — studies report 2.7x–3.2x FAQPage gains.

Independent studies (Frase.io; DEV Community) report FAQPage citation improvements of 2.7x to 3.2x; the proposed mechanism is that structured data reduces extraction effort during indexing. Schema's primary value has shifted from SEO to AI citation: ChatGPT, Perplexity, Gemini, and Claude process it at indexing time. This site auto-injects Article, FAQPage, HowTo, DefinedTerm, and BreadcrumbList schemas via hooks/structured_data.py (alongside Organization on every page and WebSite on the homepage), plus a site-wide DefinedTermSet on the concepts glossary.

What Changed: Google vs. AI Search¶

Channel	Schema Handling	Schema Citation Value
Google Search (classic)	FAQ rich results limited to well-known government/health sites since Aug 2023; HowTo rich results since retired	Low for most dev docs
Google AI Overviews	Processed at index time	High — 3.2x appearance lift (Frase.io)
ChatGPT	Not rendered live; indexed content used	High — favours Q&A format
Perplexity	Indexed schema aids entity disambiguation	High — citation footnotes
Gemini	Renders JavaScript; processes schema	High

Key nuance: chatbots don't read JSON-LD on live fetch — benefit accrues at indexing and training.

The Three Schema Types¶

FAQPage¶

Structures Q&A blocks for direct AI extraction. Keep answers to 40–80 words so each one is a standalone, citable unit.

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What is an agent harness?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "An agent harness is scaffolding that surrounds an AI agent loop — managing context, tool calls, error recovery, and output formatting. It separates infrastructure concerns from reasoning logic."
    }
  }]
}

The hook detects ## FAQ or ## Frequently Asked Questions followed by **Question** / answer pairs and emits this schema automatically.
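The detection described above could look roughly like the following. This is a minimal sketch, not the code in hooks/structured_data.py: the function names and regular expressions are hypothetical, and it assumes each question sits alone on a bold line with plain paragraph text underneath.

```python
import json
import re

# Hypothetical patterns; the real hook may detect FAQ sections differently.
_FAQ_HEADING = re.compile(r"^##\s+(?:FAQ|Frequently Asked Questions)\s*$", re.IGNORECASE)
_QUESTION = re.compile(r"^\*\*(.+?)\*\*\s*$")


def extract_faq_pairs(markdown: str) -> list[tuple[str, str]]:
    """Return (question, answer) pairs found under the first FAQ heading."""
    lines = markdown.splitlines()
    start = next(
        (i for i, line in enumerate(lines) if _FAQ_HEADING.match(line.strip())),
        None,
    )
    if start is None:
        return []  # no FAQ heading, so no schema is emitted

    pairs: list[tuple[str, str]] = []
    question, answer = None, []
    for line in lines[start + 1:]:
        text = line.strip()
        if text.startswith("#"):  # the next heading closes the FAQ section
            break
        match = _QUESTION.match(text)
        if match:
            if question and answer:
                pairs.append((question, " ".join(answer)))
            question, answer = match.group(1), []
        elif text and question:
            answer.append(text)
    if question and answer:
        pairs.append((question, " ".join(answer)))
    return pairs


def faq_schema(pairs: list[tuple[str, str]]) -> dict:
    """Build a FAQPage object in the shape shown above."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
```

Calling `json.dumps(faq_schema(extract_faq_pairs(source)), indent=2)` on a page's Markdown source would then produce the JSON-LD payload.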

HowTo¶

Converts numbered step lists into extractable blocks — each step is a quotable unit.

{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to configure prompt caching",
  "step": [
    { "@type": "HowToStep", "position": 1, "text": "Enable the caching header in your API request." },
    { "@type": "HowToStep", "position": 2, "text": "Place stable content at the top of the context window." }
  ]
}

Auto-detection triggers on ordered lists (<ol>) with 3+ items, restricted to patterns/ and techniques/ paths. Extend _HOWTO_PATHS in hooks/structured_data.py to widen coverage.
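As an illustration of the trigger rule, the sketch below reads the first ordered list out of rendered HTML and applies the path and step-count gates. The `_HOWTO_PATHS` name comes from the text, but its value here, the `_MIN_STEPS` constant, and both helper names are assumptions rather than the hook's actual contents.

```python
from html.parser import HTMLParser

# Assumed value; the text only says patterns/ and techniques/ are covered.
_HOWTO_PATHS = ("patterns/", "techniques/")
_MIN_STEPS = 3


class _FirstOrderedList(HTMLParser):
    """Collect the text of each top-level <li> in the first <ol> on the page."""

    def __init__(self):
        super().__init__()
        self.steps = []
        self._depth = 0       # list nesting depth, counted from the first <ol>
        self._in_li = False
        self._done = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "ol" and self._depth == 0:
            self._depth = 1                   # entering the first ordered list
        elif tag in ("ol", "ul") and self._depth > 0:
            self._depth += 1                  # nested list inside a step
        elif tag == "li" and self._depth == 1:
            self._buf = []
            self._in_li = True

    def handle_endtag(self, tag):
        if self._done or self._depth == 0:
            return
        if tag == "li" and self._depth == 1 and self._in_li:
            self.steps.append(" ".join("".join(self._buf).split()))
            self._in_li = False
        elif tag in ("ol", "ul"):
            self._depth -= 1
            self._done = self._depth == 0

    def handle_data(self, data):
        if self._in_li and not self._done:
            self._buf.append(data)


def howto_schema(html, src_path, title):
    """Return a HowTo object, or None when the page does not qualify."""
    if not src_path.startswith(_HOWTO_PATHS):
        return None
    parser = _FirstOrderedList()
    parser.feed(html)
    if len(parser.steps) < _MIN_STEPS:
        return None
    return {
        "@context": "https://schema.org",
        "@type": "HowTo",
        "name": title,
        "step": [
            {"@type": "HowToStep", "position": position, "text": text}
            for position, text in enumerate(parser.steps, start=1)
        ],
    }
```

Under these assumptions, widening coverage is a one-line change: add another prefix to the `_HOWTO_PATHS` tuple.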

DefinedTerm¶

Machine-readable definitions for named concepts — useful where terms like "agent" are ambiguous across tools. Every coined-concept leaf page emits one DefinedTerm, anchored to a single DefinedTermSet on /concepts/:

{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "name": "Harness Engineering",
  "description": "The discipline of designing agent environments so agents reliably produce correct results.",
  "alternateName": ["agent environment design", "environment design for agents"],
  "url": "https://agentpatterns.ai/agent-design/harness-engineering/",
  "inDefinedTermSet": "https://agentpatterns.ai/concepts/"
}

The hook maps frontmatter to schema fields automatically:

`DefinedTerm` field	Source
`name`	`term:` frontmatter — the clean moniker — falling back to `title:` (SEO-shaped)
`description`	`description:` frontmatter (the page's one-line definition)
`alternateName`	`aliases:` frontmatter — the page's strongest vocabulary-ownership signal
`url` / `inDefinedTermSet`	canonical page URL / the `/concepts/` glossary

Gated to coined-concept sections (_DEFINED_TERM_PATHS) plus _DEFINED_TERM_ALLOW exceptions; articles/, tools/, and training/ are excluded — they describe terms rather than coin them.
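The mapping and gating can be sketched as a single function. The constant names come from the text, but their values here are placeholders (the `agent-design/` prefix is inferred from the example URL above), and the function name is hypothetical.

```python
# Placeholder values; the text names these constants but not their contents.
_DEFINED_TERM_PATHS = ("agent-design/",)
_DEFINED_TERM_ALLOW = frozenset()
_GLOSSARY_URL = "https://agentpatterns.ai/concepts/"


def defined_term_schema(meta, src_path, canonical_url):
    """Map page frontmatter to a DefinedTerm object, or None if the page is gated out."""
    eligible = (
        src_path.startswith(_DEFINED_TERM_PATHS) or src_path in _DEFINED_TERM_ALLOW
    )
    # Prefer the clean moniker in term:, falling back to the SEO-shaped title:.
    name = meta.get("term") or meta.get("title")
    if not eligible or not name:
        return None

    schema = {
        "@context": "https://schema.org",
        "@type": "DefinedTerm",
        "name": name,
        "url": canonical_url,
        "inDefinedTermSet": _GLOSSARY_URL,
    }
    if meta.get("description"):
        schema["description"] = meta["description"]
    if meta.get("aliases"):
        schema["alternateName"] = list(meta["aliases"])
    return schema
```

Optional fields are omitted rather than emitted empty, so a page without `aliases:` still yields a valid, smaller object.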

How This Site Generates Schema¶

The hook runs at on_post_page and injects JSON-LD into every page's <head>:

graph LR
    A[MkDocs builds page] --> B[on_post_page hook fires]
    B --> C[Organization schema]
    B --> D{Is homepage?}
    D -- yes --> E[WebSite schema]
    B --> F[Article schema]
    B --> G[BreadcrumbList schema]
    B --> H{FAQ heading detected?}
    H -- yes --> I[FAQPage schema]
    B --> J{Ordered list 3+ steps and patterns/ or techniques/ path?}
    J -- yes --> K[HowTo schema]
    B --> M{Coined-concept section?}
    M -- yes --> N[DefinedTerm schema]
    C & E & F & G & I & K & N --> L[Inject before head close]

No per-page config — add an FAQ section and schema appears; a coined-concept page emits a DefinedTerm with no config at all (and a cleaner one if you set term:).
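The injection step at the end of the diagram might be written as follows. `on_post_page` is a real MkDocs event that receives the rendered HTML and returns the replacement string; `inject_json_ld` is a hypothetical helper, and the single Article object stands in for the full set of schemas the real hook assembles.

```python
import json


def inject_json_ld(output, schemas):
    """Insert one JSON-LD <script> per schema just before the closing </head>."""
    if not schemas or "</head>" not in output:
        return output
    scripts = "\n".join(
        '<script type="application/ld+json">'
        # Escape "</" so a value such as "</script>" cannot end the tag early;
        # "\/" is a valid JSON escape, so the payload still parses unchanged.
        + json.dumps(schema, ensure_ascii=False).replace("</", "<\\/")
        + "</script>"
        for schema in schemas
    )
    return output.replace("</head>", scripts + "\n</head>", 1)


def on_post_page(output, page, config):
    # Illustrative only: the real hook builds several schema objects per page.
    schemas = [
        {
            "@context": "https://schema.org",
            "@type": "Article",
            "headline": page.title,
        }
    ]
    return inject_json_ld(output, schemas)
```

Emitting one script tag per schema keeps each object independently parseable, so a fault in one does not invalidate the others.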

Writing for Schema Auto-Detection¶

To trigger each schema type, write in the shape the hook detects:

FAQPage — a ## FAQ (or ## Frequently Asked Questions) heading with **Question?** lines followed by paragraph answers. Keep answers 40–80 words and standalone.
HowTo — an ordered list of 3+ steps under patterns/ or techniques/; write each step as a self-contained sentence, since it is extracted as a standalone HowToStep.text.
DefinedTerm — nothing in the body; set term: to the bare moniker when title: is a headline (else name falls back to the SEO title), keep description: a one-line definition, and list every searchable alternate name under aliases:.
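The 40–80 word guidance for FAQ answers is easy to check mechanically before a page ships. The sketch below assumes the answers have already been extracted as (question, answer) pairs; the function name is hypothetical and nothing here is part of the site's hook.

```python
MIN_WORDS, MAX_WORDS = 40, 80  # range recommended in the text


def answer_length_issues(pairs):
    """Return one message per FAQ answer that falls outside the word range."""
    issues = []
    for question, answer in pairs:
        count = len(answer.split())  # whitespace-separated word count
        if not MIN_WORDS <= count <= MAX_WORDS:
            issues.append(
                f"{question!r}: {count} words (expected {MIN_WORDS}-{MAX_WORDS})"
            )
    return issues
```

An empty list means every answer is within range; anything else can be printed or used to fail a pre-commit check.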

When This Backfires¶

Schema lifts citation rates in aggregate, but fails under specific conditions:

Stale schema after edits — if body text drifts from the auto-generated schema (e.g., FAQ answers edited outside the **Question** / answer format), engines see contradictory signals and may deprioritize the page.
Thin or low-authority domains — lift is relative to baseline authority. Schema accelerates the existing topical-authority signal; it doesn't manufacture credibility.
Wrong type for content shape — HowTo on conceptual explanations, or FAQPage on unrelated Q&A, causes schema–content mismatch that validators flag and engines may penalise.
Indexing pipeline changes — benefit accrues at indexing time; if a provider downweights structured data, pages relying on the lift lose it with no on-page change.
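The first failure mode, schema drifting from body text, can be caught with a simple containment check. This is a rough sketch under the assumption that each answer should appear verbatim (modulo whitespace) in the page's visible text; the function name is hypothetical.

```python
def drifted_answers(faq_schema, body_text):
    """Return the questions whose schema answer no longer appears in the body."""
    normalized_body = " ".join(body_text.split())
    drifted = []
    for item in faq_schema.get("mainEntity", []):
        answer = " ".join(item["acceptedAnswer"]["text"].split())
        if answer not in normalized_body:
            drifted.append(item["name"])
    return drifted
```

Exact matching is deliberately strict: it will also flag answers whose inline markup changes the extracted text, which is usually worth a manual look.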

Testing Schema¶

Tool	Purpose	URL
Google Rich Results Test	Validates Google-supported rich results (Article, BreadcrumbList)	https://search.google.com/test/rich-results
Schema Markup Validator	Validates all schema.org types without Google restrictions	https://validator.schema.org/
Google Search Console	Monitors rich result impressions and errors post-deployment	https://search.google.com/search-console

Run locally:

mkdocs build --strict
# Paste a built page's <head> into the Schema Markup Validator
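Before pasting into a validator, a quick local check can confirm that the built page actually contains parseable JSON-LD. The sketch below uses only the standard library; the names are hypothetical, and it assumes each script tag holds a single JSON object rather than an array.

```python
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    """Gather the raw contents of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._capture = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._capture = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._capture = False

    def handle_data(self, data):
        if self._capture:
            self.blocks[-1] += data


def schema_types(html):
    """Return the @type of each JSON-LD block; raises ValueError on malformed JSON."""
    collector = _JsonLdCollector()
    collector.feed(html)
    return [json.loads(block).get("@type") for block in collector.blocks]
```

For example, `schema_types(Path("site/index.html").read_text(encoding="utf-8"))` lists the types on the built homepage, assuming MkDocs' default `site/` output directory and `from pathlib import Path`.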

Sources¶

FAQPage Structured Data — Google Search Central — spec and eligibility
DefinedTerm — Schema.org — official spec
DefinedTermSet for Industry Terminology — DEV — fragment @id and TermSet linking
Schema.org Is Your Secret Weapon for AI Citations — DEV — FAQPage +45%, HowTo +38%
FAQ Schema for AI Search, GEO and AEO — Frase.io — 3.2x AI Overview lift
Schema Markup and AI in 2025 — Searchviu — JSON-LD ignored on live fetch; benefits at indexing
Structured Data in MkDocs — MkDocs Material approach
Structured Data for SEO and GEO — Digidop — GPT-4 accuracy 16% → 54%

Key Takeaways¶

Schema's payoff is now AI citation, not Google rich results — benefit accrues at indexing time, not on live fetch.
This site auto-injects FAQPage, HowTo, and DefinedTerm schema from page structure and frontmatter, so no page needs hand-written JSON-LD.
Match the schema type to content shape; mismatched types (HowTo on prose, stale FAQ answers) can deprioritize a page.

Related¶

GEO for Technical Docs — schema type selection checklist and per-format GEO priorities
How AI Engines Cite — citation mechanisms schema markup targets
Answer-First Writing — content structure that complements schema auto-detection
SEO vs GEO — how structured data signals differ between traditional SEO and AI citation optimization
llms.txt — complementary machine-readable format for AI discoverability
AI Crawler Policy — controlling which crawlers index your structured data
Measuring GEO Performance — tracking schema citation lift
What Is GEO — foundational concepts behind generative engine optimization