
Schema and Structured Data for GEO

Structured data lifts AI citation rates by pre-packaging content in the Q&A and step formats engines reuse — studies report 2.7x–3.2x FAQPage gains.

Independent studies report that FAQPage markup improves citation rates by 2.7x to 3.2x (Frase.io; DEV Community), a lift attributed to structured data reducing extraction effort during indexing. Schema's primary value has shifted from SEO to AI citation: ChatGPT, Perplexity, Gemini, and Claude process it at indexing time. This site auto-injects Article, FAQPage, HowTo, DefinedTerm, and BreadcrumbList schemas via hooks/structured_data.py, along with Organization on every page, WebSite on the homepage, and a site-wide DefinedTermSet on the concepts glossary.

| Channel | FAQPage / HowTo Rich Results | Schema Citation Value |
|---|---|---|
| Google Search (classic) | Restricted to government/health sites since Aug 2023 | Low for most dev docs |
| Google AI Overviews | Processed at index time | High: 3.2x appearance lift (Frase.io) |
| ChatGPT | Not rendered live; indexed content used | High: favours Q&A format |
| Perplexity | Indexed schema aids entity disambiguation | High: citation footnotes |
| Gemini | Renders JavaScript; processes schema | High |

Key nuance: chatbots don't read JSON-LD on live fetch — benefit accrues at indexing and training.

The Three Schema Types

FAQPage

Structures Q&A blocks for direct AI extraction. Keep answers 40–80 words — standalone, citable length.

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What is an agent harness?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "An agent harness is scaffolding that surrounds an AI agent loop — managing context, tool calls, error recovery, and output formatting. It separates infrastructure concerns from reasoning logic."
    }
  }]
}

The hook detects ## FAQ or ## Frequently Asked Questions followed by **Question** / answer pairs and emits this schema automatically.
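
The detection rule described above can be sketched in a few lines. This is a minimal illustration, not the code in hooks/structured_data.py: the function names `extract_faq_pairs` and `faq_schema` are hypothetical, and the sketch assumes each question sits alone on a line wrapped in `**`, with the answer in the lines that follow, up to the next heading.

```python
import re
from typing import Optional

# Heading forms taken from the prose above; the regexes are assumptions.
_FAQ_HEADING = re.compile(r"^##\s+(FAQ|Frequently Asked Questions)\s*$", re.IGNORECASE)
_QUESTION = re.compile(r"^\*\*(.+?)\*\*\s*$")


def extract_faq_pairs(markdown: str) -> list:
    """Return (question, answer) tuples found under a FAQ heading."""
    pairs = []  # each entry: [question, [answer lines]]
    in_faq = False
    for line in markdown.splitlines():
        text = line.strip()
        if _FAQ_HEADING.match(text):
            in_faq = True
        elif in_faq and text.startswith("#"):
            break  # the next heading ends the FAQ section
        elif in_faq and (m := _QUESTION.match(text)):
            pairs.append([m.group(1), []])
        elif in_faq and text and pairs:
            pairs[-1][1].append(text)
    # Drop questions that never received an answer.
    return [(q, " ".join(lines)) for q, lines in pairs if lines]


def faq_schema(pairs: list) -> Optional[dict]:
    """Build a FAQPage dict, or None when no pairs were detected."""
    if not pairs:
        return None
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
```

Returning None for an empty result lets a caller skip the schema entirely instead of emitting an empty FAQPage.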

HowTo

Converts numbered step lists into extractable blocks — each step is a quotable unit.

{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to configure prompt caching",
  "step": [
    { "@type": "HowToStep", "position": 1, "text": "Enable the caching header in your API request." },
    { "@type": "HowToStep", "position": 2, "text": "Place stable content at the top of the context window." }
  ]
}

Auto-detection triggers on ordered lists (<ol>) with 3+ items, restricted to patterns/ and techniques/ paths. Extend _HOWTO_PATHS in hooks/structured_data.py to widen coverage.
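
As a rough sketch of that trigger, the following uses only the standard library to find the first top-level ordered list with at least three items and turn it into a HowTo dict. The shape of `_HOWTO_PATHS` (a tuple of path prefixes) and the names `_OrderedListParser` and `howto_schema` are assumptions made for illustration; the real hook may differ.

```python
from html.parser import HTMLParser
from typing import Optional

_HOWTO_PATHS = ("patterns/", "techniques/")  # assumed shape: path prefixes
_MIN_STEPS = 3


class _OrderedListParser(HTMLParser):
    """Collect the text of each <li> in every top-level <ol>."""

    def __init__(self):
        super().__init__()
        self.lists = []   # one list of step strings per top-level <ol>
        self._stack = []  # currently open list tags, outermost first

    def handle_starttag(self, tag, attrs):
        if tag in ("ol", "ul"):
            self._stack.append(tag)
            if self._stack == ["ol"]:
                self.lists.append([])
        elif tag == "li" and self._stack == ["ol"]:
            self.lists[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("ol", "ul") and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Only text directly inside a top-level <ol> item is kept;
        # text in nested lists is skipped.
        if self._stack == ["ol"] and self.lists[-1]:
            self.lists[-1][-1] += data


def howto_schema(html: str, src_path: str, name: str) -> Optional[dict]:
    """Return a HowTo dict for the first qualifying list, else None."""
    if not src_path.startswith(_HOWTO_PATHS):
        return None
    parser = _OrderedListParser()
    parser.feed(html)
    for items in parser.lists:
        steps = [" ".join(item.split()) for item in items if item.strip()]
        if len(steps) >= _MIN_STEPS:
            return {
                "@context": "https://schema.org",
                "@type": "HowTo",
                "name": name,
                "step": [
                    {"@type": "HowToStep", "position": i, "text": text}
                    for i, text in enumerate(steps, start=1)
                ],
            }
    return None
```

Under this assumed shape, widening coverage would mean adding another prefix to the tuple.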

DefinedTerm

Machine-readable definitions for named concepts — useful where terms like "agent" are ambiguous across tools. Every coined-concept leaf page emits one DefinedTerm, anchored to a single DefinedTermSet on /concepts/:

{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "name": "Harness Engineering",
  "description": "The discipline of designing agent environments so agents reliably produce correct results.",
  "alternateName": ["agent environment design", "environment design for agents"],
  "url": "https://agentpatterns.ai/agent-design/harness-engineering/",
  "inDefinedTermSet": "https://agentpatterns.ai/concepts/"
}

The hook maps frontmatter to schema fields automatically:

| DefinedTerm field | Source |
|---|---|
| name | term: frontmatter (the clean moniker), falling back to title: (SEO-shaped) |
| description | description: frontmatter (the page's one-line definition) |
| alternateName | aliases: frontmatter (the page's strongest vocabulary-ownership signal) |
| url / inDefinedTermSet | canonical page URL / the /concepts/ glossary |

Emission is gated to coined-concept sections (_DEFINED_TERM_PATHS) plus the exceptions listed in _DEFINED_TERM_ALLOW; articles/, tools/, and training/ are excluded because they describe terms rather than coin them.
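
The frontmatter mapping can be expressed as a small function. This is a sketch under assumptions: `defined_term_schema` is a hypothetical name, `meta` stands for the page's parsed frontmatter as a plain dict, and the path gating is left out because the contents of `_DEFINED_TERM_PATHS` and `_DEFINED_TERM_ALLOW` are not shown here.

```python
# Glossary URL taken from the JSON-LD example above.
_GLOSSARY_URL = "https://agentpatterns.ai/concepts/"


def defined_term_schema(meta: dict, canonical_url: str) -> dict:
    """Map page frontmatter to a DefinedTerm dict."""
    schema = {
        "@context": "https://schema.org",
        "@type": "DefinedTerm",
        # Prefer the clean moniker; fall back to the SEO-shaped title.
        "name": meta.get("term") or meta.get("title"),
        "url": canonical_url,
        "inDefinedTermSet": _GLOSSARY_URL,
    }
    if meta.get("description"):
        schema["description"] = meta["description"]
    aliases = meta.get("aliases") or []
    if aliases:
        schema["alternateName"] = list(aliases)
    return schema
```

Optional fields are omitted rather than emitted as empty values, so a page without `aliases:` still yields a valid, smaller DefinedTerm.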

How This Site Generates Schema

The hook runs at on_post_page and injects JSON-LD into every page's <head>:

graph LR
    A[MkDocs builds page] --> B[on_post_page hook fires]
    B --> C[Organization schema]
    B --> D{Is homepage?}
    D -- yes --> E[WebSite schema]
    B --> F[Article schema]
    B --> G[BreadcrumbList schema]
    B --> H{FAQ heading detected?}
    H -- yes --> I[FAQPage schema]
    B --> J{Ordered list 3+ steps and patterns/ or techniques/ path?}
    J -- yes --> K[HowTo schema]
    B --> M{Coined-concept section?}
    M -- yes --> N[DefinedTerm schema]
    C & E & F & G & I & K & N --> L[Inject before head close]

No per-page configuration is needed: add an FAQ section and the FAQPage schema appears; a coined-concept page emits a DefinedTerm automatically (and a cleaner one if you set term:).
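
The injection step at the end of the diagram might look like the sketch below. `on_post_page(output, page, config)` is the MkDocs hook signature; everything else is illustrative: `inject_json_ld` is a hypothetical helper, and the single Article dict stands in for whatever builders the real hook calls.

```python
import json


def inject_json_ld(html: str, schemas: list) -> str:
    """Insert one <script type="application/ld+json"> per schema before </head>."""
    blocks = "".join(
        '<script type="application/ld+json">'
        # Escape "</" so a string value cannot close the script element early.
        + json.dumps(schema, ensure_ascii=False).replace("</", "<\\/")
        + "</script>\n"
        for schema in schemas
        if schema  # skip None results from detectors that did not fire
    )
    head_close = html.find("</head>")
    if head_close == -1 or not blocks:
        return html  # leave the page untouched if there is nothing to add
    return html[:head_close] + blocks + html[head_close:]


def on_post_page(output, page, config):
    """MkDocs hook: receives the rendered HTML and returns the modified HTML."""
    # Placeholder builder; a real hook would assemble Organization, Article,
    # BreadcrumbList, and the conditional schemas here.
    schemas = [
        {"@context": "https://schema.org", "@type": "Article", "headline": page.title},
    ]
    return inject_json_ld(output, schemas)
```

Filtering out falsy entries lets each detector return None when its condition is not met, which keeps the hook body a flat list of builder calls.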

Writing for Schema Auto-Detection

To trigger each schema type, write in the shape the hook detects:

  • FAQPage — a ## FAQ (or ## Frequently Asked Questions) heading with **Question?** lines followed by paragraph answers. Keep answers 40–80 words and standalone.
  • HowTo — an ordered list of 3+ steps under patterns/ or techniques/; write each step as a self-contained sentence, since it is extracted as a standalone HowToStep.text.
  • DefinedTerm — nothing in the body; set term: to the bare moniker when title: is a headline (else name falls back to the SEO title), keep description: a one-line definition, and list every searchable alternate name under aliases:.
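
The 40–80 word guidance for FAQ answers is easy to check mechanically before publishing. A minimal sketch, assuming answers are available as (question, answer) tuples and counting whitespace-separated tokens as words; `answer_length_issues` is a hypothetical helper, not part of the site's hook.

```python
def answer_length_issues(pairs, low=40, high=80):
    """Return (question, word_count) for answers outside the low..high range."""
    issues = []
    for question, answer in pairs:
        words = len(answer.split())  # whitespace-separated tokens
        if not low <= words <= high:
            issues.append((question, words))
    return issues
```

An empty result means every answer falls inside the recommended range.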

When This Backfires

Schema lifts citation rates in aggregate, but fails under specific conditions:

  • Stale schema after edits — if body text drifts from the auto-generated schema (e.g., FAQ answers edited outside the **Question** / answer format), engines see contradictory signals and may deprioritize the page.
  • Thin or low-authority domains — lift is relative to baseline authority. Schema accelerates the existing topical-authority signal; it doesn't manufacture credibility.
  • Wrong type for content shape — HowTo on conceptual explanations, or FAQPage on unrelated Q&A, causes schema–content mismatch that validators flag and engines may penalise.
  • Indexing pipeline changes — benefit accrues at indexing time; if a provider downweights structured data, pages relying on the lift lose it with no on-page change.

Testing Schema

| Tool | Purpose | URL |
|---|---|---|
| Google Rich Results Test | Validates Google-supported rich results (Article, BreadcrumbList) | https://search.google.com/test/rich-results |
| Schema Markup Validator | Validates all schema.org types without Google restrictions | https://validator.schema.org/ |
| Google Search Console | Monitors rich result impressions and errors post-deployment | https://search.google.com/search-console |

Run locally:

mkdocs build --strict
# Paste a built page's <head> into the Schema Markup Validator
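
Before pasting into a validator, a quick local pass can confirm that every injected block is at least well-formed JSON. The sketch below is an assumption-laden helper, not part of the site's tooling: `json_ld_blocks` and `check_site` are hypothetical names, and it assumes the build output is in MkDocs' default `site/` directory.

```python
import json
import re
from pathlib import Path

_JSON_LD = re.compile(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def json_ld_blocks(html: str) -> list:
    """Parse every JSON-LD script block in a page; raises on invalid JSON."""
    return [json.loads(raw) for raw in _JSON_LD.findall(html)]


def check_site(site_dir: str = "site") -> list:
    """Return a list of human-readable problems found in built HTML files."""
    problems = []
    for path in sorted(Path(site_dir).rglob("*.html")):
        try:
            blocks = json_ld_blocks(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path}: invalid JSON-LD ({exc})")
            continue
        for block in blocks:
            # A top-level object should declare a type or wrap a @graph.
            if isinstance(block, dict) and not ({"@type", "@graph"} & block.keys()):
                problems.append(f"{path}: JSON-LD block has no @type")
    return problems
```

Calling `check_site()` after `mkdocs build --strict` returns an empty list when every block parses. It only checks syntax, so the Schema Markup Validator is still needed for type-level validation.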

Key Takeaways

  • Schema's payoff is now AI citation, not Google rich results — benefit accrues at indexing time, not on live fetch.
  • This site auto-injects FAQPage, HowTo, and DefinedTerm schema from page structure and frontmatter — no per-page config.
  • Match the schema type to content shape; mismatched types (HowTo on prose, stale FAQ answers) can deprioritize a page.