Effort-Aware Hooks: Reading the Reasoning Tier from PreToolUse and PostToolUse¶
Claude Code v2.1.133 exposes the active effort level as a first-class hook input, so deterministic gates can branch on the reasoning tier without parsing transcripts.
The Signal¶
Claude Code v2.1.133, May 7, 2026 added two equivalent surfaces to every hook: an effort object in the JSON stdin payload, and a $CLAUDE_EFFORT environment variable (also set for Bash tool commands).
The hooks reference documents effort as an object with a level field carrying the active tier — "low", "medium", "high", "xhigh", or "max" — present on tool-use events including PreToolUse, PostToolUse, Stop, and SubagentStop. Two properties matter. The level is the downgraded level — when requested effort exceeds what the active model supports, the field reports what actually ran. And the field is absent on session-level events outside a tool-use context, so a UserPromptSubmit hook cannot read it.
Asymmetry Rule¶
A tier-aware hook is safe when it gets stricter at higher tiers and dangerous when it gets weaker at lower tiers. A security gate that relaxes deny-rules at low is a gate an attacker bypasses by running cheaper. The Anthropic effort docs reinforce why: "Effort is a behavioral signal, not a strict token budget" — the level reflects what the model will spend, not what an invariant requires.
| Direction | Safe? | Example |
|---|---|---|
| Stricter at higher tiers | Yes | Require eval-suite pass on xhigh |
| Looser at lower tiers | No (security) | Skip secret scan on low |
| Different threshold per tier | Conditional | Loop-detector limits scaled to call volume |
| Pure observability per tier | Yes | Tag every metric with effort.level |
Three Concrete Uses¶
Loop-Detector Threshold Scaling¶
The bootstrap loop detector hook counts per-file edits at PostToolUse and escalates from nudge → pause → block. Lower-effort runs naturally make fewer tool calls — the effort docs note that lower levels "combine multiple operations into fewer tool calls" and "make fewer tool calls." A static threshold tuned for xhigh produces false negatives at low. The fix is to scale the threshold by tier (read $CLAUDE_EFFORT, pick from a table) — never to disable the detector.
Pre-Completion Check Selection¶
The bootstrap pre-completion hook runs deterministic checks at Stop. Effort-aware variants require a stricter checklist when the operator paid for a stricter run: every tier runs the standard set (lint, build, test); high adds a slow check; xhigh / max add the slowest. Higher tiers add; no tier subtracts.
Telemetry Labelling¶
The cheapest pattern is observational. Emit effort.level as a label on every hook-side metric and trace span (see agent observability with OpenTelemetry). Cost dashboards then filter by tier without inferring it from transcript content. The hook never blocks differently.
Calibration Drift¶
The default effort level changes silently across versions. As of v2.1.117, the default is xhigh on Opus 4.7 and high on Opus 4.6 / Sonnet 4.6. A hook with thresholds calibrated against high-tier traffic on Opus 4.6 now applies to xhigh-tier traffic on Opus 4.7 by default, with no project setting changed. Branch on effort.level directly rather than on which model is configured, and re-calibrate tier-shaped thresholds when the Claude Code version is bumped.
The Claude Code docs are explicit about cross-model semantics: "The effort scale is calibrated per model, so the same level name does not represent the same underlying value across models". A hook that branches identically on level === "high" against Sonnet 4.6 and Opus 4.7 is treating two different things as one.
Tool-Agnostic Restatement¶
The structural primitive — the harness exposes its reasoning-tier selection to deterministic gates — generalizes beyond Claude Code. The Anthropic API's effort parameter carries the same five levels at the SDK boundary. When a harness lacks a structural signal, the only fallback is to infer tier from prompt content — forfeiting the determinism that makes hooks worth running.
When This Backfires¶
- Tier-induced flakiness. "The same change passes on
lowand fails onhigh" turns the cheap signal into noise. Keep gates uniform; vary what tasks route to which tier at the agent or model-routing level. - Multi-model fleets. Read both
effort.leveland the model identifier when thresholds depend on absolute capability — the same level name has different semantics across models. - Inferred-tier security gates. Security gates ignore
effortentirely, or get stricter — never weaker.
Example¶
A PostToolUse loop detector that scales its block threshold by tier. The script reads $CLAUDE_EFFORT (set by Claude Code v2.1.133+), falls back to high when the variable is absent (older versions, non-tool events), and never lowers the threshold below the floor — the asymmetry rule applied to a counter.
#!/usr/bin/env bash
# .claude/hooks/loop-detector.sh
# PostToolUse — scale block threshold by reasoning tier.
set -euo pipefail
# Floor: never block before this many edits, regardless of tier.
FLOOR=5
case "${CLAUDE_EFFORT:-high}" in
low) BLOCK_AFTER=$FLOOR ;;
medium) BLOCK_AFTER=6 ;;
high) BLOCK_AFTER=8 ;;
xhigh) BLOCK_AFTER=12 ;;
max) BLOCK_AFTER=16 ;;
*) BLOCK_AFTER=8 ;;
esac
# ... existing per-file edit-count logic, comparing against $BLOCK_AFTER
The threshold scales upward with tier (more thinking → more legitimate exploration → higher tolerance for repeat edits) but never drops below the floor. A low-tier run gets the same hard floor as high; an xhigh run gets headroom. The block decision still fires deterministically — only the threshold is tier-shaped.
Key Takeaways¶
effort.levelis a first-class hook input in Claude Code v2.1.133+, available onPreToolUse,PostToolUse,Stop, andSubagentStop, and as$CLAUDE_EFFORTfor hook and Bash subprocesses.- The level is the downgraded level (what actually ran), not the requested level — useful for accuracy, but means the same code path runs against different upstream requests.
- Asymmetry rule: tier-aware hooks may add strictness at higher tiers; they must not subtract it at lower ones, especially for security gates.
- Default-tier changes between Claude Code versions silently move which branch fires most often — re-calibrate tier-shaped thresholds on every harness upgrade.
- The same level name has different semantics across models; pair
effort.levelwith the model identifier when thresholds depend on absolute capability rather than relative tier.