Signal Over Volume in AI Review

Design AI code review to stay silent when it has nothing useful to say — high-signal feedback builds trust; exhaustive commenting destroys it.

The Principle

AI review tools that always produce output regardless of value train you to ignore them. The signal-over-volume principle treats silence as a valid review outcome. When the AI does comment, it matters. When it has nothing high-confidence to add, it says nothing — the same silent-drop discipline a reproduce-before-report gate enforces.

GitHub's Copilot code review demonstrates this at scale: in 71% of reviews, Copilot surfaces actionable feedback; in the remaining 29%, the agent says nothing at all. GitHub explicitly rejected maximizing comment frequency, stating "more comments don't necessarily mean a better review."

Why Volume Fails

Alert fatigue is the primary failure mode. When every PR gets a wall of comments — style nits, suggestions on intentional patterns, low-confidence speculation — you stop reading AI review output entirely. The one critical security finding gets buried in twenty stylistic preferences.

The pressure intensifies as agents author more PRs: Linear describes keeping the review quality bar high under the higher PR throughput agents generate, treating volume as a reason to tighten the signal bar rather than relax it.

Designing for Signal

Silence as Output

Build review agents that return no comments when confidence is low. This requires a confidence threshold: each potential finding must clear a minimum signal bar (the Example below uses a ≥90% floor) before surfacing. Findings below it are suppressed, not queued.
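
A minimal sketch of such a gate, assuming the review model reports a numeric confidence score per finding. The `Finding` type, the `confidence` field, and the `surface` helper are hypothetical names used for illustration, not part of any particular tool:

```python
from dataclasses import dataclass

# Assumed to mirror the >=90% floor used in the Example prompt.
CONFIDENCE_FLOOR = 0.90


@dataclass(frozen=True)
class Finding:
    path: str
    line: int
    message: str
    confidence: float  # hypothetical 0.0-1.0 score reported by the model


def surface(findings: list[Finding], floor: float = CONFIDENCE_FLOOR) -> list[Finding]:
    """Keep findings at or above the floor; an empty list means stay silent."""
    return [f for f in findings if f.confidence >= floor]


candidates = [
    Finding("db.py", 42, "Query built by string concatenation", 0.97),
    Finding("db.py", 88, "Variable name could be clearer", 0.40),
]
comments = surface(candidates)  # only the db.py:42 finding survives
```

Findings below the floor are dropped rather than stored, so an empty result is a normal outcome and not an error path.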

Multi-Line Contextual Comments

Single-line comments that point to one line of code without surrounding context force you to reconstruct the problem. GitHub's Copilot code review addresses this by attaching feedback to logical code ranges.
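
One way to approximate a logical range for Python sources is to widen a flagged line to its enclosing function using the standard `ast` module. This is an illustrative sketch, not a description of how Copilot computes ranges, and `enclosing_range` is a hypothetical helper:

```python
import ast


def enclosing_range(source: str, line: int) -> tuple[int, int]:
    """Return (start, end) lines of the innermost function containing `line`.

    Falls back to (line, line) when the line is not inside any function.
    """
    best = (line, line)
    best_size = None
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            start, end = node.lineno, node.end_lineno
            if start <= line <= end:
                size = end - start
                # Prefer the smallest enclosing function (innermost scope).
                if best_size is None or size < best_size:
                    best, best_size = (start, end), size
    return best


SAMPLE = '''\
def load(user_id):
    query = "SELECT * FROM users WHERE id = " + user_id
    return run(query)

print("done")
'''

comment_range = enclosing_range(SAMPLE, 2)  # the whole load() function
```

A finding on line 2 is then attached to the entire `load` function, so the reviewer sees where the query is built and where it is executed in one place.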

Clustered Feedback

When the same pattern error appears across multiple locations, individual comments for each instance create noise. Instead, cluster them into a single cohesive unit that identifies the pattern once and lists all affected locations. This reduces cognitive load.
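
Clustering can be sketched as a group-by on a pattern identifier. The `rule` field and the comment layout below are assumptions made for illustration:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule: str     # hypothetical identifier for the pattern, e.g. "sql-concat"
    path: str
    line: int
    message: str


def cluster(findings: list[Finding]) -> list[str]:
    """Group findings by rule and render one comment per rule."""
    groups: dict[str, list[Finding]] = defaultdict(list)
    for f in findings:
        groups[f.rule].append(f)

    comments = []
    for items in groups.values():
        ordered = sorted(items, key=lambda f: (f.path, f.line))
        locations = ", ".join(f"{f.path}:{f.line}" for f in ordered)
        # State the pattern once, then list every affected location.
        comments.append(f"{items[0].message}\nAffected: {locations}")
    return comments


clustered = cluster([
    Finding("sql-concat", "b.py", 10, "Query built by string concatenation"),
    Finding("sql-concat", "a.py", 3, "Query built by string concatenation"),
    Finding("open-redirect", "views.py", 77, "Redirect target taken from user input"),
])
```

Three raw findings become two comments, and the repeated pattern is explained only once.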

Batch Autofixes

When multiple instances of the same issue are identified, offer batch fixes that resolve an entire class of issues at once rather than applying each fix individually.
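
A batch fix can be modelled as applying every replacement for one pattern in a single pass. The sketch below assumes each fix is a simple text replacement on a known line. `Fix` and `apply_batch` are hypothetical names, and a real tool would more likely work on syntax trees than on raw strings:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fix:
    rule: str   # hypothetical pattern identifier
    line: int   # 1-indexed line in the file
    old: str
    new: str


def apply_batch(source: str, fixes: list[Fix], rule: str) -> str:
    """Apply every fix for `rule` in one pass; fixes for other rules are skipped."""
    lines = source.splitlines()
    for fix in fixes:
        if fix.rule != rule:
            continue
        current = lines[fix.line - 1]
        if fix.old not in current:
            # The file changed since review; refuse to apply a stale fix.
            raise ValueError(f"line {fix.line} no longer matches the expected text")
        lines[fix.line - 1] = current.replace(fix.old, fix.new, 1)
    trailing = "\n" if source.endswith("\n") else ""
    return "\n".join(lines) + trailing


before = "a = eval(x)\nb = 1\nc = eval(y)\n"
fixes = [
    Fix("unsafe-eval", 1, "eval(", "ast.literal_eval("),
    Fix("unsafe-eval", 3, "eval(", "ast.literal_eval("),
]
after = apply_batch(before, fixes, "unsafe-eval")
```

Because the function builds a new string and raises on a mismatch, a stale fix leaves the original source untouched.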

Measuring Signal Quality

Two feedback loops validate signal quality:

Reactions — thumbs-up/down on individual comments track whether suggestions prove helpful. A declining ratio indicates signal degradation.
Resolution tracking — whether flagged issues get resolved before merging. Findings you consistently dismiss indicate false positives that learned review rules should suppress.
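
Both loops reduce to simple ratios over past comments. A minimal sketch, assuming each comment's reactions and merge-time status have been recorded (the `CommentOutcome` record is hypothetical):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CommentOutcome:
    thumbs_up: int
    thumbs_down: int
    resolved_before_merge: bool


def positive_ratio(outcomes: list[CommentOutcome]) -> Optional[float]:
    """Share of reactions that were thumbs-up; None when nobody reacted."""
    up = sum(o.thumbs_up for o in outcomes)
    down = sum(o.thumbs_down for o in outcomes)
    return up / (up + down) if up + down else None


def resolution_rate(outcomes: list[CommentOutcome]) -> Optional[float]:
    """Share of comments whose issue was resolved before merge."""
    if not outcomes:
        return None
    return sum(o.resolved_before_merge for o in outcomes) / len(outcomes)


outcomes = [
    CommentOutcome(thumbs_up=2, thumbs_down=0, resolved_before_merge=True),
    CommentOutcome(thumbs_up=1, thumbs_down=1, resolved_before_merge=False),
]
ratio = positive_ratio(outcomes)      # 3 of 4 reactions are positive
resolved = resolution_rate(outcomes)  # 1 of 2 comments was resolved
```

Tracking these per finding category, rather than globally, makes it easier to see which categories are degrading.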

GitHub's agentic architecture redesign produced an 8.1% increase in positive feedback by improving signal quality. A later, separate move to a stronger reasoning model added a further 6% — despite review latency rising 16% — evidence that fewer, better comments beat faster, noisier ones.

Applying the Pattern

When building or configuring AI review:

Set a confidence floor. Only surface findings the model is confident about. Low-confidence suggestions belong in optional "info" channels, not the PR thread.
Categorise by severity. Critical and high findings appear as PR comments. Medium and low findings surface only when explicitly requested; this is the routing that tiered code review formalises.
Track false positive rates. If you dismiss a category of finding more than half the time, suppress it or refine its detection criteria.
Scope review instructions. Tell the agent what to check and, equally important, what to ignore. A review prompt that says "flag all uses of the any type" will flag intentional uses alongside accidental ones.
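
The severity routing and false-positive suppression described above can be sketched together. The dismissal history format, the category names, and the `route` helper are assumptions for illustration:

```python
from collections import Counter
from dataclasses import dataclass

PR_SEVERITIES = {"CRITICAL", "HIGH"}  # severities that reach the PR thread
DISMISS_LIMIT = 0.5                   # "more than half the time"


@dataclass(frozen=True)
class Finding:
    category: str
    severity: str
    message: str


def suppressed_categories(history: list[tuple[str, bool]]) -> set[str]:
    """`history` holds (category, was_dismissed) pairs from past reviews."""
    total, dismissed = Counter(), Counter()
    for category, was_dismissed in history:
        total[category] += 1
        dismissed[category] += was_dismissed  # True counts as 1
    return {c for c in total if dismissed[c] / total[c] > DISMISS_LIMIT}


def route(findings: list[Finding], history: list[tuple[str, bool]],
          full_review: bool = False) -> list[Finding]:
    """Drop muted categories, then keep critical/high unless a full review is requested."""
    muted = suppressed_categories(history)
    return [
        f for f in findings
        if f.category not in muted and (full_review or f.severity in PR_SEVERITIES)
    ]


history = [("unused-import", True)] * 3 + [("unused-import", False)] + [("sql-injection", False)] * 2
findings = [
    Finding("sql-injection", "CRITICAL", "Unescaped input reaches query"),
    Finding("unused-import", "HIGH", "Import never used"),
    Finding("naming", "LOW", "Ambiguous variable name"),
]
pr_comments = route(findings, history)
```

Here `unused-import` is muted because it was dismissed in three of four past reviews, and the low-severity naming finding appears only when `full_review=True`.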

Why It Works

The mechanism is attentional: reviewers have a fixed budget of attention per PR. When a tool produces many low-value comments, reviewers discount all of its output, including the high-value findings. This is the review-fatigue dynamic that erodes sustainable agent use, and it is a learned response to repeated false positives, not a deliberate choice. Suppressing low-confidence findings preserves attention for the comments that do surface, so each one is read rather than skimmed.

When This Backfires

Cross-file false negatives. A strict confidence floor silences bugs that span multiple files, the same cross-file blind spot that diff-based review carries when context is missing. Defects of this class go undetected unless the agent is given enough surrounding scope to see every file involved.
Silent failure on novel patterns. Confidence thresholds reflect known patterns. A new vulnerability type may score low confidence because it is rare in training data, not because it is low risk. The agent's silence is indistinguishable from a clean bill of health — an empirical evaluation of Copilot code review on labelled vulnerable samples found it frequently misses SQL injection, XSS, and insecure deserialization while still returning clean reviews.
Trust inversion. When the agent comments rarely, developers may interpret silence as implicit approval and reduce manual review. A "No high-confidence findings." response creates a false sense of completeness if secondary review has been dropped.
Threshold decay. Confidence floors drift as codebases evolve. Without periodic recalibration against resolved findings, signal quality degrades silently.
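
Recalibration against resolved findings could look like the sketch below. It assumes a log of (confidence, was_resolved) pairs and treats "resolved before merge" as a proxy for a true positive. The 0.9 precision target and the `recalibrate` helper are illustrative assumptions:

```python
from typing import Optional


def recalibrate(history: list[tuple[float, bool]],
                target_precision: float = 0.9) -> Optional[float]:
    """Return the lowest confidence floor whose surfaced findings meet the target.

    `history` holds (confidence, was_resolved) pairs from past reviews.
    Returns None when no floor in the history reaches the target.
    """
    for floor in sorted({conf for conf, _ in history}):
        kept = [resolved for conf, resolved in history if conf >= floor]
        if sum(kept) / len(kept) >= target_precision:
            return floor
    return None


log = [(0.95, True), (0.92, True), (0.85, False), (0.80, True), (0.60, False)]
new_floor = recalibrate(log)
```

With this log, every candidate floor below 0.92 lets at least one unresolved finding through, so 0.92 is the lowest floor that meets the target.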

Example

The following Claude prompt configures a code review agent to apply the signal-over-volume principle: it sets a confidence floor, categorises by severity, and explicitly instructs silence when nothing high-value is found.

You are a code reviewer. Review the git diff provided.

Rules:
- Only comment on findings you are highly confident about (≥90% confidence).
  If you have nothing high-confidence to say, respond with exactly: "No high-confidence findings."
- Categorise every finding as CRITICAL, HIGH, MEDIUM, or LOW.
- Only surface CRITICAL and HIGH findings as PR comments.
  MEDIUM and LOW findings: omit them entirely unless the user asks for a full review.
- When the same issue appears in multiple locations, write ONE comment that lists all affected lines.
  Do not write a separate comment for each instance.
- Attach each comment to the full logical block it concerns (function or method), not to a single line.
- Do not comment on formatting, naming conventions, or style unless you also see a correctness risk.

Output format for each finding:
[SEVERITY] <one-line summary>
Lines: <file>:<start>-<end>
Issue: <what is wrong and why it matters>
Fix: <concrete code change>

A PR that receives a response of "No high-confidence findings." passes the bar. A PR that receives one [CRITICAL] comment about an SQL injection risk gets immediate attention precisely because the agent stayed silent on everything else.
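
Downstream tooling needs to distinguish the silent response from real findings. A minimal parser for the output format above might look like this. It is a sketch that assumes the model follows the format exactly and that each finding carries a single `Lines:` range; `parse_review` is a hypothetical helper:

```python
import re
from dataclasses import dataclass

SILENT = "No high-confidence findings."
HEADER = re.compile(r"^\[(CRITICAL|HIGH|MEDIUM|LOW)\]\s+(.*)$")
LINES = re.compile(r"^Lines:\s+(.+):(\d+)-(\d+)$")


@dataclass(frozen=True)
class Finding:
    severity: str
    summary: str
    path: str
    start: int
    end: int


def parse_review(text: str) -> list[Finding]:
    """Parse a review reply; the silent sentinel yields an empty list."""
    if text.strip().strip('"') == SILENT:
        return []
    findings = []
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if m := HEADER.match(line):
            header = m.groups()  # (severity, summary) awaiting its Lines: row
        elif (m := LINES.match(line)) and header:
            findings.append(Finding(header[0], header[1], m[1], int(m[2]), int(m[3])))
            header = None
    return findings


reply = """[CRITICAL] SQL built by string concatenation
Lines: app/db.py:40-58
Issue: User input reaches the query unescaped.
Fix: Use a parameterised query."""
parsed = parse_review(reply)
```

An empty list from `parse_review` means the agent stayed silent, which the caller can record without posting anything to the PR.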

Key Takeaways

Silence is a valid review output — 29% of Copilot code reviews intentionally produce no comments
Alert fatigue from noisy AI review trains you to ignore all AI feedback, including critical findings
Attach feedback to logical code ranges, not individual lines, so you see full context
Cluster repeated pattern errors into a single finding to reduce cognitive load
Measure signal quality through reactions and issue resolution rates, not comment volume

Related

Agent-Assisted Code Review
Agentic Code Review Architecture
Tiered Code Review
Tunable Review Effort — why High-by-default backfires; the per-PR effort lever that complements signal-over-volume
Human-AI Review Synergy — complementary strengths of AI and human reviewers and how to structure collaboration
CRA-Only Review and the Merge Rate Gap — empirical signal ratio data showing how actionable comment rates determine merge outcomes
Cognitive Load, AI Fatigue, and Sustainable Agent Use — cognitive costs of review fatigue and how to manage them sustainably
Self-Improving Code Review Agents — Learned Rules — how agents can persist accept/reject signals to suppress recurring false positives automatically