Always-On Agentic PR Security Review¶

Pair a PR-time security reviewer with a scheduled whole-codebase scanner: the reviewer covers new risk in each diff, the scanner resident risk no PR reaches.

Two coverage gaps¶

Security review fails along two temporal axes:

New risk — vulnerabilities in today's changes. Diff-scoped, cheap to review when the pull_request opens, lost if not caught before merge.
Resident risk — vulnerabilities already in the codebase, plus dependency, config, and policy drift. No PR may touch the affected files for months.

PR-only review never finds resident risk. Scheduled-only scanning delays new-code coverage.

The pattern¶

Two agents share one finding format and one triage queue.

graph TD
    PR[PR opened] --> Reviewer[PR-Time Security Reviewer]
    Reviewer --> Inline[Inline diff comments]
    Cron[Cron / cadence] --> Scanner[Scheduled Vulnerability Scanner]
    Scanner --> Channel[Slack / issue tracker]
    Inline --> Triage[(Shared finding queue)]
    Channel --> Triage
    Triage --> Suppress[Suppression rules]
    Suppress --> Reviewer
    Suppress --> Scanner

Dimension	PR-time reviewer	Scheduled scanner
Scope	Diff only	Whole codebase
Trigger	`pull_request` open/sync	Cron, cadence, dependency advisory
Latency budget	Seconds to a few minutes	Hours acceptable
Output	Inline comment at the changed line	Aggregated report to a channel
Failure mode	Block merge or post warning	File issue or notify owner

Cursor shipped this split in beta on 2026-04-30: a 'Security Reviewer' that "checks every PR for security vulnerabilities, auth regressions, privacy and data-handling risks, agent tool auto-approvals, and prompt injection attacks" plus a 'Vulnerability Scanner' that "runs scheduled scans of your codebase to check for known vulnerabilities, outdated dependencies, and configuration issues." [Source: Cursor changelog] Anthropic's claude-code-security-review Action is the convergent PR-time component; /security-review runs the same review locally before commit. [Source: Anthropic Help] GitHub shipped the same component into Copilot CLI on 2026-06-10 as a dedicated security-review command. [Source: GitHub Changelog]

Prompt-injection review is a distinct dimension¶

The reviewer flags injection vectors in the diff — content that, once shipped, will land in another agent's context and rewrite its instructions.

Check class	Looks for	Signal
CVE / dependency	Known-vulnerable package versions	Advisory database
SAST	Tainted data flow to a sink	AST / data-flow graph
Secrets	High-entropy strings, known patterns	Regex + entropy
Prompt-injection	New retrieval paths into agent context, untrusted-input boundaries, system-prompt mutations, tool descriptions, skill `SKILL.md` text	Heuristic + LLM judgment

The attack surface is semantic — the same string is benign in a code comment and dangerous in a runtime-loaded system prompt. Deterministic SAST will not flag a SKILL.md whose ## Examples section contains injected instructions. An LLM reviewer scoped to the Lethal Trifecta and task-scope boundary will. [Source: Prompt Injection Resistant Agent Design]

The reviewer itself is a target¶

A PR-triggered reviewer reading PR titles, descriptions, and comments runs untrusted input through an LLM with repository credentials in scope — the Lethal Trifecta at the reviewer.

The April 2026 'Comment and Control' disclosure exploited this against Claude Code Security Review, Gemini CLI Action, and GitHub Copilot Agent. The attacker injects instructions in a PR title. The agent auto-triggers on pull_request, runs the directive, and exfiltrates ANTHROPIC_API_KEY, GITHUB_TOKEN, and GEMINI_API_KEY as a "security finding" comment. Anthropic rated it CVSS 9.4. [Source: Comment and Control writeup; SecurityWeek; The Register]

Mitigations are structural:

Treat PR title, description, and comments as untrusted data — never as instructions
Avoid pull_request_target for forked PRs unless secrets are scoped via a credentials proxy
Restrict the reviewer's tool catalog to read-only operations on the diff, gating writes and network calls behind confirmation gates
Apply the Action-Selector pattern so the reviewer cannot synthesize arbitrary tool calls from PR text

False-positive economics¶

Single-stage detection is the wrong shape. One observed mitigation pairs a cheap stage-1 filter accepting an 8.5% false-positive rate with a stage-2 reasoning pass that drops it to 0.4%. [Source: ARMO: Detecting Prompt Injection in Production AI Agent Workloads]

Suppression must be first-class. Cursor accepts custom instructions and MCP-wrapped SAST/SCA/secrets scanners so deterministic tools own high-confidence classes and the LLM judges residual semantic surface. Self-improving review agents persist accept/reject signals as rules, narrowing over time.

When the pattern backfires¶

No AppSec triage owner. Findings land on the PR author — without a triage queue, noise leads to dismissal.
Mature deterministic tooling already covers the surface. Tuned CodeQL, Semgrep, Snyk, Dependabot — incremental coverage may not exceed cost.
High-volume, low-security churn. Docs sites, generated code, config-heavy monorepos produce findings the reviewer cannot prioritize.
pull_request_target with secrets. A Comment-and-Control precondition. Fix the trust boundary first.

Example¶

Cursor's beta configuration shows the operational shape end to end:

# Reviewer agent — runs at PR open
trigger: pull_request
scope: diff
checks:
  - vulnerabilities
  - auth_regressions
  - privacy_data_handling
  - agent_tool_auto_approvals
  - prompt_injection
output: inline_review_comment
mcp_servers:
  - sast_scanner
  - sca_scanner
  - secrets_scanner

# Scanner agent — runs on cadence
trigger: schedule
scope: whole_codebase
checks:
  - known_vulnerabilities
  - outdated_dependencies
  - configuration_issues
output: slack_channel

Both draw from a shared usage pool and a shared suppression-rule store. [Source: Cursor Security Review changelog]

Key Takeaways¶

New risk and resident risk need different triggers — one agent cannot cover both economically
Prompt-injection review is a semantic check the LLM does well, not a SAST replacement
The reviewer agent is itself an injection target whenever it processes PR text with credentials in scope; mitigations are structural
The pattern fails on false-positive economics unless suppression rules are first-class

Designing Agents to Resist Prompt Injection
Prompt Injection Threat Model
Lethal Trifecta Threat Model
Scanner-as-MCP-Server
Self-Improving Code Review Agents — Learned Rules
Human-in-the-Loop Confirmation Gates
Action-Selector Pattern
Scoped Credentials via Proxy
Cross-Repository Security Posture for Agent-Introduced Vulnerabilities — the organization-wide layer that enumerates a finding across every repository