
Always-On Agentic PR Security Review

Pair a PR-time security reviewer with a scheduled whole-codebase scanner: the reviewer covers new risk in each diff, and the scanner covers resident risk that no PR reaches.

Two Coverage Gaps

Security review fails along two temporal axes:

  • New risk: vulnerabilities in today's changes. Diff-scoped and cheap to review when the pull_request event fires; anything not caught before merge becomes resident risk.
  • Resident risk: vulnerabilities already in the codebase, plus dependency, config, and policy drift. Months may pass before any PR touches the affected files.

PR-only review never finds resident risk; scheduled-only scanning delays new-code coverage.

The Pattern

Two agents share one finding format and one triage queue.

graph TD
    PR[PR opened] --> Reviewer[PR-Time Security Reviewer]
    Reviewer --> Inline[Inline diff comments]
    Cron[Cron / cadence] --> Scanner[Scheduled Vulnerability Scanner]
    Scanner --> Channel[Slack / issue tracker]
    Inline --> Triage[(Shared finding queue)]
    Channel --> Triage
    Triage --> Suppress[Suppression rules]
    Suppress --> Reviewer
    Suppress --> Scanner

| Dimension | PR-time reviewer | Scheduled scanner |
| --- | --- | --- |
| Scope | Diff only | Whole codebase |
| Trigger | pull_request open/sync | Cron, cadence, dependency advisory |
| Latency budget | Seconds to a few minutes | Hours acceptable |
| Output | Inline comment at the changed line | Aggregated report to a channel |
| Failure mode | Block merge or post warning | File issue or notify owner |
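
The shared finding format can be as small as one record type. A minimal Python sketch follows; the field names and the deduplication key are illustrative assumptions, not taken from any vendor's schema:

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class Finding:
    """One record shape for both agents (all field names are hypothetical)."""
    source: str    # "pr_reviewer" or "scheduled_scanner"
    rule_id: str   # e.g. "secrets.high_entropy"
    path: str      # repository-relative file path
    line: int      # 1-based line number
    severity: str  # "low" | "medium" | "high" | "critical"
    message: str

    def fingerprint(self) -> str:
        # Built only from rule_id and path, so the same issue reported by
        # either agent (or at a shifted line) collapses to one queue entry.
        # This is a coarse key chosen for the sketch.
        key = f"{self.rule_id}:{self.path}".encode("utf-8")
        return hashlib.sha256(key).hexdigest()[:16]


def enqueue(queue: dict[str, Finding], finding: Finding) -> bool:
    """Add a finding to the shared queue; return False if already present."""
    fp = finding.fingerprint()
    if fp in queue:
        return False
    queue[fp] = finding
    return True
```

Keying the queue on a fingerprint rather than on the reporting agent is what lets one triage decision apply to both the reviewer and the scanner.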

Cursor shipped this split in beta on 2026-04-30: a Security Reviewer that "checks every PR for security vulnerabilities, auth regressions, privacy and data-handling risks, agent tool auto-approvals, and prompt injection attacks" plus a Vulnerability Scanner that "runs scheduled scans of your codebase to check for known vulnerabilities, outdated dependencies, and configuration issues." [Source: Cursor changelog] Anthropic's claude-code-security-review Action fills the same PR-time role, and /security-review runs the same review locally before commit. [Source: Anthropic Help] GitHub added the same component to Copilot CLI on 2026-06-10 as a dedicated security-review command. [Source: GitHub Changelog]

Prompt-Injection Review Is a Distinct Dimension

The reviewer flags injection vectors in the diff — content that, once shipped, will land in another agent's context and rewrite its instructions.

| Check class | Looks for | Signal |
| --- | --- | --- |
| CVE / dependency | Known-vulnerable package versions | Advisory database |
| SAST | Tainted data flow to a sink | AST / data-flow graph |
| Secrets | High-entropy strings, known patterns | Regex + entropy |
| Prompt-injection | New retrieval paths into agent context, untrusted-input boundaries, system-prompt mutations, tool descriptions, skill SKILL.md text | Heuristic + LLM judgement |

The attack surface is semantic — the same string is benign in a code comment and dangerous in a runtime-loaded system prompt. Deterministic SAST will not flag a SKILL.md whose ## Examples section contains injected instructions; an LLM reviewer scoped to the Lethal Trifecta and task-scope boundary will. [Source: Prompt Injection Resistant Agent Design]
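
Because the same text is only dangerous where an agent loads it, a cheap first step is to decide which changed files deserve the semantic check at all. A minimal Python sketch, assuming a hypothetical set of path globs for agent-loaded files (a real repository would maintain its own list):

```python
from fnmatch import fnmatchcase

# Hypothetical globs for files whose text ends up in an agent's context at
# runtime. These patterns are illustrative, not a standard.
AGENT_CONTEXT_GLOBS = (
    "*SKILL.md",
    "*prompts/*",
    "*.prompt.md",
)


def needs_semantic_review(changed_paths):
    """Return the changed paths that should go to the LLM injection check."""
    return [
        path
        for path in changed_paths
        if any(fnmatchcase(path, glob) for glob in AGENT_CONTEXT_GLOBS)
    ]
```

Files that do not match still go through the deterministic checks; this filter only decides where the more expensive LLM judgement is spent.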

The Reviewer Itself Is a Target

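A PR-triggered reviewer that reads PR titles, descriptions, and comments runs untrusted input through an LLM with repository credentials in scope. Untrusted content, access to secrets, and an outbound channel (the review comment) together form the Lethal Trifecta, here located at the reviewer itself.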

The April 2026 Comment and Control disclosure exploited this against Claude Code Security Review, Gemini CLI Action, and GitHub Copilot Agent. The attacker injects instructions in a PR title; the agent auto-triggers on pull_request, runs the directive, and exfiltrates ANTHROPIC_API_KEY, GITHUB_TOKEN, and GEMINI_API_KEY as a "security finding" comment. Anthropic rated it CVSS 9.4. [Source: Comment and Control writeup; SecurityWeek; The Register]

Mitigations are structural:

  • Treat PR title, description, and comments as untrusted data — never as instructions
  • Avoid pull_request_target for forked PRs unless secrets are scoped via a credentials proxy
  • Restrict the reviewer's tool catalog to read-only operations on the diff; gate writes and network calls behind confirmation gates
  • Apply the Action-Selector pattern so the reviewer cannot synthesise arbitrary tool calls from PR text
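
The last mitigation can be sketched in a few lines. In this hedged Python example the model's raw output is only ever mapped onto a fixed catalogue of actions; the action names are hypothetical and simply mirror the reviewer's outputs described earlier:

```python
from enum import Enum


class Action(Enum):
    """The complete set of things the reviewer may do (names are illustrative)."""
    POST_INLINE_COMMENT = "post_inline_comment"
    POST_WARNING = "post_warning"
    BLOCK_MERGE = "block_merge"
    NO_ACTION = "no_action"


def select_action(model_output: str) -> Action:
    """Map raw model output onto the fixed catalogue.

    Anything that is not an exact action name falls back to NO_ACTION, so
    text injected through a PR title cannot become a new tool call.
    """
    choice = model_output.strip().lower()
    for action in Action:
        if choice == action.value:
            return action
    return Action.NO_ACTION
```

The design choice is that the model selects and never composes: arguments such as the comment body would be filled in by trusted code, not parsed out of the model's free text.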

False-Positive Economics

Single-stage detection is the wrong shape. One observed mitigation pairs a cheap stage-1 filter accepting an 8.5% false-positive rate with a stage-2 reasoning pass that drops it to 0.4%. [Source: ARMO: Detecting Prompt Injection in Production AI Agent Workloads]
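
The two-stage shape is simple to express. The Python sketch below is not the cited implementation: the regular expression is a placeholder heuristic, and `stage2` is a hypothetical callable standing in for the reasoning pass (for example an LLM judge).

```python
import re
from typing import Callable, Iterable

# Stage 1: a cheap, deliberately over-eager pattern (illustrative only).
SUSPICIOUS = re.compile(
    r"ignore (all )?(previous|prior) instructions|system prompt|exfiltrate",
    re.IGNORECASE,
)


def stage1_flag(text: str) -> bool:
    """Cheap filter that tolerates false positives in exchange for speed."""
    return bool(SUSPICIOUS.search(text))


def two_stage(texts: Iterable[str], stage2: Callable[[str], bool]) -> list[str]:
    """Run the expensive stage-2 judgement only on stage-1 candidates."""
    candidates = [t for t in texts if stage1_flag(t)]
    return [t for t in candidates if stage2(t)]
```

Only the candidates reach the expensive pass, so its cost scales with the stage-1 hit rate and not with the size of the diff or codebase.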

Suppression must be first-class. Cursor accepts custom instructions and MCP-wrapped SAST/SCA/secrets scanners, so deterministic tools own the high-confidence classes and the LLM judges the residual semantic surface. Self-improving review agents persist accept/reject signals as rules, so the set of reported findings narrows over time.
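
A shared suppression store can start as a list of rules that both agents consult before reporting. The following Python sketch assumes a hypothetical rule shape (rule ID plus path glob); it does not describe how Cursor stores its rules:

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class SuppressionRule:
    """A persisted 'reject' decision (field names are illustrative)."""
    rule_id: str    # which check the rule silences
    path_glob: str  # where it applies, e.g. "tests/fixtures/*"
    reason: str     # why triage rejected the finding


def is_suppressed(rule_id: str, path: str, rules: list[SuppressionRule]) -> bool:
    """True if any stored rule covers this check at this path."""
    return any(
        r.rule_id == rule_id and fnmatchcase(path, r.path_glob) for r in rules
    )


def record_rejection(
    rules: list[SuppressionRule], rule_id: str, path_glob: str, reason: str
) -> list[SuppressionRule]:
    """Persist a triage rejection as a rule, skipping exact duplicates."""
    rule = SuppressionRule(rule_id, path_glob, reason)
    if rule not in rules:
        rules.append(rule)
    return rules
```

Requiring a `reason` on every rule keeps the store auditable, which matters once rules start hiding findings from both agents.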

When the Pattern Backfires

  • No AppSec triage owner. Findings land on the PR author; without a triage queue, noise leads to dismissal.
  • Mature deterministic tooling already covers the surface. Where CodeQL, Semgrep, Snyk, and Dependabot are already tuned, the incremental coverage may not justify the cost.
  • High-volume, low-security churn. Docs sites, generated code, config-heavy monorepos produce findings the reviewer cannot prioritise.
  • pull_request_target with secrets. A Comment-and-Control precondition; fix the trust boundary first.

Example

Cursor's beta configuration shows the operational shape end to end:

# Reviewer agent — runs at PR open
trigger: pull_request
scope: diff
checks:
  - vulnerabilities
  - auth_regressions
  - privacy_data_handling
  - agent_tool_auto_approvals
  - prompt_injection
output: inline_review_comment
mcp_servers:
  - sast_scanner
  - sca_scanner
  - secrets_scanner

---

# Scanner agent — runs on cadence
trigger: schedule
scope: whole_codebase
checks:
  - known_vulnerabilities
  - outdated_dependencies
  - configuration_issues
output: slack_channel

Both draw from a shared usage pool and a shared suppression-rule store. [Source: Cursor Security Review changelog]

Key Takeaways

  • New risk and resident risk need different triggers — one agent cannot cover both economically
  • Prompt-injection review is a semantic check the LLM does well, not a SAST replacement
  • The reviewer agent is itself an injection target whenever it processes PR text with credentials in scope; mitigations are structural
  • The pattern fails on false-positive economics unless suppression rules are first-class