Security Drift in Iterative LLM Code Refinement¶
Each iteration of an LLM-driven fix-test loop can silently accumulate security regressions even as functional tests keep passing.
The Divergence Problem¶
Iterative refinement loops — where an agent fixes a bug, runs tests, and repeats — optimize for functional correctness. Security correctness is a separate dimension that functional tests do not measure. Over multiple iterations, the two can diverge: working code accumulates attack surfaces that no test ever exercises.
SCAFFOLD-CEGIS demonstrates this empirically. LLM-driven iterative refinement passes functional benchmarks while introducing latent security regressions. The pattern is systematic, not incidental: each generation step that maximizes test passage has no gradient signal from security properties.
Why Agents Miss It¶
Agents in standard fix-test loops receive feedback only from test runners. If the test suite lacks security cases, the agent's feedback signal is entirely functional. Security properties — input sanitization, bounds checking, resource limits, authentication invariants — are either absent from tests or pass trivially on the happy path used during iteration.
The result is incremental security debt that is invisible until a targeted security review — such as an always-on agentic PR security review — or an exploit surfaces it.
Security Checkpointing¶
Insert explicit security verification at iteration boundaries rather than only at the end of a refinement session:
graph TD
A[Agent generates fix] --> B[Functional tests pass?]
B -->|No| A
B -->|Yes| C[Security checkpoint]
C --> D{Security delta clean?}
D -->|Yes| E[Accept iteration]
D -->|No| F[Fail: security regression detected]
F --> A
What to checkpoint:
- Static analysis / SAST: diff the finding count before and after each iteration; block if new high/critical findings appear
- Security-specific test cases: maintain a dedicated suite covering injection, boundary conditions, and authentication paths — run it in parallel with functional tests
- Invariant checks: encode security contracts as assertions the agent cannot bypass (e.g., all user input is sanitized before database access)
Exit Criteria¶
"All tests green" is a necessary but insufficient stopping condition. Add explicit security exit criteria to agent loops:
- Zero net increase in SAST finding severity
- Security test suite passes
- No new code paths reachable from untrusted input without validation
Tools like Semgrep, Bandit (Python), and CodeQL integrate as CLI commands and can run as pre-merge hooks or loop checkpoints.
Why It Works¶
The failure mode is a signal mismatch: the agent's feedback loop optimizes for functional correctness while security properties are unmeasured. SCAFFOLD-CEGIS frames this as specification drift — when security constraints exist only as soft prompts, the optimization trajectory gradually departs from the security specification (SCAFFOLD-CEGIS, 2025). A hard checkpoint converts the implicit constraint into an explicit stopping condition, making security violations loop-breaking rather than invisible.
Implementation Notes¶
- Run security checks on the diff, not the full codebase, to keep loop latency manageable
- Store the baseline SAST report at loop start; compare each iteration against the baseline, not global zero
- Treat security regressions as loop-breaking failures that surface to the human, not as feedback for the agent to self-correct — SCAFFOLD-CEGIS found that adding SAST gating as loop feedback paradoxically increased latent degradation from 12.5% to 20.8%, and a large-scale SWE-bench analysis found that LLMs introduce nearly 9× more new vulnerabilities than developers when patching real-world issues
Example¶
The following GitHub Actions step integrates a Semgrep security checkpoint into an agent's fix-test loop. It runs on every push to branches beginning with agent/, diffing against the baseline stored at loop start.
# .github/workflows/agent-security-checkpoint.yml
name: Agent Security Checkpoint
on:
push:
branches:
- "agent/**"
jobs:
security-delta:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Run Semgrep on changed files only
uses: returntocorp/semgrep-action@v1
with:
config: "p/default p/owasp-top-ten"
generateSarif: true
- name: Compare finding count against baseline
run: |
baseline=$(git show origin/main:semgrep-baseline.json | jq '[.results[] | select(.extra.severity == "ERROR" or .extra.severity == "WARNING")] | length')
current=$(jq '[.results[] | select(.extra.severity == "ERROR" or .extra.severity == "WARNING")] | length' semgrep.sarif)
echo "Baseline findings: $baseline Current findings: $current"
if [ "$current" -gt "$baseline" ]; then
echo "::error::Security regression detected — $((current - baseline)) new high/critical findings introduced"
exit 1
fi
Each time the agent pushes a fix iteration, this checkpoint counts high and critical Semgrep findings against the baseline stored on main. If the agent's changes introduce new findings, the loop fails with a clear error and surfaces the regression to a human rather than feeding it back to the agent as an instruction to self-correct.
When This Backfires¶
Three conditions make checkpointing worse than the alternative:
- SAST blind spots: Naive SAST gating increases latent degradation (SCAFFOLD-CEGIS measured 12.5% → 20.8%) because static tools miss structural regressions like deleted validation logic or weakened exception handling.
- Overcorrection cycles: Feeding security findings back to the agent causes it to suppress the scanner signal rather than fix the vulnerability — removing the code path or making it unreachable.
- Baseline drift: A baseline SAST report not locked at loop start gets reset each iteration; individually acceptable regressions accumulate undetected.
Key Takeaways¶
- Functional test pass rates do not predict security posture; the two diverge systematically in iterative refinement
- Security checkpointing belongs at each iteration boundary, not only at the end of a session
- Exit criteria for agent loops must include explicit security conditions alongside functional test results