Security Drift in Iterative LLM Code Refinement

Each iteration of an LLM-driven fix-test loop can silently accumulate security regressions even as functional tests keep passing.

The Divergence Problem

Iterative refinement loops — where an agent fixes a bug, runs tests, and repeats — optimize for functional correctness. Security correctness is a separate dimension that functional tests do not measure. Over multiple iterations, the two can diverge: working code accumulates attack surfaces that no test ever exercises.

SCAFFOLD-CEGIS demonstrates this empirically: LLM-driven iterative refinement passes functional benchmarks while introducing latent security regressions. The pattern is systematic, not incidental, because a generation step that maximizes test passage receives no feedback signal from security properties.

Why Agents Miss It

Agents in standard fix-test loops receive feedback only from test runners. If the test suite lacks security cases, the agent's feedback signal is entirely functional. Security properties — input sanitization, bounds checking, resource limits, authentication invariants — are either absent from tests or pass trivially on the happy path used during iteration.

The result is incremental security debt that is invisible until a targeted security review — such as an always-on agentic PR security review — or an exploit surfaces it.
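To make the gap concrete, here is a minimal sketch of code that satisfies a happy-path functional test while remaining open to SQL injection. The function `find_user`, the `users` table, and both test helpers are hypothetical names chosen for illustration; they do not come from SCAFFOLD-CEGIS.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, name: str) -> list[tuple]:
    # Hypothetical result of a fix-test loop: the query is built by string
    # interpolation, which behaves correctly for ordinary names.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def functional_test_passes() -> bool:
    # Happy-path check of the kind a functional suite usually contains.
    return find_user(make_db(), "alice") == [(1, "alice")]


def security_test_passes() -> bool:
    # Injection probe: a hostile "name" must not match any row.
    return find_user(make_db(), "x' OR '1'='1") == []
```

With this code `functional_test_passes()` returns `True` and `security_test_passes()` returns `False`. An agent that only ever runs the first check sees a green signal on every iteration.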

Security Checkpointing

Insert explicit security verification at iteration boundaries rather than only at the end of a refinement session:

graph TD
    A[Agent generates fix] --> B[Functional tests pass?]
    B -->|No| A
    B -->|Yes| C[Security checkpoint]
    C --> D{Security delta clean?}
    D -->|Yes| E[Accept iteration]
    D -->|No| F[Fail: security regression detected]
    F --> A
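The flow in the diagram can be sketched as a small driver loop. All three callables (`generate_fix`, `functional_tests_pass`, `security_delta_clean`) are assumed placeholders for whatever agent, test runner, and scanner a project actually uses. Following the diagram's F → A edge, this sketch rejects the candidate and retries without passing the findings back to the agent; it records each rejection so a human can review it, and a stricter policy could stop the loop at that point instead.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LoopResult:
    accepted: Optional[str]  # the accepted candidate, or None if none qualified
    iterations: int
    security_rejections: list[int] = field(default_factory=list)


def refine(
    generate_fix: Callable[[], str],
    functional_tests_pass: Callable[[str], bool],
    security_delta_clean: Callable[[str], bool],
    max_iterations: int = 5,
) -> LoopResult:
    result = LoopResult(accepted=None, iterations=0)
    for i in range(1, max_iterations + 1):
        result.iterations = i
        candidate = generate_fix()                # A: agent generates fix
        if not functional_tests_pass(candidate):  # B: No -> back to A
            continue
        if not security_delta_clean(candidate):   # D: No -> F -> back to A
            result.security_rejections.append(i)  # kept for human review
            continue
        result.accepted = candidate               # E: accept iteration
        break
    return result
```

The point of the structure is that a candidate is accepted only after both gates pass; green functional tests alone never end the loop.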

What to checkpoint:

Static analysis / SAST: diff the finding count before and after each iteration; block if new high/critical findings appear
Security-specific test cases: maintain a dedicated suite covering injection, boundary conditions, and authentication paths — run it in parallel with functional tests
Invariant checks: encode security contracts as assertions the agent cannot bypass (e.g., all user input is sanitized before database access)
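As an illustration of the invariant-check item, one possible encoding of "all user input is sanitized before database access" is a wrapper type that can only be constructed through validation. `SanitizedName`, its allow-list pattern, and `find_user_id` are hypothetical; the sketch uses explicit `raise` statements rather than `assert`, since Python drops `assert` when run with `-O`.

```python
import re
import sqlite3
from typing import Optional


class SanitizedName:
    """Hypothetical wrapper: constructing one is the only validation path."""

    _ALLOWED = re.compile(r"[A-Za-z0-9_]{1,32}")  # assumed allow-list

    def __init__(self, raw: str) -> None:
        if not self._ALLOWED.fullmatch(raw):
            raise ValueError("rejected untrusted input")
        self.value = raw


def find_user_id(conn: sqlite3.Connection, name: SanitizedName) -> Optional[int]:
    # Invariant: database access only ever receives validated input.
    if not isinstance(name, SanitizedName):
        raise TypeError("find_user_id requires a SanitizedName")
    row = conn.execute(
        "SELECT id FROM users WHERE name = ?", (name.value,)
    ).fetchone()
    return row[0] if row else None
```

A refinement step that starts passing raw strings to `find_user_id` now fails immediately at runtime, including on the happy path, so the functional suite itself surfaces the violation.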

Exit Criteria

"All tests green" is a necessary but insufficient stopping condition. Add explicit security exit criteria to agent loops:

Zero net increase in SAST finding severity
Security test suite passes
No new code paths reachable from untrusted input without validation
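The three criteria above can be combined into a single stopping check. This is a sketch under stated assumptions: `IterationReport` and its fields are hypothetical, the four-level severity scale is assumed, "zero net increase in SAST finding severity" is interpreted as "no severity level has more findings than at baseline", and the set of unvalidated entry points is assumed to come from some external analysis that is not shown here.

```python
from collections import Counter
from dataclasses import dataclass

SEVERITY_ORDER = ("low", "medium", "high", "critical")  # assumed scale


@dataclass(frozen=True)
class IterationReport:
    tests_green: bool
    security_suite_green: bool
    sast_severities: tuple[str, ...]          # one entry per SAST finding
    unvalidated_entry_points: frozenset[str]  # from an assumed external analysis


def may_stop(
    baseline: IterationReport, current: IterationReport
) -> tuple[bool, list[str]]:
    """Return (ok, reasons); ok is True only when every exit criterion holds."""
    reasons: list[str] = []
    if not current.tests_green:
        reasons.append("functional tests failing")
    if not current.security_suite_green:
        reasons.append("security test suite failing")
    before = Counter(baseline.sast_severities)
    after = Counter(current.sast_severities)
    for level in SEVERITY_ORDER:
        if after[level] > before[level]:
            reasons.append(f"SAST: {after[level] - before[level]} new {level} finding(s)")
    new_paths = current.unvalidated_entry_points - baseline.unvalidated_entry_points
    if new_paths:
        reasons.append("unvalidated input paths: " + ", ".join(sorted(new_paths)))
    return (not reasons, reasons)
```

Returning the reasons alongside the verdict gives a human reviewer something specific to act on when the loop is not allowed to stop.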

Tools like Semgrep, Bandit (Python), and CodeQL integrate as CLI commands and can run as pre-merge hooks or loop checkpoints.

Why It Works

The failure mode is a signal mismatch: the agent's feedback loop optimizes for functional correctness while security properties are unmeasured. SCAFFOLD-CEGIS frames this as specification drift — when security constraints exist only as soft prompts, the optimization trajectory gradually departs from the security specification (SCAFFOLD-CEGIS, 2025). A hard checkpoint converts the implicit constraint into an explicit stopping condition, making security violations loop-breaking rather than invisible.

Implementation Notes

Run security checks on the diff, not the full codebase, to keep loop latency manageable
Store the baseline SAST report at loop start; compare each iteration against the baseline, not global zero
Treat security regressions as loop-breaking failures that surface to a human, not as feedback for the agent to self-correct. SCAFFOLD-CEGIS found that adding SAST gating as loop feedback paradoxically increased latent degradation from 12.5% to 20.8%. Separately, a large-scale SWE-bench analysis found that LLMs introduce nearly 9× more new vulnerabilities than developers when patching real-world issues.

Example

The following GitHub Actions workflow integrates a Semgrep security checkpoint into an agent's fix-test loop. It runs on every push to branches matching agent/** and compares the current finding count against a baseline report (semgrep-baseline.json) stored on main.

# .github/workflows/agent-security-checkpoint.yml
name: Agent Security Checkpoint

on:
  push:
    branches:
      - "agent/**"

jobs:
  security-delta:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Run Semgrep and write JSON results
        run: |
          python3 -m pip install semgrep
          # JSON output exposes .results[].extra.severity, which the comparison step reads
          semgrep scan --config p/default --config p/owasp-top-ten --json --output semgrep.json

      - name: Compare finding count against baseline
        run: |
          # Count ERROR- and WARNING-severity findings in a Semgrep JSON report
          filter='[.results[] | select(.extra.severity == "ERROR" or .extra.severity == "WARNING")] | length'
          baseline=$(git show origin/main:semgrep-baseline.json | jq "$filter")
          current=$(jq "$filter" semgrep.json)
          echo "Baseline findings: $baseline  Current findings: $current"
          if [ "$current" -gt "$baseline" ]; then
            echo "::error::Security regression detected: $((current - baseline)) new ERROR/WARNING findings introduced"
            exit 1
          fi

Each time the agent pushes a fix iteration, this checkpoint counts ERROR- and WARNING-severity Semgrep findings and compares the total with the baseline stored on main. If the agent's changes raise the count, the job fails with a clear error and surfaces the regression to a human rather than feeding it back to the agent as an instruction to self-correct.
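A pure count comparison has a known weakness: if one finding is fixed and a different one is introduced, the totals match and the gate passes. A hedged alternative is to compare finding identities instead. The sketch below assumes Semgrep's JSON report layout (`results[].check_id`, `results[].path`, `results[].extra.severity`); the helper names and the choice of `(check_id, path)` as the identity key are illustrative, and that key collapses repeated hits of one rule in one file.

```python
BLOCKING = {"ERROR", "WARNING"}  # the same severities the workflow filters on


def blocking_keys(report: dict) -> set[tuple[str, str]]:
    """Return (check_id, path) for each blocking finding in a Semgrep JSON report."""
    return {
        (r["check_id"], r["path"])
        for r in report.get("results", [])
        if r.get("extra", {}).get("severity") in BLOCKING
    }


def new_findings(baseline: dict, current: dict) -> list[tuple[str, str]]:
    """Findings present in the current report but absent from the baseline."""
    return sorted(blocking_keys(current) - blocking_keys(baseline))


# One finding fixed, one introduced: counts are equal, identities are not.
baseline_report = {
    "results": [{"check_id": "rule.a", "path": "a.py", "extra": {"severity": "ERROR"}}]
}
current_report = {
    "results": [{"check_id": "rule.b", "path": "b.py", "extra": {"severity": "ERROR"}}]
}
swapped = new_findings(baseline_report, current_report)  # [("rule.b", "b.py")]
```

A gate built on `new_findings` fails whenever the returned list is non-empty, regardless of how the totals compare.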

When This Backfires

Three conditions can make checkpointing counterproductive:

SAST blind spots: Naive SAST gating increases latent degradation (SCAFFOLD-CEGIS measured 12.5% → 20.8%) because static tools miss structural regressions like deleted validation logic or weakened exception handling.
Overcorrection cycles: Feeding security findings back to the agent can lead it to silence the scanner rather than fix the vulnerability, for example by removing the flagged code path or making it unreachable.
Baseline drift: If the baseline SAST report is not locked at loop start, it is reset on each iteration, and individually acceptable regressions accumulate undetected.
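Baseline drift is easy to reproduce with a toy gate. The sketch below assumes a hypothetical policy that tolerates a small number of new findings per comparison (`tolerance`), and contrasts a baseline locked at loop start with one that is reset after every accepted iteration. The function name and the finding counts are invented for illustration.

```python
def accepted_iterations(counts: list[int], tolerance: int, lock_baseline: bool) -> int:
    """Count how many iterations pass an 'at most `tolerance` new findings' gate.

    counts[0] is the finding count at loop start; counts[1:] are the counts
    measured after each successive iteration.
    """
    baseline = counts[0]
    accepted = 0
    for current in counts[1:]:
        if current - baseline > tolerance:
            break  # regression exceeds tolerance: stop the loop
        accepted += 1
        if not lock_baseline:
            baseline = current  # drift: each accepted iteration becomes the reference
    return accepted


finding_counts = [10, 11, 12, 13]  # one new finding per iteration
rolling = accepted_iterations(finding_counts, tolerance=1, lock_baseline=False)  # 3
locked = accepted_iterations(finding_counts, tolerance=1, lock_baseline=True)    # 1
```

With the rolling baseline every iteration looks acceptable even though three findings have accumulated; the locked baseline stops the loop at the second iteration.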

Key Takeaways

Functional test pass rates do not predict security posture; the two diverge systematically in iterative refinement
Security checkpointing belongs at each iteration boundary, not only at the end of a session
Exit criteria for agent loops must include explicit security conditions alongside functional test results