
Preempting Agentic PR Rejection by Failure-Mode Category

A 14-reason rejection taxonomy classifies why 46% of agentic fix PRs are rejected; only the implementation and CI categories respond to preemption prompts.

The Rejection Taxonomy

Across 3,225 fix pull requests from Copilot, Devin, Cursor, and Claude in the AIDev dataset, 46.41% were rejected. A qualitative two-rater study of a representative sample of 306 rejected PRs (95% confidence level, Cohen's κ = 0.605) sorts the stated reasons into four categories and 14 specific causes, with the remainder left unclassified (arXiv:2606.13468).
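
The 306 figure is consistent with a standard sample-size calculation. The sketch below rests on three assumptions that the source does not state. First, the sample is drawn from the roughly 1,497 rejected PRs. Second, it uses Cochran's formula with a finite-population correction. Third, it uses a 5% margin of error and maximum variance (p = 0.5).

```python
import math

def cochran_sample_size(population: int, z: float = 1.96,
                        margin: float = 0.05, p: float = 0.5) -> int:
    """Cochran's sample size with the finite-population correction (assumed method)."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2  # infinite-population size: 384.16 at the defaults
    n = n0 / (1 + (n0 - 1) / population)       # shrink for a finite pool of rejected PRs
    return math.ceil(n)

total_fix_prs = 3225
rejected = round(total_fix_prs * 0.4641)  # about 1,497 rejected fix PRs
print(rejected, cochran_sample_size(rejected))  # prints: 1497 306
```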

| Category | Share of sample | Specific reasons (share of sample) |
|---|---|---|
| Relevance of Fix | 24.8% | Inactivity 17.3%, Superseded 5.9%, Low priority 1.0%, Architecture change 0.3%, Test PR 0.3% |
| Implementation Issues | 10.1% | Incorrect fix 5.6%, Wrong approach 2.6%, Ambiguity 0.7%, Insufficient 0.7%, Wrong repo 0.7% |
| Provider-Related | 8.5% | Agent failure 7.5%, Rate limit 1.0% |
| Technical Issues | 7.2% | CI failure 6.9%, Breaking change 0.3% |
| Unclassified | 49.3% | No explicit reviewer rationale in the PR thread |
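
The table can be checked for consistency. This assumes every percentage is a rounded share of the 306 sampled rejections, so each share can be rounded back to a whole number of PRs. The recovered counts sum to exactly 306. They also put Relevance of Fix at 76 PRs (24.8%), which is the sum of its five sub-reasons.

```python
SAMPLE_SIZE = 306

# Sub-reason shares (percent of the 306 sampled rejections), as listed in the table.
TAXONOMY = {
    "Relevance of Fix": {"Inactivity": 17.3, "Superseded": 5.9, "Low priority": 1.0,
                         "Architecture change": 0.3, "Test PR": 0.3},
    "Implementation Issues": {"Incorrect fix": 5.6, "Wrong approach": 2.6, "Ambiguity": 0.7,
                              "Insufficient": 0.7, "Wrong repo": 0.7},
    "Provider-Related": {"Agent failure": 7.5, "Rate limit": 1.0},
    "Technical Issues": {"CI failure": 6.9, "Breaking change": 0.3},
    "Unclassified": {"No explicit rationale": 49.3},
}

def recover_counts(taxonomy: dict, n: int) -> dict:
    """Round each reported percentage back to the nearest whole number of PRs, per category."""
    return {cat: sum(round(share / 100 * n) for share in reasons.values())
            for cat, reasons in taxonomy.items()}

counts = recover_counts(TAXONOMY, SAMPLE_SIZE)
for category, count in counts.items():
    print(f"{category:<22} {count:>4} PRs  {100 * count / SAMPLE_SIZE:5.1f}%")
print("total:", sum(counts.values()))
```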

The unclassified bucket is large because rejected agent-authored PRs frequently lack reviewer feedback. A separate study of 654 rejected PRs across five agents finds that 67.9% of rejections carry no explicit reviewer comment (arXiv:2602.04226).

What the Categories Mean for Preemption

The four categories have different causal roots, so preemption prompts only move some of them. The paper recommends three concrete practices, each targeting a specific bucket (arXiv:2606.13468):

  1. Approach hints and do-not constraints in the agent instruction file (e.g., .github/copilot-instructions.md, AGENTS.md) — encode the team's implicit conventions reviewers would otherwise enforce. Targets Implementation Issues.
  2. CI validation instructions — tell the agent how to run tests and confirm the fix without introducing breaking changes. Targets Technical Issues.
  3. Task prioritization before dispatch — filter out low-priority, superseded, or stale-on-arrival issues. Targets Low-priority and Superseded sub-reasons under Relevance of Fix.
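
The third practice is a routing step rather than a prompt. Below is a minimal sketch of a pre-dispatch filter. It assumes a hypothetical `Issue` record with three fields: a priority label, an optional link to a superseding issue, and an idle-time counter. None of these fields comes from the paper or from a specific tracker API, and the 90-day cutoff is an arbitrary placeholder.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Issue:
    # Hypothetical fields; map them onto whatever your issue tracker exposes.
    number: int
    priority: str                 # e.g. "low", "medium", "high"
    superseded_by: Optional[int]  # issue/PR that already covers this work, if any
    days_since_activity: int

MAX_IDLE_DAYS = 90  # hypothetical staleness cutoff; tune per repository

def dispatch_decision(issue: Issue) -> Optional[str]:
    """Return a reason to skip the issue, or None if it may be sent to an agent."""
    if issue.superseded_by is not None:
        return f"superseded by #{issue.superseded_by}"
    if issue.priority == "low":
        return "low priority"
    if issue.days_since_activity > MAX_IDLE_DAYS:
        return "stale on arrival"
    return None

def select_for_agent(issues: list[Issue]) -> list[Issue]:
    """Keep only the issues that pass every pre-dispatch check."""
    return [issue for issue in issues if dispatch_decision(issue) is None]
```

Returning a reason string instead of a bare boolean lets the triage step log why an issue stayed with humans.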

Inactivity (17.3% of the sample, the single largest cause) is a workflow-attention failure, not a fix-content failure. The PR stalls regardless of what it contains, so prompt changes do not reach it. Provider-Related rejections (agent failure 7.5%, rate limit 1.0%) are infrastructure failures and equally prompt-immune.
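
Tallying the practice-to-bucket mapping makes these limits concrete. The sketch uses per-sub-reason counts recovered by rounding each table share back to whole PRs out of 306. This is an approximation, since the source reports only percentages. The mapping follows the three practices listed above.

```python
SAMPLE_SIZE = 306

# Approximate PR counts per sub-reason (table shares rounded to whole PRs out of 306).
COUNTS = {
    "Inactivity": 53, "Superseded": 18, "Low priority": 3, "Architecture change": 1,
    "Test PR": 1, "Incorrect fix": 17, "Wrong approach": 8, "Ambiguity": 2,
    "Insufficient": 2, "Wrong repo": 2, "Agent failure": 23, "Rate limit": 3,
    "CI failure": 21, "Breaking change": 1, "Unclassified": 151,
}

# Sub-reasons each preemption practice targets; anything unlisted has no matching practice.
PRACTICE = {
    "approach hints / do-not constraints": ["Incorrect fix", "Wrong approach", "Ambiguity",
                                            "Insufficient", "Wrong repo"],
    "CI validation instructions": ["CI failure", "Breaking change"],
    "pre-dispatch prioritization": ["Low priority", "Superseded"],
}

def coverage(counts: dict[str, int], practice: dict[str, list[str]]) -> dict[str, int]:
    """Sum sampled rejections per practice, plus a bucket for reasons no practice targets."""
    covered = {name: sum(counts[r] for r in reasons) for name, reasons in practice.items()}
    mapped = {r for reasons in practice.values() for r in reasons}
    covered["no practice"] = sum(c for r, c in counts.items() if r not in mapped)
    return covered

for name, n in coverage(COUNTS, PRACTICE).items():
    print(f"{name:<38} {n:>4} PRs  {100 * n / SAMPLE_SIZE:5.1f}%")
```

Under this mapping, 232 of the 306 sampled rejections (about 76%) have no matching practice. These are Inactivity, Provider-Related, the two minor Relevance reasons, and the unclassified bucket.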

Why It Works

Implementation-bucket rejections often trace to unwritten team conventions that the agent cannot infer from the issue text alone. Examples include style rules, architectural choices, "we don't use library X here," and test expectations. Encoding those conventions in the instruction file gives the agent the same implicit knowledge a new human contributor would pick up from a senior engineer's pre-PR review. The paper identifies this mechanism in its Implications section: developers should "provide guidance on how to perform the fix or provide guidance on what approaches are not acceptable in the agent instruction file" (arXiv:2606.13468). The mechanism matches the Implicit Knowledge Problem anti-pattern, in which agents fail because the team's conventions appear nowhere in the artifacts the agent reads.

When This Backfires

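Preemption prompts target only the Implementation Issues and Technical Issues buckets. Together these make up 17.3% of the sampled rejections, or about 8 percentage points of the 46.41% rejection rate if the sample shares hold for the full population. The remaining 82.7% of sampled rejections fall into three groups: those that resist prompt intervention, those that require workflow-side changes, and those that carry no stated reason at all.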

  • Greenfield or single-purpose repos without an established convention set: the instruction file has little to encode beyond generic advice, and the cost of authoring it may exceed the rejections it prevents.
  • Silent-reject reviewers: 67.9% of rejected PRs carry no reviewer feedback (arXiv:2602.04226) — instructions cannot address rejection reasons the reviewer never states.
  • Inactivity rejections (17.3% of the sample): driven by reviewer attention and triage cadence, not by PR content, so only workflow-side changes can move them.
  • Provider-side rejections (agent failure 7.5%, rate limit 1.0%): no prompt can prevent the agent from going down or running out of quota.
  • Low-priority and Superseded fixes: a task-routing problem, not a fix-quality problem. The agent producing a better fix does not change the outcome — the issue should not have been dispatched to an agent at all. See Agent PR Volume vs. Value for the productivity-paradox framing.
  • Different sampling, different headline: a separate empirical study of fix-related PRs measures a 65% merge rate (Codex 81.6%, Copilot 42.4%, Devin 42.9%) on a different sample (arXiv:2602.00164). The 46.41% rejection figure is sample-specific to AIDev's fix-PR slice; treat the headline as a calibration target, not a universal constant.

The paper measures rejection causes, not the causal effect of any preemption intervention. No cited measurement shows that adding .github/copilot-instructions.md reduces the rejection rate by a quantified amount. The practice is well motivated by the taxonomy but not yet validated by an A/B comparison.
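
Without an A/B result, the taxonomy can still bound the effect. The back-of-the-envelope sketch below relies on two optimistic assumptions. The first is that the sample shares extrapolate to all 3,225 PRs. The second is that every prevented Implementation or Technical rejection would otherwise have merged.

```python
REJECTION_RATE = 46.41  # % of the 3,225 AIDev fix PRs that were rejected

def points_of_rate(share_of_rejections: float, rate: float = REJECTION_RATE) -> float:
    """Convert a share of rejected PRs (in %) into percentage points of the overall rejection rate."""
    return rate * share_of_rejections / 100

# Implementation Issues + Technical Issues, as % of the sampled rejections.
addressable_share = 10.1 + 7.2
ceiling = points_of_rate(addressable_share)
floor = REJECTION_RATE - ceiling
print(f"at most {ceiling:.1f} pp fewer rejections; rate no lower than {floor:.1f}%")
```

This prints a ceiling of 8.0 percentage points and a floor of 38.4%. The bound covers only the classified reasons. Some of the 49.3% unclassified rejections may also be implementation or CI problems, and this calculation cannot see them.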

Example

The paper's three preemption practices translate to a concrete artifact layout. GitHub Copilot's repository custom instructions file (.github/copilot-instructions.md) is the documented surface for the first practice, and the paper recommends it by name (arXiv:2606.13468).

A preemption-shaped instruction file carries three load-bearing sections:

```markdown
## Approach hints

- Prefer minimal-diff fixes; do not refactor adjacent code in the same PR.
- Address the underlying cause, not the symptom.

## Approaches to avoid

- Do not add new dependencies without an issue thread approving them.
- Do not modify CI configuration to make tests pass.

## Validation before opening a PR

- Run `<project test command>` and confirm the previously failing test now passes.
- Run `<project lint command>` and confirm no new warnings.
```
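
The two Validation commands could also be wrapped in a single pre-PR gate that the instruction file tells the agent to run. Below is a minimal sketch, assuming the repository's own test and lint commands are passed in as argument lists. The gate itself is an illustration, not part of the paper's recommendation.

```python
import subprocess
import sys

def run_validation(commands: list[list[str]]) -> bool:
    """Run each command in order; stop and report at the first non-zero exit code."""
    for cmd in commands:
        result = subprocess.run(cmd)  # inherits stdout/stderr so the agent sees test output
        if result.returncode != 0:
            print(f"validation failed: {' '.join(cmd)} exited {result.returncode}",
                  file=sys.stderr)
            return False
    return True
```

For a hypothetical Python project this might be called as `run_validation([["pytest", "-q"], ["ruff", "check", "."]])`. Stopping at the first failure keeps the agent's feedback focused on one problem at a time.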

The Approach-hints and Approaches-to-avoid blocks target Implementation Issues; the Validation block targets Technical Issues. Nothing in the file addresses Inactivity, Superseded, or Provider-Related failures — those need workflow-side or infrastructure changes.
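
Because each block maps to a bucket, a small lint can flag an instruction file that is missing one. The sketch assumes the section titles from the example file and a `## ` heading style. A repository may name its sections differently, and GitHub does not require these titles.

```python
import re

# Section titles from the example instruction file, mapped to the rejection bucket each targets.
REQUIRED_SECTIONS = {
    "Approach hints": "Implementation Issues",
    "Approaches to avoid": "Implementation Issues",
    "Validation before opening a PR": "Technical Issues",
}

def missing_sections(markdown: str) -> dict[str, str]:
    """Return required '## ' sections absent from the file, with the bucket each would target."""
    headings = {h.strip() for h in re.findall(r"^##\s+(.+)$", markdown, flags=re.MULTILINE)}
    return {title: bucket for title, bucket in REQUIRED_SECTIONS.items() if title not in headings}
```

In a repository this could run as `missing_sections(Path(".github/copilot-instructions.md").read_text())`, using `pathlib.Path`. An empty result means every targeted bucket has a corresponding block.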

Key Takeaways

  • 46.41% of agent-authored fix PRs in the AIDev sample are rejected; the reasons cluster into 14 specific causes across four categories.
  • Implementation Issues (10.1%) and Technical Issues (7.2%) are the buckets that respond to preemption prompts. Together they make up 17.3% of sampled rejections, or roughly 8 percentage points of the 46.41% rejection rate.
  • Inactivity (17.3%) is the single largest sub-reason and is a workflow-attention failure, not a content failure.
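  • About half of the sampled rejections (49.3%) carry no stated rationale, and a separate study finds that 67.9% of rejected agent PRs lack explicit reviewer feedback. The prescriptions therefore rest on the subset of cases where reviewers explained themselves.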
  • Preemption practices — approach hints, CI validation instructions, pre-dispatch prioritization — target specific buckets; treat them as partial mitigations, not universal merge-rate boosters.

Sources

  • arXiv:2606.13468 — Abujadallah, Arabat, Sayagh (2026): "Understanding the Rejection of Fixes Generated by Agentic Pull Requests — Insights from the AIDev Dataset" (MSR '26)
  • arXiv:2602.04226 — companion study of 654 rejected PRs across five agents: 67.9% lack reviewer feedback; seven rejection modes occur only in agent-authored PRs
  • arXiv:2507.15003 — AIDev dataset paper, the upstream source for both studies
  • arXiv:2602.00164 — companion empirical study with a 65% merge rate on a different sample, illustrating the sample-dependence of the headline figure