Dependency Gap Validation for AI-Generated Code

AI coding agents declare a fraction of the dependencies their code actually needs at runtime — validate in clean environments before trusting the manifest.

The problem

When an LLM generates a project, it writes a dependency file such as requirements.txt, package.json, or pom.xml. That file is usually incomplete. An empirical study of 300 projects generated with Claude Code, OpenAI Codex, and Gemini found a 13.5x average expansion from declared dependencies to actual runtime dependencies. For example, a Python project declaring 3 packages (scikit-learn, pandas, matplotlib) needed 52 at runtime.

This is the dependency gap: the difference between what the agent says the code needs and what the code actually needs to run.
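To make the size of the gap concrete, the ratio can be computed directly from the two counts. The sketch below uses only the numbers quoted above; the function names are illustrative and do not come from the study, and `declared_share` assumes every declared package also appears in the runtime set.

```python
def expansion_ratio(declared: int, runtime: int) -> float:
    """Runtime packages per declared package."""
    if declared <= 0:
        raise ValueError("declared must be positive")
    return runtime / declared


def declared_share(ratio: float) -> float:
    """Fraction of the runtime set that appeared in the manifest.

    Assumes every declared package is also part of the runtime set.
    """
    return 1.0 / ratio


# The Python example from the text: 3 declared, 52 at runtime.
print(f"{expansion_ratio(3, 52):.1f}x")  # 17.3x
# The study-wide average of 13.5x implies roughly 7% of packages were declared.
print(f"{declared_share(13.5):.1%}")  # 7.4%
```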

How large the gap gets

The study tested each agent across Python, JavaScript, and Java in clean AWS environments with only OS-level packages installed:

Language	Success rate	Runtime multiplier	Why
Python	89.2%	12.3x	Flat `requirements.txt` is easiest for LLMs to enumerate
JavaScript	61.9%	9.7x	npm auto-resolves transitive deps into `package-lock.json`, masking gaps in the manifest
Java	44.0%	18.4x	Maven transitive resolution produces deep dependency graphs that LLMs rarely declare completely

Agent performance also varies by language:

Agent	Python	JavaScript	Java
Claude Code	80%	60%	80%
Gemini	100%	71%	28%
Codex	88%	54%	24%

The practical takeaway: match the agent to the language. Gemini does well at Python but struggles with Java. Claude shows the most balanced performance across languages.
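One way to act on this is to treat the table as a lookup. The following is a small sketch built from the success rates above; `best_agent` and `spread` are hypothetical helpers, and the ranking only reflects this one study.

```python
# Success rates (%) copied from the table above.
SUCCESS_RATE = {
    "Claude Code": {"Python": 80, "JavaScript": 60, "Java": 80},
    "Gemini": {"Python": 100, "JavaScript": 71, "Java": 28},
    "Codex": {"Python": 88, "JavaScript": 54, "Java": 24},
}


def best_agent(language: str) -> str:
    """Return the agent with the highest success rate for a language."""
    return max(SUCCESS_RATE, key=lambda agent: SUCCESS_RATE[agent][language])


def spread(agent: str) -> int:
    """Gap between an agent's best and worst language, in percentage points."""
    rates = SUCCESS_RATE[agent].values()
    return max(rates) - min(rates)


for lang in ("Python", "JavaScript", "Java"):
    print(lang, "->", best_agent(lang))

# A smaller spread means more balanced performance across languages.
print({agent: spread(agent) for agent in SUCCESS_RATE})
```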

Three-layer dependency framework

The study offers a simple model for thinking about AI-generated dependency declarations:

graph LR
    Dc["Claimed (Dc)<br/>In config file"] --> Dw["Working (Dw)<br/>Dc + debugged additions"]
    Dw --> Dr["Runtime (Dr)<br/>Full transitive closure"]

    style Dc fill:#fee,stroke:#c33
    style Dw fill:#ffd,stroke:#cc6
    style Dr fill:#dfd,stroke:#393

Claimed (Dc) — packages the agent listed in the dependency file
Working (Dw) — claimed packages plus whatever you add by hand to get the code running
Runtime (Dr) — the full transitive closure: every package loaded while the code runs

The gap between Dc and Dr is what breaks your builds. Lock files (package-lock.json, poetry.lock, Pipfile.lock) capture Dr, but only if you generate them from a working environment — not from the agent's incomplete Dc.
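The layers map naturally onto sets. As a rough sketch, Dc can be read from a requirements.txt-style manifest and Dr approximated by listing every distribution installed in the active environment. The helper names are hypothetical, the manifest parser is deliberately simplistic (it ignores VCS URLs and other pip-specific syntax), and "installed" is only a proxy for "loaded at runtime".

```python
import re
from importlib.metadata import distributions


def normalize(name: str) -> str:
    """Normalize a distribution name following the PEP 503 rule."""
    return re.sub(r"[-_.]+", "-", name).lower()


def claimed(requirements_text: str) -> set[str]:
    """Dc: names listed in a requirements.txt-style manifest."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # Keep only the name, dropping extras, version specifiers and markers.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(normalize(match.group(0)))
    return names


def installed() -> set[str]:
    """Approximation of Dr: every distribution in the active environment."""
    names = (d.metadata["Name"] for d in distributions())
    return {normalize(n) for n in names if n}


def gap(dc: set[str], dr: set[str]) -> set[str]:
    """Packages present in the environment that the manifest never mentioned."""
    return dr - dc


dc = claimed("scikit-learn\npandas>=2.0\nmatplotlib  # plotting\n")
dr = installed()
print(f"Dc={len(dc)} Dr={len(dr)} undeclared={len(gap(dc, dr))}")
```

Run inside the clean environment used for validation, the `undeclared` count gives a quick sense of how much of Dr the agent left out.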

Where failures actually come from

The dependency gap is real but not the primary failure mode. Among 95 failed projects:

Failure type	Share	Example
Code bugs (syntax, logic, malformed imports)	52.6%	Uninitialized variables, wrong API signatures
Unparseable output	16.8%	Agent output too malformed to execute
Version or structural conflicts	15.8%	Incompatible package versions
Missing dependencies	10.5%	`ImportError`, `ModuleNotFoundError`
Environment issues	4.2%	System-level conflicts

Code bugs outnumber missing-dependency failures about 5 to 1 (52.6% versus 10.5%). Both matter, but fixing only the manifest will not save a project with malformed imports or other code bugs.
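When triaging a failed run, it helps to sort the error output into these categories before touching the manifest. The sketch below is a rough heuristic, not the study's method: the category labels follow the table, while the regular expressions are assumptions chosen for illustration and will misclassify some cases (for example, a `ModuleNotFoundError` for a misspelled local module is really a code bug).

```python
import re

# Ordered heuristics: the first matching pattern wins.
# "cannot import name" means the package exists but the imported symbol does
# not, which points at a code bug rather than a missing dependency.
PATTERNS = [
    ("code_bug", re.compile(
        r"cannot import name|SyntaxError|IndentationError|NameError|TypeError|AttributeError"
    )),
    ("missing_dependency", re.compile(r"ModuleNotFoundError|ImportError")),
    ("version_conflict", re.compile(r"ResolutionImpossible|VersionConflict")),
]


def classify_failure(stderr: str) -> str:
    """Map captured error output to a coarse failure category."""
    for label, pattern in PATTERNS:
        if pattern.search(stderr):
            return label
    return "unclassified"


print(classify_failure("ModuleNotFoundError: No module named 'pandas'"))
print(classify_failure("ImportError: cannot import name 'foo' from 'bar'"))
```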

Validation workflow

1. Generate lock files in a clean environment

Never trust the agent's declared dependencies as the final list. Generate the lock file in a clean environment to capture the full transitive closure:

# Python
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
pip freeze > requirements-lock.txt

# JavaScript
rm -rf node_modules package-lock.json
npm install

# Java
mvn dependency:resolve
mvn dependency:tree > dep-tree.txt
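The Python steps can also be scripted, which is convenient when the lock file has to be regenerated from a CI job or a test harness. This is a sketch using only the standard library `venv` and `subprocess` modules; the function names and the `.venv-lock` directory are illustrative, and the install step needs network access to a package index.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Path to the interpreter inside a virtual environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def build_lock(requirements: Path, env_dir: Path, lock_file: Path) -> int:
    """Install the manifest into a fresh venv and freeze the result.

    Returns the number of pinned packages written to lock_file.
    """
    # clear=True wipes any existing environment so nothing leaks in.
    venv.EnvBuilder(with_pip=True, clear=True).create(env_dir)
    python = str(venv_python(env_dir))
    subprocess.run(
        [python, "-m", "pip", "install", "-r", str(requirements)], check=True
    )
    frozen = subprocess.run(
        [python, "-m", "pip", "freeze"], check=True, capture_output=True, text=True
    ).stdout
    lock_file.write_text(frozen)
    return len(frozen.splitlines())


# Example call (hypothetical paths):
# build_lock(Path("requirements.txt"), Path(".venv-lock"), Path("requirements-lock.txt"))
```

Comparing the returned count with the number of lines in the agent's manifest gives the expansion ratio for that project.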

2. Test in isolation

Run the generated project in a container or fresh virtual environment with nothing pre-installed:

# Docker-based validation
docker run --rm -v "$(pwd)":/app -w /app python:3.12-slim \
  sh -c "pip install -r requirements.txt && python main.py"

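If it fails with ImportError or ModuleNotFoundError, you've found a dependency gap. Add the missing packages and regenerate the lock file. Any other failure is more likely a code bug than a manifest problem.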

3. Add to CI

Make clean-environment validation a gate, not a manual step. A deterministic guardrail catches what prompts cannot enforce:

# GitHub Actions example
- name: Validate dependencies
  run: |
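    python -m venv .venv && . .venv/bin/activate  # Fresh venv so packages preinstalled on the runner don't mask gaps
    pip install -r requirements.txt
    python -c "import main"  # Fails fast on missing module-level imports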
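For projects with more than one entry point, the import check can be a small script that reports every failing module instead of stopping at the first. This is a sketch under the assumption that the modules to check are passed in by name; `check_imports` is a hypothetical helper, and like `python -c "import main"` it only exercises imports that run at module load time.

```python
import importlib


def check_imports(module_names: list[str]) -> dict[str, str]:
    """Try to import each module; return {module: error message} for failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


def exit_code(failures: dict[str, str]) -> int:
    """Non-zero when anything failed, so the CI step is marked as failed."""
    return 1 if failures else 0


# "hypothetical_missing_pkg_xyz" is a made-up name that should not be installed.
failures = check_imports(["json", "hypothetical_missing_pkg_xyz"])
for name, error in failures.items():
    print(f"{name}: {error}")
print("exit code:", exit_code(failures))
```

In a CI step, the script would end with `sys.exit(exit_code(failures))` so that a missing package fails the job.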

When this backfires

Clean-environment validation is not free. It is worse than the alternatives under these conditions:

Prototype or throwaway code: if the project will never leave a developer's laptop, the 30-to-60-second clean-env rebuild per iteration outweighs the small risk of a ModuleNotFoundError (missing dependencies account for only 10.5% of the failures above)
Tight inner loop: when the agent iterates in a harness that already imports and runs the code each turn, import errors surface at once; a separate CI gate adds latency without new signal
Managed runtimes with implicit deps: platforms such as AWS Lambda layers, Databricks notebooks, or Jupyter kernels pre-install common packages, so a strict "only OS packages" baseline reports packages as missing even though they already exist in the target environment
Monorepos with shared lockfiles: if a root lockfile already pins the transitive closure for all sub-projects, re-resolving per agent-authored change is wasted work

Weigh the gate cost against failure cost. For production deployments and external releases, clean-env validation pays for itself; for experiments and spike branches, it often does not.
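The trade-off can be written as a back-of-the-envelope comparison between the time the gate costs on each run and the expected time lost to a failure it would have caught. Every number below except the 30-to-60-second rebuild is a made-up assumption for illustration, and `gate_pays_off` is a hypothetical helper, not a rule from the study.

```python
def gate_pays_off(
    gate_seconds: float,
    failure_probability: float,
    failure_cost_seconds: float,
) -> bool:
    """True when the expected cost of an uncaught failure exceeds the gate's cost."""
    expected_failure_cost = failure_probability * failure_cost_seconds
    return expected_failure_cost > gate_seconds


# Throwaway prototype: a missing package costs about two minutes to fix by hand.
print(gate_pays_off(gate_seconds=45, failure_probability=0.1, failure_cost_seconds=120))

# Production release: a broken deploy costs hours of rollback and debugging.
print(gate_pays_off(gate_seconds=45, failure_probability=0.1, failure_cost_seconds=4 * 3600))
```

With these assumed inputs the prototype case comes out against the gate (12 expected seconds lost versus 45 spent) and the production case in favor of it (1440 versus 45).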

Key Takeaways

AI agents declare ~7% of the packages their code actually loads at runtime — the other 93% are transitive dependencies they never mention
Python is the safest language for AI-generated projects (89% success); Java is the riskiest (44%)
Most AI code failures are bugs, not dependency gaps — validate both
Lock files only work if generated from a working environment, not from the agent's incomplete manifest
Clean-environment testing is a deterministic guardrail — add it to CI, don't rely on agents to self-check

Related

Deterministic Guardrails Around Probabilistic Agents — the principle behind making dependency validation a CI gate rather than an agent instruction
Agent Environment Bootstrapping — broader workflow for setting up reproducible agent execution environments
Verification-Centric Development — development workflow that prioritizes verification at every stage