Dependency Gap Validation for AI-Generated Code
AI coding agents declare a fraction of the dependencies their code actually needs at runtime — validate in clean environments before trusting the manifest.
The problem
When an LLM generates a project, it writes a dependency file — requirements.txt, package.json, pom.xml. That file is incomplete. An empirical study of 300 projects across Claude Code, OpenAI Codex, and Gemini found a 13.5x average expansion from declared dependencies to actual runtime dependencies. A Python project declaring 3 packages (scikit-learn, pandas, matplotlib) needed 52 at runtime.
This is the dependency gap: the difference between what the agent says the code needs and what the code actually needs to run.
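As a rough sketch of how the gap could be measured for a Python project, the snippet below compares the names in a requirements file with the set of packages present at runtime. The helper names `parse_requirement_names` and `expansion_multiplier` are illustrative, the parsing is deliberately simplified, and the 49 extra `pkg-N` names are placeholders standing in for real transitive dependencies.

```python
def parse_requirement_names(text: str) -> set[str]:
    """Extract lowercase package names from requirements-style text (simplified)."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line or line.startswith("-"):   # skip blanks and pip options such as -r
            continue
        # Cut at the first version, extras, or marker delimiter.
        for sep in ("==", ">=", "<=", "~=", "!=", ">", "<", "[", ";", " "):
            line = line.split(sep, 1)[0]
        names.add(line.strip().lower().replace("_", "-"))
    return names


def expansion_multiplier(declared: set[str], runtime: set[str]) -> float:
    """Ratio of runtime packages to declared packages."""
    if not declared:
        raise ValueError("no declared dependencies to compare against")
    return len(runtime) / len(declared)


# Three declared packages, as in the example above.
declared = parse_requirement_names("scikit-learn\npandas>=2.0\nmatplotlib  # plots\n")
# Placeholder names standing in for the 49 undeclared runtime packages.
runtime = declared | {f"pkg-{i}" for i in range(49)}

print(f"{len(declared)} declared, {len(runtime)} at runtime, "
      f"{expansion_multiplier(declared, runtime):.1f}x expansion")
```

For the 3-to-52 example this works out to roughly 17x, above the 13.5x average the study reports.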
How large the gap gets
The study tested each agent across Python, JavaScript, and Java in clean AWS environments with only OS-level packages installed:
| Language | Success rate | Runtime multiplier | Why |
|---|---|---|---|
| Python | 89.2% | 12.3x | Flat requirements.txt is easiest for LLMs to enumerate |
| JavaScript | 61.9% | 9.7x | npm auto-resolves transitive deps into package-lock.json, masking gaps in the manifest |
| Java | 44.0% | 18.4x | Maven transitive resolution produces deep dependency graphs that LLMs rarely declare completely |
Agent performance also varies by language:
| Agent | Python | JavaScript | Java |
|---|---|---|---|
| Claude Code | 80% | 60% | 80% |
| Gemini | 100% | 71% | 28% |
| Codex | 88% | 54% | 24% |
The practical takeaway: match the agent to the language. Gemini does well at Python but struggles with Java. Claude shows the most balanced performance across languages.
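To make the "match the agent to the language" reading concrete, the per-agent table can be encoded as a lookup. This is only a sketch over the figures quoted above; `best_agent` and `spread` are hypothetical helper names, and "balanced" is interpreted here as the smallest gap between an agent's best and worst language.

```python
# Success rates (percent) copied from the agent-by-language table above.
SUCCESS_RATE = {
    "Claude Code": {"Python": 80, "JavaScript": 60, "Java": 80},
    "Gemini": {"Python": 100, "JavaScript": 71, "Java": 28},
    "Codex": {"Python": 88, "JavaScript": 54, "Java": 24},
}


def best_agent(language: str) -> str:
    """Agent with the highest reported success rate for a language."""
    return max(SUCCESS_RATE, key=lambda agent: SUCCESS_RATE[agent][language])


def spread(agent: str) -> int:
    """Gap between an agent's best and worst language, in percentage points."""
    rates = SUCCESS_RATE[agent].values()
    return max(rates) - min(rates)


for language in ("Python", "JavaScript", "Java"):
    print(language, "->", best_agent(language))
print("most balanced:", min(SUCCESS_RATE, key=spread))
```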
Three-layer dependency framework
The study offers a simple model for thinking about AI-generated dependency declarations:
graph LR
Dc["Claimed (Dc)<br/>In config file"] --> Dw["Working (Dw)<br/>Dc + debugged additions"]
Dw --> Dr["Runtime (Dr)<br/>Full transitive closure"]
style Dc fill:#fee,stroke:#c33
style Dw fill:#ffd,stroke:#cc6
style Dr fill:#dfd,stroke:#393
- Claimed (Dc) — packages the agent listed in the dependency file
- Working (Dw) — claimed packages plus whatever you add by hand to get the code running
- Runtime (Dr) — the full transitive closure: every package loaded while the code runs
The gap between Dc and Dr is what breaks your builds. Lock files (package-lock.json, poetry.lock, Pipfile.lock) capture Dr, but only if you generate them from a working environment — not from the agent's incomplete Dc.
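The three layers can be modelled as sets, with Dr computed as a transitive closure over a dependency graph. The sketch below uses a small hand-written graph (`DEPENDS_ON`) purely for illustration; it is not the real dependency metadata of these packages, and `transitive_closure` is a hypothetical helper.

```python
from collections import deque

# Illustrative direct-dependency graph, not real package metadata.
DEPENDS_ON = {
    "pandas": ["numpy", "python-dateutil", "pytz"],
    "python-dateutil": ["six"],
    "matplotlib": ["numpy", "pillow"],
    "requests": ["urllib3"],
}


def transitive_closure(roots: set[str], graph: dict[str, list[str]]) -> set[str]:
    """Every package reachable from the roots, including the roots themselves."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        package = queue.popleft()
        for dependency in graph.get(package, []):
            if dependency not in seen:
                seen.add(dependency)
                queue.append(dependency)
    return seen


claimed = {"pandas", "matplotlib"}                  # Dc: listed by the agent
working = claimed | {"requests"}                    # Dw: plus a package added by hand
runtime = transitive_closure(working, DEPENDS_ON)   # Dr: full transitive closure

print("undeclared:", sorted(runtime - claimed))
```

In this toy graph two claimed packages grow to nine at runtime, which mirrors the shape of the expansion described above even though the numbers are invented.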
Where failures actually come from
The dependency gap is real but not the primary failure mode. Among 95 failed projects:
| Failure type | Share | Example |
|---|---|---|
| Code bugs (syntax, logic, malformed imports) | 52.6% | Uninitialized variables, wrong API signatures |
| Unparseable output | 16.8% | Agent output too malformed to execute |
| Version or structural conflicts | 15.8% | Incompatible package versions |
| Missing dependencies | 10.5% | ImportError, ModuleNotFoundError |
| Environment issues | 4.2% | System-level conflicts |
Code quality failures outnumber dependency gaps 5 to 1. Both matter, but fixing only the manifest will not save a project with broken imports.
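When triaging a failed run, it can help to separate these two classes mechanically. The following is a heuristic sketch only: `classify_failure` is a hypothetical helper that looks at the last line of a Python traceback, and it will mislabel some cases (for example, a `ModuleNotFoundError` caused by a malformed import is really a code bug).

```python
import re

# Message format CPython uses for a module that cannot be found.
_MISSING_MODULE = re.compile(r"ModuleNotFoundError: No module named '([^']+)'")
# Exception types treated as code-level bugs in this sketch (an assumption).
_CODE_BUG_PREFIXES = (
    "SyntaxError", "IndentationError", "NameError", "TypeError", "AttributeError",
)


def classify_failure(stderr: str) -> str:
    """Rough triage of a Python traceback based on its final line."""
    lines = [line for line in stderr.splitlines() if line.strip()]
    if not lines:
        return "unknown"
    last = lines[-1].strip()
    match = _MISSING_MODULE.match(last)
    if match:
        top_level = match.group(1).split(".")[0]  # sklearn.linear_model -> sklearn
        return f"missing dependency: {top_level}"
    if last.startswith(_CODE_BUG_PREFIXES):
        return "code bug"
    return "unknown"


print(classify_failure("ModuleNotFoundError: No module named 'sklearn.linear_model'"))
print(classify_failure("NameError: name 'df' is not defined"))
```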
Validation workflow
1. Generate lock files in a clean environment
Never trust the agent's declared dependencies as the final list. Generate the lock file in a clean environment to capture the full transitive closure:
# Python
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
pip freeze > requirements-lock.txt
# JavaScript
rm -rf node_modules package-lock.json
npm install
# Java
mvn dependency:resolve
mvn dependency:tree > dep-tree.txt
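For Python, the same snapshot can be taken from inside the interpreter with the standard-library `importlib.metadata` module, which also makes it easy to list what the manifest left out. This is a sketch under the assumption that it runs inside the freshly created virtual environment; `installed_pins` and `undeclared_pins` are hypothetical helper names, and the version strings in the usage line are made up.

```python
import re
from importlib import metadata


def normalize(name: str) -> str:
    """Normalize a distribution name (lowercase, runs of -, _, . become -)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_pins() -> list[str]:
    """Pinned name==version lines for every distribution in this environment."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


def undeclared_pins(declared: set[str], pins: list[str]) -> list[str]:
    """Pins whose package name does not appear in the declared set."""
    declared_normalized = {normalize(name) for name in declared}
    return [
        pin for pin in pins
        if normalize(pin.split("==", 1)[0]) not in declared_normalized
    ]


# Usage with made-up pins: only numpy was pulled in without being declared.
print(undeclared_pins({"pandas"}, ["numpy==1.26.4", "pandas==2.2.0"]))
```

Calling `undeclared_pins(declared, installed_pins())` in the clean environment gives a quick view of how far Dr has drifted from Dc.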
2. Test in isolation
Run the generated project in a container or fresh virtual environment with nothing pre-installed:
# Docker-based validation (quote the mount path so directories with spaces work)
docker run --rm -v "$(pwd)":/app -w /app python:3.12-slim \
  sh -c "pip install -r requirements.txt && python main.py"
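If it fails with ImportError or ModuleNotFoundError, you've found a dependency gap: add the missing packages and regenerate the lock file. Any other failure is more likely one of the code-level problems described above.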
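If the isolation run needs to be scripted, one option is to assemble the same Docker invocation from Python and inspect the result. This is a minimal sketch that assumes Docker is installed and that the project has `requirements.txt` and `main.py` at its root; `docker_validation_command` and `validate` are hypothetical names.

```python
import shlex
import subprocess
from pathlib import Path


def docker_validation_command(
    project_dir: str,
    image: str = "python:3.12-slim",
    entrypoint: str = "main.py",
) -> list[str]:
    """Build the docker run argument list for a clean-environment check."""
    project = Path(project_dir).resolve()
    script = f"pip install -r requirements.txt && python {shlex.quote(entrypoint)}"
    return [
        "docker", "run", "--rm",
        "-v", f"{project}:/app",   # mount the project into the container
        "-w", "/app",
        image,
        "sh", "-c", script,
    ]


def validate(project_dir: str) -> subprocess.CompletedProcess:
    """Run the check; a non-zero returncode means the project failed in isolation."""
    return subprocess.run(
        docker_validation_command(project_dir),
        capture_output=True,
        text=True,
        check=False,
    )


print(" ".join(docker_validation_command(".")))
```

Passing the arguments as a list avoids shell quoting problems on the host; only the command executed inside the container goes through `sh -c`.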
3. Add to CI
Make clean-environment validation a gate, not a manual step. A deterministic guardrail catches what prompts cannot enforce:
# GitHub Actions example
- name: Validate dependencies
  run: |
    pip install -r requirements.txt
    python -c "import main"  # Fails fast on missing imports
When this backfires
Clean-environment validation is not free. It is worse than the alternatives under these conditions:
- Prototype or throwaway code: if the project will never leave a developer's laptop, the 30-to-60-second clean-env rebuild per iteration outweighs a 10% chance of ModuleNotFoundError
- Tight inner loop: when the agent iterates in a harness that already imports and runs the code each turn, import errors surface at once; a separate CI gate adds latency without new signal
- Managed runtimes with implicit deps: platforms such as AWS Lambda layers, Databricks notebooks, or Jupyter kernels pre-install common packages, so a strict "only OS packages" baseline reports packages as missing that the target environment already provides
- Monorepos with shared lockfiles: if a root lockfile already pins the transitive closure for all sub-projects, re-resolving per agent-authored change is wasted work
Weigh the gate cost against failure cost. For production deployments and external releases, clean-env validation pays for itself; for experiments and spike branches, it often does not.
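One way to make that trade-off explicit is a back-of-the-envelope expected-cost comparison. The sketch below is an assumption-laden model, not something taken from the study: `gate_pays_off` is a hypothetical helper, and apart from the 30-to-60-second rebuild and the roughly 10% failure chance mentioned above, every number is invented for illustration.

```python
def gate_pays_off(
    gate_seconds: float,
    runs: int,
    failure_probability: float,
    failure_cost_seconds: float,
) -> bool:
    """True when the expected cost of a missed failure exceeds the total gate cost."""
    total_gate_cost = gate_seconds * runs
    expected_failure_cost = failure_probability * failure_cost_seconds
    return expected_failure_cost > total_gate_cost


# Prototype: 40 iterations at 45 s each vs. a 10% chance of a 5-minute fix.
print("prototype:", gate_pays_off(45, 40, 0.10, 5 * 60))
# Release: one 45 s gate vs. a 10% chance of a 4-hour production incident.
print("release:", gate_pays_off(45, 1, 0.10, 4 * 60 * 60))
```

With these made-up inputs the gate costs more than it saves for the prototype and pays off for the release, which matches the qualitative guidance above.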
Key Takeaways
- AI agents declare ~7% of the packages their code actually loads at runtime — the other 93% are transitive dependencies they never mention
- Python is the safest language for AI-generated projects (89% success); Java is the riskiest (44%)
- Most AI code failures are bugs, not dependency gaps — validate both
- Lock files only work if generated from a working environment, not from the agent's incomplete manifest
- Clean-environment testing is a deterministic guardrail — add it to CI, don't rely on agents to self-check
Related
- Deterministic Guardrails Around Probabilistic Agents — the principle behind making dependency validation a CI gate rather than an agent instruction
- Agent Environment Bootstrapping — broader workflow for setting up reproducible agent execution environments
- Verification-Centric Development — development workflow that prioritizes verification at every stage