Shadow Tech Debt
Shadow tech debt is the silent architectural drift agents leave when they change what a codebase does without knowing why it is shaped that way.
JetBrains coined the term Shadow Tech Debt (The New Stack) — debt that is invisible, diffuse, and compounded when agents run without structural codebase understanding.
What It Looks Like
An agent fixes a bug and the PR passes tests, but the agent skipped the ADRs, ignored naming conventions, and replicated a suboptimal pattern. One such PR goes unnoticed. Ten per day compound into structural incoherence.

    graph TD
        A[Agent runs without architectural context] --> B[Produces functionally correct output]
        B --> C[PR passes tests and review]
        C --> D[Merged]
        D --> E[Architectural drift accumulates]
        E --> F[Each new agent run amplifies existing patterns, optimal or not]
        F --> G[Coherence degrades]
        G --> H[Later changes become risky and expensive]

Why It Compounds
Agents amplify existing patterns. Suboptimal approaches propagate when agents replicate whatever is in the repository (Lavaee).
Review burden migrates; it does not disappear. High-AI-adoption teams merged 98% more PRs, but review time grew 91% and PR size grew 154% (Faros AI; Osmani).
Context window blindness is structural. ADRs, tribal knowledge, and style rationale live outside the context window by default.
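A back-of-envelope reading of the review figures above, offered only as a sketch. It assumes the increases in PR count and PR size compound, which the cited sources report separately and do not claim; the variable names are illustrative.

```python
# Multipliers over baseline, taken from the figures quoted above (Faros AI; Osmani).
pr_count_multiplier = 1 + 0.98  # 98% more PRs merged
pr_size_multiplier = 1 + 1.54   # PRs 154% larger

# Assumption (not claimed by the sources): the two increases compound.
changed_code_multiplier = pr_count_multiplier * pr_size_multiplier

print(f"Changed code reaching review: ~{changed_code_multiplier:.1f}x baseline")
```

Under that assumption, roughly five times as much changed code arrives for review, which is one way to picture why the burden moves to reviewers rather than vanishing.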
The Risk Escalates in CI/CD
Without review gates, Shadow Tech Debt accumulates at machine speed — JetBrains Air concluded that complex codebases aren't yet ready for pure agentic coding (JetBrains Air blog).
When This Backfires
The mitigation overhead may exceed its benefit in these cases:
- Greenfield or throwaway codebases — no accumulated architectural rationale to violate.
- Comprehensive automated enforcement — linting and module-boundary tests catch deviations before merge.
- Infrequent agentic use — occasional tasks under close review don't accumulate drift.
Mitigation Stack
| Step | Effort | Action |
|---|---|---|
| 1 | Low | Machine-readable context files — AGENTS.md at the repo root; CLAUDE.md for Claude Code. Scoped files (docs/CLAUDE.md) for monorepos. |
| 2 | Medium | Deterministic enforcement — linters and structural tests for module boundaries, naming, and duplication ("rigor relocation" — Fowler/Boeckeler). |
| 3 | Medium | Review gates — autonomous agents must not merge without human review on shared repositories. |
| 4 | High | Garbage-collection agents — background scans for architectural inconsistencies (Fowler/Boeckeler; Lavaee). Requires step 1. |
Caveat on step 1. An ETH Zurich evaluation (Gloaguen et al., arXiv:2602.11988) found that LLM-generated or overly detailed AGENTS.md files reduced task success rates by ~3% and increased inference cost by >20% — agents obediently followed unnecessary instructions. The finding narrows, rather than overturns, step 1: limit instruction files to non-inferable details (custom build commands, repository-specific conventions) and omit content an agent would infer from the code itself.
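Steps 2 and 4 both depend on checks that run outside the agent's context. As one illustration, a duplication scan could be sketched as below. This is a minimal sketch and not the method described by Fowler/Boeckeler or Lavaee: the function name `duplicate_functions`, the file paths, and the sample functions are all hypothetical, and it only catches function bodies that are structurally identical.

```python
import ast
import hashlib
from collections import defaultdict


def duplicate_functions(sources: dict[str, str]) -> list[list[str]]:
    """Group functions whose bodies are structurally identical.

    `sources` maps a file path to its Python source text.
    """
    groups = defaultdict(list)
    for path, text in sources.items():
        for node in ast.walk(ast.parse(text)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # ast.dump omits line numbers by default, so only the
                # structure of the body is hashed, not its position or layout.
                body = "".join(ast.dump(stmt) for stmt in node.body)
                digest = hashlib.sha256(body.encode()).hexdigest()
                groups[digest].append(f"{path}:{node.name}")
    return [names for names in groups.values() if len(names) > 1]


# Hypothetical sample input: two handlers carry the same filtering logic.
sources = {
    "handlers/users.py": "def active_users(rows):\n    return [r for r in rows if r.active]\n",
    "handlers/teams.py": "def only_active(rows):\n    return [r for r in rows if r.active]\n",
    "handlers/misc.py": "def names(rows):\n    return [r.name for r in rows]\n",
}
print(duplicate_functions(sources))
```

Because the comparison is exact, renaming a variable is enough to hide a duplicate. A real scan would likely need a more tolerant similarity measure, but even this form runs the same way regardless of which agent wrote the code.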
What Good Looks Like

    graph LR
        A[Agent receives task] --> B[Loads AGENTS.md + CLAUDE.md]
        B --> C[Runs with architectural context]
        C --> D[CI runs lint + structural tests]
        D --> E[Human review gate]
        E --> F[Merge]
        F --> G[Garbage-collection agent scans for drift]

Example
An agent is asked to fix a bug where deactivated users can still appear in search results. It writes a working fix — but queries the database directly in the handler, bypassing the repository layer the team uses for all data access.
Without architectural context, the agent takes a shortcut:

    # handlers/users.py
    from sqlalchemy import select
    from sqlalchemy.ext.asyncio import AsyncSession

    # `User` is the team's ORM model; its import is omitted in this excerpt.
    async def handle_search(query: str, db: AsyncSession):
        # Agent-generated fix: exclude deactivated users
        result = await db.execute(
            select(User).where(User.name.ilike(f"%{query}%"), User.active == True)
        )
        return result.scalars().all()

The fix passes tests. But it duplicates filtering logic, skips the team's access-control scoping, and sets a precedent that future agent runs will replicate (Pattern Replication Risk).
With an AGENTS.md rule ("All DB access must go through the repository layer"), the agent extends the existing layer instead:

    # handlers/users.py
    async def handle_search(query: str, user_repo: UserRepository):
        return await user_repo.search(query, include_inactive=False)

    # repositories/users.py (existing repository; the agent adds the filter here)
    class UserRepository:
        # Excerpt: the constructor and other methods are omitted.
        async def search(self, query: str, include_inactive: bool = True):
            stmt = select(User).where(User.name.ilike(f"%{query}%"))
            if not include_inactive:
                stmt = stmt.where(User.active == True)
            return (await self.session.execute(stmt)).scalars().all()

Same bug fix. No architectural drift.
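The repository-layer rule can also be backed by a structural test, so that it holds even when an agent never reads AGENTS.md. The following is a minimal sketch, assuming the rule translates to "modules under handlers/ must not import sqlalchemy"; the helper name `forbidden_imports` and the `FORBIDDEN_ROOTS` set are hypothetical and not part of any existing tool.

```python
import ast

# Assumption: handlers may not import the ORM package directly.
FORBIDDEN_ROOTS = {"sqlalchemy"}


def forbidden_imports(source: str, forbidden=FORBIDDEN_ROOTS) -> list[str]:
    """Return the forbidden modules imported by a Python source string."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as `from . import x`.
            names = [node.module or ""]
        else:
            continue
        # Compare only the top-level package name against the forbidden set.
        found += [name for name in names if name.split(".")[0] in forbidden]
    return found


# Import lines a handler would need for the shortcut version and the repository version.
drifted = "from sqlalchemy import select\nfrom sqlalchemy.ext.asyncio import AsyncSession\n"
clean = "from repositories.users import UserRepository\n"

print(forbidden_imports(drifted))
print(forbidden_imports(clean))
```

In CI, a test could read each file under handlers/ and fail the build whenever the returned list is non-empty, which turns the convention into a merge-blocking check.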
Key Takeaways
- Each agentic PR can pass tests yet quietly violate ADRs, naming conventions, and the architectural rationale that lives outside the context window.
- The debt is invisible per-PR and compounds at machine speed — agents replicate whatever patterns already exist in the repo, optimal or not.
- Machine-readable context files (AGENTS.md, CLAUDE.md) are the cheapest mitigation, but keep them to non-inferable details — bloated instruction files cut task success and raise cost.
- Deterministic enforcement, human review gates, and periodic drift scans are what stop the accumulation; they apply regardless of what is in the agent's context window.