
Shadow Tech Debt

Shadow tech debt is the silent architectural drift agents leave when they change what a codebase does without knowing why it is shaped that way.

JetBrains coined the term Shadow Tech Debt (The New Stack) for debt that is invisible, diffuse, and compounded when agents run without a structural understanding of the codebase.

What It Looks Like

An agent fixes a bug and the PR passes tests, but the agent skipped the ADRs (architecture decision records), ignored naming conventions, and replicated a suboptimal pattern. One such PR goes unnoticed. Ten per day compound into structural incoherence.

graph TD
    A[Agent runs without architectural context] --> B[Produces functionally correct output]
    B --> C[PR passes tests and review]
    C --> D[Merged]
    D --> E[Architectural drift accumulates]
    E --> F[Each new agent run amplifies existing patterns — optimal or not]
    F --> G[Coherence degrades]
    G --> H[Later changes become risky and expensive]

Why It Compounds

Agents amplify existing patterns. Suboptimal approaches propagate when agents replicate whatever is in the repository (Lavaee).

Review burden migrates rather than disappears. High-AI-adoption teams merged 98% more PRs, but review time grew 91% and PR size grew 154% (Faros AI; Osmani).

Context window blindness is structural. ADRs, tribal knowledge, and style rationale live outside the context window by default.

The Risk Escalates in CI/CD

Without review gates, Shadow Tech Debt accumulates at machine speed — JetBrains Air concluded that complex codebases aren't yet ready for pure agentic coding (JetBrains Air blog).

When This Backfires

Mitigation overhead may exceed benefit when:

  • Greenfield or throwaway codebases — no accumulated architectural rationale to violate.
  • Comprehensive automated enforcement — linting and module-boundary tests catch deviations before merge.
  • Infrequent agentic use — occasional tasks under close review don't accumulate drift.

Mitigation Stack

| Step | Effort | Action |
|------|--------|--------|
| 1 | Low | Machine-readable context files: AGENTS.md at the repo root; CLAUDE.md for Claude Code. Scoped files (docs/CLAUDE.md) for monorepos. |
| 2 | Medium | Deterministic enforcement: linters and structural tests for module boundaries, naming, and duplication ("rigor relocation", Fowler/Boeckeler). |
| 3 | Medium | Review gates: autonomous agents must not merge without human review on shared repositories. |
| 4 | High | Garbage-collection agents: background scans for architectural inconsistencies (Fowler/Boeckeler; Lavaee). Requires step 1. |
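
Step 2 names duplication as one target for deterministic enforcement. As a rough sketch of what such a check could look like, the snippet below fingerprints function bodies with Python's standard `ast` module and groups functions whose bodies are structurally identical. The function names `function_fingerprints` and `find_duplicates` are hypothetical, and matching only exact structural copies is a simplifying assumption; real duplication tools are more tolerant of small differences.

```python
import ast
import hashlib
from collections import defaultdict


def function_fingerprints(source: str, filename: str) -> dict[str, str]:
    """Map 'filename:function' to a hash of the function body's AST."""
    fingerprints = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Hash only the body, so functions that differ just in name
            # still collide when their logic is identical. ast.dump omits
            # line numbers by default, and comments never reach the AST.
            body_dump = "".join(ast.dump(stmt) for stmt in node.body)
            digest = hashlib.sha256(body_dump.encode()).hexdigest()
            fingerprints[f"{filename}:{node.name}"] = digest
    return fingerprints


def find_duplicates(sources: dict[str, str]) -> list[list[str]]:
    """Group functions whose bodies are structurally identical."""
    groups = defaultdict(list)
    for filename, source in sources.items():
        for name, digest in function_fingerprints(source, filename).items():
            groups[digest].append(name)
    # Keep only fingerprints shared by more than one function.
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)
```

Run in CI over the files a PR touches, a check like this fails the build when an agent copies an existing function instead of reusing it, without depending on anything in the agent's context window.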

Caveat on step 1. An ETH Zurich evaluation (Gloaguen et al., arXiv:2602.11988) found that LLM-generated or overly detailed AGENTS.md files reduced task success rates by ~3% and increased inference cost by >20% — agents obediently followed unnecessary instructions. The finding narrows, rather than overturns, step 1: limit instruction files to non-inferable details (custom build commands, repository-specific conventions) and omit content an agent would infer from the code itself.

What Good Looks Like

graph LR
    A[Agent receives task] --> B[Loads AGENTS.md + CLAUDE.md]
    B --> C[Runs with architectural context]
    C --> D[CI runs lint + structural tests]
    D --> E[Human review gate]
    E --> F[Merge]
    F --> G[Garbage-collection agent scans for drift]
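
The two gates in the flow above (CI, then human review) can be expressed as a small merge policy. This is a minimal sketch under stated assumptions: `PullRequest` and `can_merge` are hypothetical names, not part of any CI system's API, and the policy for human-authored PRs is left out of scope.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    author_is_agent: bool
    ci_passed: bool
    # Names of human reviewers who approved (hypothetical field).
    human_approvals: list[str] = field(default_factory=list)


def can_merge(pr: PullRequest) -> bool:
    """Return True only when the deterministic and human gates both pass."""
    # Gate 1: lint and structural tests must be green.
    if not pr.ci_passed:
        return False
    # Gate 2: agent-authored changes need at least one human approval.
    if pr.author_is_agent and not pr.human_approvals:
        return False
    return True
```

Under this policy, an agent PR with green CI but no approvals stays unmerged; adding a single human approval lets it through.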

Example

An agent is asked to fix a bug where deactivated users can still appear in search results. It writes a working fix — but queries the database directly in the handler, bypassing the repository layer the team uses for all data access.

Without architectural context — the agent takes a shortcut:

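# handlers/users.py
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

# User is the team's ORM model; its import is omitted here.

async def handle_search(query: str, db: AsyncSession):
    # Agent-generated fix: exclude deactivated users
    result = await db.execute(
        select(User).where(User.name.ilike(f"%{query}%"), User.active.is_(True))
    )
    return result.scalars().all()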

The fix passes tests. But it duplicates filtering logic, skips the team's access-control scoping, and sets a precedent that future agent runs will replicate (Pattern Replication Risk).

With an AGENTS.md rule ("All DB access must go through the repository layer"):

# handlers/users.py
async def handle_search(query: str, user_repo: UserRepository):
    return await user_repo.search(query, include_inactive=False)

# repositories/users.py  (existing repository; the agent adds the filter here)
class UserRepository:
    ...

    # include_inactive defaults to True so existing callers keep their behavior.
    async def search(self, query: str, include_inactive: bool = True):
        stmt = select(User).where(User.name.ilike(f"%{query}%"))
        if not include_inactive:
            stmt = stmt.where(User.active.is_(True))
        return (await self.session.execute(stmt)).scalars().all()

Same bug fix. No architectural drift.
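
The same rule can also be enforced deterministically (step 2 of the mitigation stack), so it holds even when an agent never reads AGENTS.md. The sketch below assumes the shortcut shows up as a direct SQLAlchemy import in a handler module; `forbidden_imports` is a hypothetical helper built on the standard `ast` module, not an existing linter rule.

```python
import ast


def forbidden_imports(source: str, forbidden: set[str]) -> list[str]:
    """Return the forbidden top-level packages that `source` imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports (level > 0) refer to the project's own
            # packages, so they are not treated as third-party imports.
            if node.level > 0:
                continue
            modules = [node.module or ""]
        else:
            continue
        for module in modules:
            # Compare on the top-level package: "sqlalchemy.ext.asyncio"
            # counts as "sqlalchemy".
            root = module.split(".")[0]
            if root in forbidden:
                found.add(root)
    return sorted(found)
```

A structural test could read every file under handlers/ and assert that `forbidden_imports(source, {"sqlalchemy"})` is empty. The shortcut version of the handler would fail that test; the repository-based version would pass.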

Key Takeaways

  • Each agentic PR can pass tests yet quietly violate ADRs, naming conventions, and the architectural rationale that lives outside the context window.
  • The debt is invisible per-PR and compounds at machine speed — agents replicate whatever patterns already exist in the repo, optimal or not.
  • Machine-readable context files (AGENTS.md, CLAUDE.md) are the cheapest mitigation, but keep them to non-inferable details — bloated instruction files cut task success and raise cost.
  • Deterministic enforcement, human review gates, and periodic drift scans are what stop the accumulation, because they work regardless of what is in the agent's context window.