Shadow Tech Debt

Shadow tech debt is the silent architectural drift agents leave when they change what a codebase does without knowing why it is shaped that way.

JetBrains coined the term Shadow Tech Debt (The New Stack) for debt that is invisible, diffuse, and compounding: the kind that builds up when agents run without a structural understanding of the codebase.

What It Looks Like

An agent fixes a bug and the PR passes tests, but the agent skipped the architecture decision records (ADRs), ignored naming conventions, and replicated a suboptimal pattern. One such PR goes unnoticed. Ten per day compound into structural incoherence.

graph TD
    A[Agent runs without architectural context] --> B[Produces functionally correct output]
    B --> C[PR passes tests and review]
    C --> D[Merged]
    D --> E[Architectural drift accumulates]
    E --> F[Each new agent run amplifies existing patterns — optimal or not]
    F --> G[Coherence degrades]
    G --> H[Later changes become risky and expensive]

Why It Compounds

Agents amplify existing patterns. Suboptimal approaches propagate when agents replicate whatever is in the repository (Lavaee).

Review burden migrates; it does not disappear. High-AI-adoption teams merged 98% more PRs, but review time grew 91% and PR size grew 154% (Faros AI; Osmani).

Context window blindness is structural. ADRs, tribal knowledge, and style rationale live outside the context window by default.

The Risk Escalates in CI/CD

Without review gates, Shadow Tech Debt accumulates at machine speed. The JetBrains Air team concluded that complex codebases aren't yet ready for pure agentic coding (JetBrains Air blog).
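
A review gate of the kind described here can be sketched as a small policy check. This is a minimal illustration, not any CI system's real API: the `PullRequest` record, its field names, and the one-approval threshold are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Hypothetical PR record; the field names are illustrative only."""
    author_is_agent: bool
    shared_repo: bool
    human_approvals: list[str] = field(default_factory=list)


def may_merge(pr: PullRequest) -> bool:
    """Return True only if the review-gate policy allows the merge."""
    # Agent-authored changes to a shared repository need a human approval.
    if pr.author_is_agent and pr.shared_repo:
        return len(pr.human_approvals) >= 1
    # Everything else is left to whatever policy already applies.
    return True
```

In practice a check like this would run as a required status check, so that an agent cannot bypass it by merging its own PR.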

When This Backfires

Mitigation overhead may exceed the benefit in these cases:

- Greenfield or throwaway codebases: there is no accumulated architectural rationale to violate.
- Comprehensive automated enforcement already in place: linting and module-boundary tests catch deviations before merge.
- Infrequent agentic use: occasional tasks under close review don't accumulate drift.

Mitigation Stack

| Step | Effort | Action |
|------|--------|--------|
| 1 | Low | Machine-readable context files: AGENTS.md at the repo root; CLAUDE.md for Claude Code. Scoped files (`docs/CLAUDE.md`) for monorepos. |
| 2 | Medium | Deterministic enforcement: linters and structural tests for module boundaries, naming, and duplication ("rigor relocation", Fowler/Boeckeler). |
| 3 | Medium | Review gates: autonomous agents must not merge without human review on shared repositories. |
| 4 | High | Garbage-collection agents: background scans for architectural inconsistencies (Fowler/Boeckeler; Lavaee). Requires step 1. |
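
Step 2 can be sketched as a structural test built on Python's standard `ast` module. The rule encoded here (handler modules must not import `sqlalchemy` directly) and the names `FORBIDDEN_MODULES` and `forbidden_imports` are assumptions chosen for illustration; a real project would encode its own boundaries.

```python
import ast

# Assumption for this sketch: handler modules must not import the ORM directly.
FORBIDDEN_MODULES = {"sqlalchemy"}


def forbidden_imports(source: str) -> list[str]:
    """Return the forbidden modules imported anywhere in the given source."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped.
            names = [node.module]
        else:
            continue
        for name in names:
            # Compare on the top-level package, e.g. "sqlalchemy.ext" -> "sqlalchemy".
            if name.split(".")[0] in FORBIDDEN_MODULES:
                found.append(name)
    return found
```

A test suite could run this over every file in the handler package and assert that the result is empty, so a boundary violation fails CI no matter what the agent had in its context.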

Caveat on step 1. An ETH Zurich evaluation (Gloaguen et al., arXiv:2602.11988) found that LLM-generated or overly detailed AGENTS.md files reduced task success rates by ~3% and increased inference cost by >20% — agents obediently followed unnecessary instructions. The finding narrows, rather than overturns, step 1: limit instruction files to non-inferable details (custom build commands, repository-specific conventions) and omit content an agent would infer from the code itself.

What Good Looks Like

graph LR
    A[Agent receives task] --> B[Loads AGENTS.md + CLAUDE.md]
    B --> C[Runs with architectural context]
    C --> D[CI runs lint + structural tests]
    D --> E[Human review gate]
    E --> F[Merge]
    F --> G[Garbage-collection agent scans for drift]
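
The final drift scan could start with something as simple as a duplicate-logic detector. The sketch below is one narrow, deterministic check and not a full garbage-collection agent: it groups functions whose bodies are structurally identical, using only the standard library. The function name `duplicate_functions` and the `path:function` label format are hypothetical.

```python
import ast
import hashlib
from collections import defaultdict


def duplicate_functions(sources: dict[str, str]) -> list[list[str]]:
    """Group functions whose bodies are structurally identical.

    `sources` maps a file path to its source text. Each returned group is a
    sorted list of "path:function_name" labels sharing the same body.
    """
    groups = defaultdict(list)
    for path, text in sources.items():
        for node in ast.walk(ast.parse(text)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # ast.dump omits line numbers by default, so identical code
                # in different locations produces the same string.
                body = "".join(ast.dump(stmt) for stmt in node.body)
                digest = hashlib.sha256(body.encode()).hexdigest()
                groups[digest].append(f"{path}:{node.name}")
    return [sorted(labels) for labels in groups.values() if len(labels) > 1]
```

This only catches exact structural copies. Near-duplicates and violations of architectural intent still need the context files and human review described above.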

Example

An agent is asked to fix a bug where deactivated users can still appear in search results. It writes a working fix — but queries the database directly in the handler, bypassing the repository layer the team uses for all data access.

Without architectural context — the agent takes a shortcut:

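# handlers/users.py
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from models import User  # assumed location of the ORM model

async def handle_search(query: str, db: AsyncSession):
    # Agent-generated fix: exclude deactivated users
    # (queries the database directly, bypassing the repository layer)
    result = await db.execute(
        select(User).where(User.name.ilike(f"%{query}%"), User.active == True)  # noqa: E712
    )
    return result.scalars().all()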

The fix passes tests. But it duplicates filtering logic, skips the team's access-control scoping, and sets a precedent that future agent runs will replicate (Pattern Replication Risk).

With AGENTS.md rule — All DB access must go through the repository layer:

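# handlers/users.py
from repositories.users import UserRepository

async def handle_search(query: str, user_repo: UserRepository):
    return await user_repo.search(query, include_inactive=False)

# repositories/users.py  (existing repository; the agent adds the filter here)
from sqlalchemy import select

from models import User  # assumed location of the ORM model

class UserRepository:
    # Existing constructor and methods elided; self.session is an AsyncSession.

    async def search(self, query: str, include_inactive: bool = True):
        stmt = select(User).where(User.name.ilike(f"%{query}%"))
        if not include_inactive:
            stmt = stmt.where(User.active == True)  # noqa: E712
        return (await self.session.execute(stmt)).scalars().all()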

Same bug fix. No architectural drift.

Key Takeaways

- Each agentic PR can pass tests yet quietly violate ADRs, naming conventions, and the architectural rationale that lives outside the context window.
- The debt is invisible per PR and compounds at machine speed: agents replicate whatever patterns already exist in the repo, optimal or not.
- Machine-readable context files (AGENTS.md, CLAUDE.md) are the cheapest mitigation, but keep them to non-inferable details; bloated instruction files cut task success and raise cost.
- Deterministic enforcement, human review gates, and periodic drift scans are what stop the accumulation; they hold regardless of what the agent's context window contains.