Improper Output Handling: Validate Agent Output Before Downstream Use
Treat agent output as untrusted input to the next system — every downstream sink needs its own per-sink validation gate.
Improper Output Handling is the OWASP LLM05:2025 vulnerability class: "insufficient validation, sanitization, and handling of the outputs generated by large language models before they are passed downstream to other components and systems." For coding agents, the canonical case is generated code executed or committed without review, plus agent text rendered in surfaces that auto-interpret it.
The Mechanism
Trust does not transfer through string boundaries. An LLM produces strings that bypass any input-validation layer already in place: that layer validated the user prompt, not the model response. When the response reaches a code-interpreting sink, the sink receives text that nothing in the pipeline has scrutinised. OWASP's rule: "Treat the model as any other user, adopting a zero-trust approach, and apply proper input validation on responses coming from the model to backend functions" (OWASP LLM05:2025).
OWASP lists five canonical sinks: exec/eval/shell (RCE), browser-rendered JS or markdown (XSS), unparameterised SQL (injection), unsanitised file paths (traversal), and email templates without escaping (phishing) (OWASP LLM05:2025).
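For the first sink in that list, a command allowlist might look like the sketch below. The `ALLOWED` table, the function names, and the 60-second timeout are assumptions made for illustration; they are not taken from OWASP or from any particular agent framework.

```python
import shlex
import subprocess

# Hypothetical allowlist: executable -> subcommands the agent may run.
ALLOWED = {
    "git": {"status", "diff", "log"},
    "npm": {"test"},
}


def parse_allowed(command_line: str) -> list[str]:
    """Split an LLM-proposed command line and reject anything off the allowlist."""
    try:
        argv = shlex.split(command_line)
    except ValueError as exc:  # for example, unbalanced quotes
        raise PermissionError(f"unparseable command: {command_line!r}") from exc
    if len(argv) < 2 or argv[0] not in ALLOWED:
        raise PermissionError(f"command not allowed: {command_line!r}")
    if argv[1] not in ALLOWED[argv[0]]:
        raise PermissionError(f"subcommand not allowed: {command_line!r}")
    return argv


def run_agent_command(command_line: str) -> subprocess.CompletedProcess:
    # An argv list with shell=False means ';', '&&' and '$(...)' are never
    # interpreted by a shell; they are passed as literal arguments or rejected.
    return subprocess.run(
        parse_allowed(command_line),
        shell=False,
        capture_output=True,
        text=True,
        timeout=60,
    )
```

This sketch checks only the executable and its first argument. Flags that follow the subcommand can still be dangerous, so a production gate would likely validate them as well.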
Downstream Sinks in Coding-Agent Workflows
| Sink | Risk | Per-sink control |
|---|---|---|
| Source control (auto-commit) | Vulnerable code lands in main | SAST gate in CI; human approval on security-sensitive paths (Infosecurity Magazine) |
| Shell / `exec` / `eval` in tool execution | Remote code execution | Allowlist of permitted commands; refuse `eval` on any LLM-produced string (OWASP LLM05:2025) |
| SQL through ORM or string-concat | SQL injection (e.g. CVE-2025-1793) | Parameterised queries or prepared statements — never string interpolation (OWASP LLM05:2025) |
| Markdown / HTML in PR comments, chat, email | XSS, exfiltration via auto-fetched images | Context-aware HTML encoding; CSP; image-fetch gating in the renderer |
| Package manager (`pip install`, `npm install`) | Slopsquatting: install of a hallucinated package | Resolve generated dependency lists against an installed lockfile before install (Help Net Security) |
| File paths | Path traversal | Canonicalise and constrain to a base directory before any FS call (OWASP LLM05:2025) |
| Downstream LLM (multi-agent pipeline) | Re-injection via tool output | Route via Action-Selector Pattern so tool output never re-enters a model |
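As one illustration of the package-manager row, the sketch below compares an agent-proposed dependency list against pinned entries before anything is installed. It assumes a requirements-style lockfile with `name==version` lines; other lockfile formats would need their own parser, and the function names are hypothetical.

```python
import re


def normalise(name: str) -> str:
    # PEP 503 normalisation: runs of '-', '_' and '.' collapse to '-', lowercased.
    return re.sub(r"[-_.]+", "-", name.strip()).lower()


def parse_pins(lockfile_text: str) -> dict[str, str]:
    """Map normalised package name -> pinned version from `name==version` lines."""
    pins = {}
    for line in lockfile_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "==" in line:
            name, version = line.split("==", 1)
            pins[normalise(name)] = version.strip()
    return pins


def unknown_dependencies(proposed: list[str], lockfile_text: str) -> list[str]:
    """Return proposed package names that have no pinned entry in the lockfile."""
    pins = parse_pins(lockfile_text)
    return [name for name in proposed if normalise(name) not in pins]
```

A gate built on this would refuse the install, or route it to human review, whenever `unknown_dependencies` returns a non-empty list.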
The canonical SQL instantiation is CVE-2025-1793 in LlamaIndex (CVSS 9.8): vector-store integrations constructed SQL by string-concatenating LLM-built queries, fixed in 0.12.28 by switching to parameterised queries (Endor Labs advisory; GHSA-v3c8-3pr6-gr7p). The fix is not LLM-specific; it is the same parameterisation control that defends against any user-supplied SQL string.
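The difference between concatenation and parameterisation can be reproduced with the standard-library `sqlite3` module. This is a toy illustration with a made-up table and input string; it is not the LlamaIndex code or its patch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, author TEXT)")
conn.executemany("INSERT INTO docs (author) VALUES (?)", [("ada",), ("grace",)])


def find_unsafe(author: str) -> list:
    # Vulnerable: the string becomes part of the SQL text.
    return conn.execute(
        f"SELECT id FROM docs WHERE author = '{author}'"
    ).fetchall()


def find_safe(author: str) -> list:
    # Parameterised: the driver binds the string as a value, never as SQL.
    return conn.execute(
        "SELECT id FROM docs WHERE author = ?", (author,)
    ).fetchall()


# Stand-in for a model response that carries an injected predicate.
llm_output = "nobody' OR '1'='1"
```

With this input, `find_unsafe` matches every row because the injected `OR` clause is parsed as SQL, while `find_safe` matches none because the whole string is compared as an author name.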
Why It Works
LLM output and user input belong to the same trust class — both are arbitrary strings that cross trust boundaries at a code-interpreting sink. Per-sink controls (parameterisation, HTML encoding, allowlists) defend the boundary, not the source, so they apply to LLM output for the same reason they apply to user input. The novelty of LLM05 is not new controls; it is new applicability surface — LLM responses now cross boundaries that previously consumed only validated or trusted strings. Simon Willison frames it: "anyone who can control more than a few sentences of input to that model can cause it to output anything they like" (Willison, 2025).
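Because the control sits at the boundary, one guard can serve every caller regardless of where the string came from. A minimal sketch for the file-path sink follows, assuming Python 3.9+ and a hypothetical base directory; `resolve_inside` is an illustrative name, not a library function.

```python
from pathlib import Path

BASE_DIR = Path("/srv/agent-workspace")  # hypothetical base directory


def resolve_inside(base: Path, untrusted: str) -> Path:
    """Return the canonical path for `untrusted`, or raise if it escapes `base`."""
    base = base.resolve()
    # Joining with an absolute path discards `base`, and resolve() collapses
    # '..' segments, so both escape routes show up in the check below.
    candidate = (base / untrusted).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes base directory: {untrusted!r}")
    return candidate
```

The function does not ask whether `untrusted` was typed by a user or emitted by a model; the check is identical in both cases, which is the point of defending the boundary.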
The defect baseline confirms the need: approximately 24.7% of AI-generated code carries a security vulnerability — 1.5 to 2× the rate of human-written code (Infosecurity Magazine; Checkmarx). Without a downstream gate, that defect rate ships.
Relation to Adjacent Patterns
LLM05 is the risk class; these patterns are concrete mitigations against specific sinks:
- Safe Outputs Pattern — bounds the write blast radius by routing every modification through a validating MCP; operation filter, content moderation, and secret removal each realise per-sink validation.
- Action-Selector Pattern — eliminates the tool-output-to-LLM feedback loop, so the downstream sink is a deterministic executor instead of another model.
- Agent-Authored Messages as a Deferred Exfiltration Channel — a concrete LLM05 instantiation where the unsafe sink is renderer auto-fetch.
Distinction From LLM06
OWASP draws the boundary explicitly: "Excessive Agency differs from Insecure Output Handling which is concerned with insufficient scrutiny of LLM outputs" (OWASP LLM06:2025). LLM06 is the agent taking action (excessive functionality, permissions, autonomy); LLM05 is a downstream consumer mishandling the text the agent produced. A permission-bounded agent can still trigger LLM05 if its bounded actions emit strings consumed unsafely downstream. The controls are orthogonal — bound actions (LLM06) and validate output at each sink (LLM05).
When This Backfires
- Mature teams already enforce per-sink encoding: A team that already parameterises every DB call, escapes every HTML render, and refuses `eval` on arbitrary strings gains nothing from a parallel LLM-specific scanner. Confirm existing per-sink controls cover the new applicability surface; do not duplicate them.
- The LLM never reaches a code-interpreting sink: An agent that drafts emails for human review or generates code a developer reads before commit has no LLM05 surface in those paths. Adding output validation there is theatre with a latency cost.
- Output-stream filtering blocks legitimate technical content: Pattern-based "strip URLs / strip code blocks" sanitisers reject legitimate documentation links and code samples. Prefer context-aware escaping at the sink over content-stripping of the LLM stream (Hoyt: Don't sanitize input. Escape output).
- Schema-constrained tool calling already enforces validation: Strict structured outputs (Pydantic / JSON Schema) at the model interface mean the LLM emits only schema-conformant parameters. Where every field is an enum, a number, or another closed type, layering per-sink validation on top adds latency without marginal risk reduction; the schema is the validator. Free-form string fields are still arbitrary strings and still need escaping or parameterisation at the sink.
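The escape-at-the-sink alternative to stream stripping can be sketched with the standard library's `html.escape`. The wrapper element and class name are assumptions for illustration. This encoder suits HTML element bodies and quoted attribute values; JavaScript and URL contexts need their own encoders.

```python
import html


def render_comment(agent_text: str) -> str:
    """Encode agent text for an HTML element body so markup arrives as inert text."""
    escaped = html.escape(agent_text, quote=True)
    return f'<div class="agent-comment">{escaped}</div>'


hostile = "<img src=x onerror=alert(1)>"
legitimate = "See https://example.com/docs and compare x < y"

print(render_comment(hostile))
print(render_comment(legitimate))
```

Nothing is removed from either string: the documentation link and the comparison stay readable, while the `<img>` tag is displayed as text instead of being parsed.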
Example
A retrieval-augmented agent answers user questions over a vector store. The agent constructs a metadata filter from natural language; the application executes it against PostgreSQL.
Before (generated query interpolated into SQL):

    # Vulnerable: whatever the model returns becomes SQL text.
    filter_clause = llm.generate(f"build a WHERE clause for: {user_question}")
    cursor.execute(f"SELECT * FROM docs WHERE {filter_clause}")

A user prompt like "delete all", rephrased by the LLM into `1=1; DROP TABLE docs --`, executes against the database. This is the CVE-2025-1793 class.
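After (the LLM emits a schema-constrained filter object; the executor builds parameterised SQL):

    from typing import Literal

    from psycopg2 import sql  # assumes `cursor` is a psycopg2 cursor
    from pydantic import BaseModel

    class Filter(BaseModel):
        field: Literal["author", "year", "tag"]  # allowlisted column names
        op: Literal["=", ">", "<"]               # allowlisted operators
        value: str                               # free text, bound as a parameter

    f = Filter.model_validate(llm.structured_output(user_question, schema=Filter))
    # Placeholders bind values only; the column and operator must be composed.
    query = sql.SQL("SELECT * FROM docs WHERE {} {} %s").format(
        sql.Identifier(f.field), sql.SQL(f.op)
    )
    cursor.execute(query, (f.value,))

The LLM never writes SQL. The deterministic executor composes the column name and operator from the schema's allowlists and binds the value as a query parameter. Passing the column and operator as `%s` parameters would not work, because the driver would quote them as string literals.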
Key Takeaways
- Agent output is untrusted input to the next system — enumerate downstream sinks and validate at each one.
- The controls are not new: parameterised queries, context-aware encoding, command allowlists, schema validation. The new surface is that LLM-generated strings now reach those sinks.
- LLM05 (downstream mishandling) is distinct from LLM06 (agent action) — defend both, independently.
- Safe Outputs and Action-Selector are concrete architectural mitigations; per-sink encoding is the universal control.
- For coding agents specifically, the sinks include source control, package managers, and shells — sinks that previously consumed only trusted strings.