Security

Patterns and techniques for building agents that resist manipulation, protect sensitive data, and fail safely.

Threat Models

Threat models identify the structural conditions that make agent systems exploitable and prescribe architectural mitigations.

Action-Audit Divergence: A Four-Mode Taxonomy for Runtime Hardening — Name the four ways an agent action can diverge from its audit record (gate-bypass, audit-forgery, silent host failure, wrong-target) to convert "is this runtime hardened?" into a coverage checklist against existing controls
Compositional Vulnerability Induction in Coding Agents — Decomposing a malicious end-state into three innocuous engineering tickets bypasses refusal and hardening defenses at 53–86% ASR across nine production coding agents; pentester-framed reviewers close most of the gap
Constraint Drift: Why Safety Must Be Maintained, Not Asserted — Safety constraints encoded in prompts weaken across six trajectory surfaces — memory, delegation, communication, tool use, audit, optimization; the four-property invariant (fresh, inherited, enforceable, auditable) keeps them operative when delegation depth, memory persistence, and tool surface compose
Context-Fractured Decomposition Attacks on Tool-Using Agents — Attacks split across tools, modules, and time slip past defenders that only inspect a single contiguous conversation; artifact provenance gaps let benign intermediate steps recompose into a jailbreak downstream, lifting attack success by up to 28.3 percentage points
Distributed Cross-PR Attacks in Persistent-State AI Control — An untrusted coding agent spreads a covert payload across a sequence of PRs, timing each piece for natural cover, so stateless per-PR review never sees the whole attack; a stateful link-tracker plus a monitor ensemble cuts gradual-attack evasion from 93% to 47%
Forged Reasoning Trace Attacks on Agent Memory (FARMA) — Forged reasoning traces poison an agent's stored decision history so it skips a safety step believing it already ran; evasive wording slips past keyword filters and self-referential amplification defeats consensus defenses, reaching 100% success on binary safety-gate agents
Computer-Systems Lens for Always-On Agent Security — Model an always-on agent as a computer system — gateway runtime as OS, Skills as applications, Plugins as loadable extensions — to port classical OS protections onto the four cross-component surfaces that model-response benchmarks miss
Four-Layer Taxonomy of Agent Security Risks — Group threats into context/instruction, tool/action, state/persistence, and ecosystem/automation layers to map controls and surface coverage gaps where attacks propagate across boundaries
Goal Reframing: The Primary Exploitation Trigger for LLM Agents — A 10,000-trial taxonomy finds goal reframing — not social engineering or incentives — is the one prompt condition that reliably triggers vulnerability exploitation across models
Improper Output Handling: Validate Agent Output Before Downstream Use — OWASP LLM05 — agent output executed, rendered, or interpreted downstream without per-sink validation is an injection surface; enumerate the sinks (commit, exec, SQL, render, install) and gate each one
Lethal Trifecta Threat Model — Risk emerges when an agent has private data access, untrusted input, and egress simultaneously; remove at least one leg from every execution path
Oracle Poisoning: Knowledge Graph Corruption Against Tool-Using Agents — Corrupting a knowledge graph an agent queries via tool-use produces 100% trust at moderate attacker sophistication across nine models; the attack is distinct from prompt injection because the data path, not the instruction path, carries the payload
OWASP LLM Top 10 (2025): Agent Security Crosswalk — Map each OWASP LLM Top 10 (2025) risk to coding-agent-specific manifestations and site pages — a navigation aid for readers arriving with the framework's shared vocabulary, not a recommended threat model
Pre-Trust Execution Surface in Coding Agent Harnesses — Project-local config (settings files, hooks, MCP manifests, env vars, localhost listeners) executes before the trust prompt fires; defer parsing and execution until after the trust boundary is established
RAG Architecture as a Poisoning Robustness Decision — Under controlled knowledge-base poisoning, attack success rates span 24.4% to 81.9% across four RAG architectures with comparable clean accuracy; architecture choice is part of the threat model
Trojan Hippo: Dormant Memory Payloads Triggered by Sensitive Topics — A single untrusted tool call plants a dormant payload in agent memory that activates sessions later when the user discusses sensitive topics, exfiltrating data via outbound tools; tested defenses cut attack success to 0–5% but at steep utility cost
Unbounded Consumption: Bounding Agent Resource Use Against DoS and Denial-of-Wallet — OWASP LLM10:2025 framed as a same-surface, two-owner threat (availability and finance); the five complementary bounds — per-call token, per-task iteration, fan-out concurrency, cost-velocity, per-day dollar — close the cost dimension no single layer covers, with $46K/day Sysdig and $82K/48hr Gemini incidents as the empirical floor
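The Lethal Trifecta entry above reduces to a check that can be run over every execution path an agent exposes. A minimal sketch follows, assuming each path can be described by three booleans; the `ExecutionPath` type, its field names, and the example paths are hypothetical and do not come from the linked page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionPath:
    """Hypothetical description of one path an agent task can take."""
    name: str
    reads_private_data: bool
    consumes_untrusted_input: bool
    has_egress: bool


def closes_trifecta(path: ExecutionPath) -> bool:
    # All three legs present at once is the risky combination.
    return (
        path.reads_private_data
        and path.consumes_untrusted_input
        and path.has_egress
    )


def risky_paths(paths):
    """Return the names of paths that still hold all three legs."""
    return [p.name for p in paths if closes_trifecta(p)]


# Illustrative inventory (assumed, not taken from any real deployment).
paths = [
    ExecutionPath("summarize-inbox", True, True, False),
    ExecutionPath("browse-and-email", True, True, True),
    ExecutionPath("public-docs-search", False, True, True),
]

print(risky_paths(paths))
```

Any path the function reports needs at least one leg removed before it ships; which leg to remove is a design decision the sketch does not make.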

Prompt Injection

Prompt injection is the primary attack vector for agents that consume untrusted content. External instructions embedded in web pages, emails, documents, or API responses can redirect an agent's behavior at the model level.

Action-Selector Pattern: LLM as Intent Decoder with Deterministic Execution — Restrict the LLM to selecting from a fixed action catalog; tool outputs never re-enter the model, making control-flow hijacking structurally impossible
Adaptive Evaluation of Out-of-Band Prompt-Injection Defenses — Test out-of-band defenses (CaMeL, FIDES, Progent, RTBAS, FORGE) with defense-aware adaptive attacks before trusting their numbers; the same static-benchmark methodology hid a 90%+ collapse in twelve in-band defenses, and only Progent has been tested adaptively at all
Dual-Graph Alignment for Indirect Prompt Injection Defense (AuthGraph) — Compare a clean authorization graph from user intent against the execution-trace provenance graph; 0.01 ASR / 0.69 UR on AgentDojo at 4.23× token cost, bounded by same-observation pollution and multi-agent gaps
CaMeL: Defeating Prompt Injections by Separating Control and Data Flow — Separate trusted control flow from untrusted data flow so injection attacks cannot alter tool invocation, regardless of model susceptibility
Close the Attack-to-Fix Loop — Use new attack traces to adversarially train hardened model checkpoints immediately after discovery
Designing Agents to Resist Prompt Injection — Architectural patterns and defense-in-depth strategies for building coding agents that stay resilient when untrusted input lands in context
Destyling Untrusted Input as a Prompt Injection Defense — Normalise the surface style of untrusted input before the model encodes who is speaking; cuts CoT-forgery attack success from 61% to 10% on a static benchmark by interrupting role perception at the representational layer
Discovering Indirect Injection Vulnerabilities in Your Agent — Map retrieval paths, audit against the Lethal Trifecta, and test with synthetic payloads to find the vulnerabilities standard testing misses
Human-in-the-Loop Confirmation Gates for Consequential Agent Actions — Mandatory checkpoints before irreversible actions let humans catch injection-driven misbehavior before it causes harm
Monotonic Capability Attenuation for Composition-Safe Tool Use — Tag every value with a sink-specific capability budget and intersect budgets through tool composition; closes permission laundering only for explicit-flow attacks and only with expert-crafted manifests
Non-Human Event Provenance Markers to Block Fabricated Approvals — Stamp every system- and harness-injected event as non-human input so an autonomous agent cannot act on a fabricated in-transcript approval; pair the marker with a consent gate that traces to genuine human input
Prompt Injection: A First-Class Threat to Agentic Systems — External content consumed by agents is an attack surface; malicious instructions can override agent instructions at the model level
Provenance-Aware Decision Auditing for LLM Agents — Build an influence provenance graph at runtime, trace each tool-call argument to its source span, and release actions only when benign evidence alone justifies them
RL-Trained Automated Red Teamers for Prompt Injection Discovery — Train an LLM-based attacker with reinforcement learning to discover novel injection vectors before adversaries do
Treat Task Scope as a Security Boundary — Narrow task scope limits both the attack surface and the blast radius of a successful injection

Anti-pattern: Single-Layer Prompt Injection Defence — Relying on one safeguard leaves agents vulnerable to attack vectors that layer does not address
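The Action-Selector Pattern entry above can be illustrated with a small dispatcher. This is a sketch under stated assumptions: the catalog contents, the handler functions, and the shape of the model's selection (a dict with `action` and `args`) are hypothetical. The point it shows is that the model only picks from a fixed catalog, and the handler's result goes to the caller rather than back into the model.

```python
# Hypothetical handlers; a real catalog would call actual services.
def check_order_status(order_id: str) -> str:
    return f"order {order_id}: shipped"


def request_refund(order_id: str) -> str:
    return f"refund requested for order {order_id}"


# Fixed action catalog: action name -> (handler, exact parameter names).
CATALOG = {
    "check_order_status": (check_order_status, {"order_id"}),
    "request_refund": (request_refund, {"order_id"}),
}


def dispatch(selection: dict) -> str:
    """Execute a model-selected action deterministically.

    `selection` is the only thing the model contributes. The return value
    is handed to the user or caller and is never fed back to the model.
    """
    action = selection.get("action")
    if action not in CATALOG:
        raise ValueError(f"action not in catalog: {action!r}")
    handler, allowed = CATALOG[action]
    args = selection.get("args", {})
    if set(args) != allowed:
        raise ValueError(f"unexpected parameters for {action}: {sorted(args)}")
    if not all(isinstance(v, str) for v in args.values()):
        raise ValueError("arguments must be plain strings")
    return handler(**args)
```

For example, `dispatch({"action": "check_order_status", "args": {"order_id": "A17"}})` runs the handler, while any action name outside the catalog raises `ValueError` before anything executes.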

Sandboxing

Isolation limits what a compromised or misbehaving agent can affect.

Browser Sandbox for Agent-Generated HTML (Sandboxed Iframe + Immutable CSP) — Run untrusted agent- or LLM-generated HTML safely in the browser by composing a sandbox="allow-scripts allow-forms" iframe, an immutable <meta> Content-Security-Policy, and a MessageChannel-scoped allow-listed parent API
Capability-Additive Code Interpreters for Untrusted Agent Code — Run agent-written orchestration code in an in-process WASM interpreter that starts with zero authority and bridges in each capability with explicit limits, so the blast radius is what you added, not what you forgot to remove
Dual-Boundary Sandboxing — Enforce both filesystem and network isolation simultaneously; neither boundary alone prevents exfiltration
Network-less Container + Unix-Socket Egress Proxy for Agent Sandboxes — --network none plus a mounted Unix socket makes the egress proxy the only path off the container, turning policy into topology
Scope Sandbox Rules to Harness-Owned Tools, Not Third-Party MCP Tools — Define guardrail rules only for tools your harness controls; external tools must enforce their own
Selective Network Access in Agent Sandboxes: The allowNetwork Pattern — A sandbox mode that keeps filesystem isolation but lifts network restrictions; safe only when egress is enforced at a layer below the harness
Subprocess PID Namespace Sandboxing in Claude Code — A third isolation layer that prevents Bash subprocesses from persisting daemons across sessions and leaking secrets through inherited environment variables
Use a Public-Web Index to Gate Automatic URL Fetching — Cross-reference URLs against an independent crawl index before allowing automatic fetching
In-Process WebAssembly Sandboxes for Agent-Generated Code — Embed a WebAssembly runtime inside your Python or JavaScript application to execute agent- or LLM-generated code with CPU and memory caps, no filesystem or network by default, and explicit host-function interop
Workload-Keyed Sandbox Selection for Agent-Generated Code — Match sandbox features to workload shape (ephemeral one-shot, stateful long-session, untrusted-code execution) before picking a runtime family, because workload type pins isolation strength and persistence; latency and network policy are configured later

Anti-pattern: Hostname-Allowlist Proxy: The TLS-Inspection Blind Spot — A hostname-allowlist proxy without TLS termination enforces the client-supplied destination, not the actual destination; broad shared-CDN entries open domain-fronting and similar exfil paths
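The network-less container entry above names two concrete Docker options: `--network none` and a mounted Unix socket. The sketch below only builds the argument list for such an invocation and does not run it. The image name, socket paths, and function name are assumptions for illustration; the egress proxy listening on the host side of the socket is assumed to exist separately.

```python
def sandbox_argv(image, command, proxy_socket="/var/run/egress-proxy.sock"):
    """Build a `docker run` argument list for a network-less sandbox.

    The container gets no network interface beyond loopback; the only
    route off the box is the Unix socket bind-mounted from the host,
    where an egress proxy (assumed, not shown) applies policy.
    """
    return [
        "docker", "run", "--rm",
        "--network", "none",                        # no network stack at all
        "-v", f"{proxy_socket}:/run/egress.sock",   # sole path to the proxy
        image,
        *command,
    ]


argv = sandbox_argv("agent-sandbox:latest", ["python", "task.py"])
print(" ".join(argv))
```

Code inside the container would need to be configured to send its requests through `/run/egress.sock`; how that is done depends on the client library and is outside this sketch.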

Data Protection

Preventing sensitive data from entering agent context is cheaper than scrubbing it after the fact.

Credential Hygiene for Agent Skill Authorship — Keep credentials out of skill definitions at authoring time; placeholder syntax, pre-commit scanning, and wrapper scripts prevent leakage when skills are shared or reproduced
Internal Hostname Disclosure in Agent-Readable Context — Internal hostnames and staging URLs left in CLAUDE.md, AGENTS.md, or MCP configs are reconnaissance data, mapped by mainstream wordlists even without a credential attached
PII Tokenization in Agent Context — Replace sensitive fields with deterministic tokens before data reaches the model
Privacy-Preserving LLM Requests — Eight techniques exist for keeping sensitive content out of cloud LLM APIs; only four are practical today, and composing local routing with redact-and-rephrase cuts PII leakage to 0.6%
Protecting Sensitive Files from Agent Context — Use permission rules and hooks to prevent agents from reading credentials and secrets
Scoped Credentials via Proxy Outside the Agent Sandbox — Keep broad credentials outside the sandbox; use an external proxy that attaches scoped tokens only to validated requests
Secrets Management for Agent Workflows — Inject credentials as environment variables so secrets never appear in context or generated code
System Prompt as Secret Store (OWASP LLM07) — Treating the system prompt as a confidentiality boundary is the underlying vulnerability — secrets, credentials, and security-critical logic in the prompt are recoverable at 84–92% ASR on frontier models
Guarding Against URL-Based Data Exfiltration in Agentic Workflows — The URL itself is a data channel; agents that construct or follow URLs from untrusted content can leak context before a response is read
Agent-Authored Messages as a Deferred Exfiltration Channel — An auto-fetching renderer downstream of an agent's message-authoring tool acts as deferred egress, closing the lethal trifecta without any direct network grant
Multitenant RAG: Closing the Relevance-Authorization Gap — Retrieval ranks by relevance, not authorization — in a shared corpus, the highest-scoring chunk for one tenant can belong to another; close the gap with policy-aware ingestion, two-tier retrieval gating, and server-side orchestration
Per-Server MCP Environment Scoping for Credential Isolation — Each MCP server gets its own env-variable scope, not the agent process's full env, so one server's credentials never leak to every other server the agent talks to; the configuration-layer complement to credential proxies and federated identity
Multi-Tenant Isolation Knobs for Shared-Container Agent SDK Hosting — Four Claude Agent SDK options plus a per-tenant proxy-egress rule that sever each default settings-and-state input (filesystem settings, ~/.claude.json, auto memory, inherited cwd) when one container serves multiple tenants
Embedding Inversion: Vector Stores as a Source-Text Disclosure Surface — Stored embeddings can be partially inverted to reconstruct source text — the LLM08:2025 confidentiality slice that access-control and poisoning defenses do not address; treat the vector index as a copy of the corpus
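The PII Tokenization entry above describes replacing sensitive fields with deterministic tokens before data reaches the model. One possible way to do that, shown here for email addresses only, is a keyed HMAC. The regular expression, the token format, and the choice of HMAC-SHA256 are assumptions of this sketch rather than the linked page's method, and the simple pattern will not catch every valid address.

```python
import hashlib
import hmac
import re

# Deliberately simple email pattern; an assumption, not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def tokenize_emails(text: str, key: bytes) -> str:
    """Replace each email with a deterministic, keyed token.

    The same address (case-insensitively) always maps to the same token,
    so the model can still tell that two mentions refer to one person.
    """
    def _token(match: re.Match) -> str:
        value = match.group(0).lower().encode("utf-8")
        digest = hmac.new(key, value, hashlib.sha256).hexdigest()
        return f"<EMAIL_{digest[:12]}>"

    return EMAIL_RE.sub(_token, text)


redacted = tokenize_emails(
    "Contact alice@example.com or ALICE@example.com, cc bob@example.org",
    key=b"demo-key-not-for-production",
)
print(redacted)
```

Mapping tokens back to the original values requires a lookup table kept outside the model's context; the sketch omits that step.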

Permissions

Excess permissions expand the blast radius of any failure or attack.

Agent Network Egress Policy: Admin-Controlled Domain Allow/Deny — Restrict which domains agent tools can reach via harness-enforced allow and deny lists; remove the model from the network trust boundary
Authority Confusion: Untrusted Context Must Not Authorize Side Effects — Decompose task authority into a step-level authority context the dispatch layer can check; runtime content may inform the planner but never become the issuer that authorizes a side effect
Blast Radius Containment: Least Privilege for AI Agents — Limit agent access to only what the current task requires; excess permissions directly amplify injection impact
Constraints as a Substrate for Scalable Agent Oversight — Manage coding agents with the controls used for human teams — access control, network policy, tooling-enforced conventions — so a small, cheap reviewer scales oversight; substrate plus a ~200-line tool lifted a small reviewer's backdoor recall from 54.5% to 90.9%
Fail-Closed Remote Settings Enforcement — Block agent startup until remote managed settings are freshly validated; exit rather than run with stale or missing policy
Gate Agent Writes to Executable Config Files as Privileged Actions — Writes to .npmrc, .yarnrc, bunfig.toml, .bazelrc, .pre-commit-config.yaml, and .devcontainer/ are execution-escalations — interrupt permissive edit modes at the write site, complementing execution-side defaults like ignore-scripts=true
Intent-Governed Tool Authorization for AI Agents (IGAC) — A server-issued intent certificate narrows the static OpenPort manifest per user request via a monotone-only filter; the planner cannot reach tools the current ask does not justify, but classifier failure and intent-materialisation privacy regressions constrain where the trade-off pays off
Org-Membership-Gated Agent Entitlement — Gate AI chat activation on directory-managed GitHub organization membership via VS Code's ChatApprovedAccountOrganizations device policy; fail-closed and structurally distinct from seat licences
Permission-Gated Custom Commands — Pre-approve the tools a Claude Code slash command may use via frontmatter, narrowing the expected surface for shared commands
Pre-Execution Risk Classification for Terminal Commands — Display a tiered Safe/Caution/Review-carefully badge with command-specific text before the agent runs a terminal command; an attention-allocation lever paired with deterministic allowlists that carry the policy load
Revocable Resource-and-Effect Capabilities for Coding Agents (PORTICO) — Materialise each subgoal-scoped capability as an opaque epoch-bound handle that closure removes from the planner's interface; stale replay is rejected before side effects, closing lingering authority when tool traffic is mediated and the catalog is typed
Safe Outputs Pattern — Default agents to read-only and require explicit grants for each write output type, producing a deterministic blast radius
Task-Based Access Control with Hybrid Inspection — Bind each tool call to the user's current task via short-lived signed credentials, with a semantic axis flagging in-scope-but-off-task calls; the deterministic axis carries the security guarantee
Transcript-Driven Permission Allowlist — Mine session transcripts for repeated read-only tool calls and propose a prioritized allowlist — narrower than bypass, tighter than manual curation
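The Agent Network Egress Policy entry above describes harness-enforced allow and deny lists. A minimal sketch of such a check follows, assuming deny entries take precedence and that a listed domain also covers its subdomains; both rules, and the function names, are assumptions of this example. It inspects only the requested hostname, so it shares the limit named in the Hostname-Allowlist Proxy anti-pattern under Sandboxing.

```python
from urllib.parse import urlsplit


def host_matches(host: str, domain: str) -> bool:
    """True when host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def egress_allowed(url: str, allow: set, deny: set) -> bool:
    """Decide in the harness, not in the model, whether a URL may be fetched."""
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return False  # unparseable or host-less URLs are rejected
    if any(host_matches(host, d) for d in deny):
        return False  # deny wins over allow (assumed precedence)
    return any(host_matches(host, d) for d in allow)
```

With `allow={"github.com"}`, the URL `https://api.github.com/repos` passes, while `https://github.com.evil.example/x` does not, because suffix matching is anchored on a dot boundary.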

Code Injection

Code injection in multi-agent pipelines exploits an agent's trust in the code it reads as input; it is distinct from prompt injection against a single agent.

Code Injection Attacks on Multi-Agent Systems: Coder-Reviewer-Tester as Defence — A coder-reviewer-tester architecture with a dedicated security analysis agent achieves the highest resilience while recovering efficiency losses

Multi-Agent Propagation

Multi-agent systems with shared retrieval propagate adversarial content agent-to-agent. Defenses target the contagion channel and the per-agent detection signal.

Foresight-Guided Defense Against Infectious Jailbreaks in Multi-Agent Systems — Per-agent persona simulation detects diversity collapse from poisoned shared memory, then surgically rolls back or bisects the album to remove contamination without homogenizing healthy agents

PR-Time and Scheduled Review

Operational patterns that apply security agents to incoming changes and to resident codebase risk on different cadences.

Always-On Agentic PR Security Review — Pair a PR-time security reviewer with a scheduled whole-codebase scanner so new and resident risk both have continuous coverage; treat the reviewer agent itself as an injection target
Scanner-as-MCP-Server: Secret and Dependency Scans as Typed Agent Tools — Ship the security scanner as an MCP server so the agent invokes typed scans pre-commit and reasons over structured findings; qualified by five failure modes including agent-skips-scan and lethal-trifecta closure on the scanner principal
Cross-Repository Security Posture for Agent-Introduced Vulnerabilities — Treat one agent-introduced finding as a class to enumerate and remediate across every repository via variant analysis, not a per-repo point-in-time event; the organization-wide layer above per-agent scoping and per-PR review
Structural Monitoring for Covert Safeguard-Weakening — Diff the control- and data-flow graph of an agent's infrastructure-as-code change to catch a safeguard weakened under task success; an untrained structural monitor missed 3.5% of sabotage versus 11.6% for a git-diff baseline and cut joint sabotage success to 0% under synchronous rollback

Tool Invocation

Tool invocation exposes attack surfaces distinct from prompt injection. Malicious tools exploit argument generation and return processing to leak context and execute arbitrary commands.

Behavioral Firewall for Tool-Call Trajectories — Compile verified benign tool-call telemetry into a parameterized DFA and enforce permitted sequences and parameter bounds at runtime; fits structured workflows with stable tool catalogs
Detecting Memory-Poisoning Exfiltration by Tool-Call Order (Recall-Before-Send Signature) — A memory-poisoning attack must recall the stored attacker address before sending, so a single tool-call-ordering rule detects it from logs alone at AUC 0.96 — model-agnostic but memory-channel-specific, adaptive-evadable, and prone to false positives on benign reasoning agents
Hybrid Deterministic + Semantic Authorization for Agent Tool Calls — Combine five deterministic structural checks at the agent-tool boundary with a semantic task-to-tool matcher; the two attack classes are orthogonal so neither layer alone suffices
Execution-Layer Security Invariants for MCP Runtimes — Decompose MCP execution security into eight named, testable runtime invariants; a benchmark shows any single connection-layer boundary blocks only 4 of 10 attacks while all eight together block all 10
MCP Approval-View Fidelity Gap and Unicode Concealment — MCP reviews tool metadata once but injects it every turn, and the protocol never requires the approved view to match the model's bytes; invisible Unicode TAG-block characters ride that gap unseen across three independent server libraries
MCP Runtime Control Plane: Policy Evaluation Between Agent and Tool — Intercept every MCP tool call at a single policy evaluation point — identity, tool name, arguments, rate limits — before the call reaches the server
Security-Aware Tool Descriptions for MCP Servers (SpellSmith) — Taint-style flaws are 81% of MCP-server vulnerabilities and slow to patch; rewriting a risky tool's description with its tainted parameters, capability, CWE, and an invocation policy is a same-day mitigation layered over code-level input validation, not a replacement for it
Mid-Trajectory Guardrail Selection for Multi-Step Tool Calls — Guardrail efficacy in multi-step tool-calling workflows correlates with structural data competence more than safety alignment; select guard models accordingly
Tool-Invocation Attack Surface — Malicious MCP tools exploit argument generation to leak system prompts and chain description-plus-return injection to achieve remote code execution
Vetting Tool Definitions for Exfiltration Signatures — A tool description or inputSchema asking for the system prompt, conversation history, secrets, or API keys is a leak signature; refuse the tool at install, do not reword it
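The Recall-Before-Send entry above describes a single ordering rule evaluated over tool-call logs. The sketch below is one plausible reading of that rule, not the linked page's exact detector: it flags a trajectory when any send-type call occurs after a memory-recall call. The tool names in both sets are hypothetical and would need to match the deployment's own catalog.

```python
# Hypothetical tool names; adjust to the catalog actually in use.
RECALL_TOOLS = {"memory_recall"}
SEND_TOOLS = {"send_email", "http_post"}


def recall_before_send(calls) -> bool:
    """Return True if a send-type call follows a recall-type call.

    `calls` is the ordered list of tool names from one session log.
    """
    seen_recall = False
    for name in calls:
        if name in RECALL_TOOLS:
            seen_recall = True
        elif name in SEND_TOOLS and seen_recall:
            return True
    return False
```

As the entry itself notes, a rule of this kind is specific to the memory channel and can produce false positives on benign agents that legitimately recall before sending, so it is better treated as a triage signal than as a blocking control.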

Supply Chain

Agents dynamically load tools from MCP servers, plugins, and registries at runtime. A tampered tool inherits the agent's full permissions.

Agent-Emitted Dependency Version Ranges Widen the Supply-Chain Attack Surface — Agents default to caret and tilde ranges because npm install does; for an application with a bump-bot, replace the range with an exact pin plus a lockfile-enforced install — the floating range is the leg that admits a future-compromised release
Supply-Chain Security Debt in Agent Pull Requests — Across 4,022 agent-authored PRs, 82.3% of security smells are supply-chain integrity issues (unpinned CI/Docker references) concentrated in workflow and Docker files; target review at high-impact infra paths and enforce SHA pinning plus secret scanning in CI
Content-Addressed Agent Configurations (Deterministic Control Plane) — Treat coding-agent configs as an installed supply chain with SHA-256 content addressing, a per-project lockfile, and five declared permission tiers; the 10.1% cross-repo duplicate rate and the <1% configs declaring any permission scope (vs 33% of GitHub Actions workflows) make this a governance gap, not a hypothetical one
LLM-Pinned Library Versions Carry Systemic CVE Exposure — Across 10 models on 1,000 Python tasks, 36.7%-55.7% of LLM-specified versions contain known CVEs and all models converge on the same risky releases — pin against an external vulnerability source, not the model's training prior
Skill Composition Risk in Agent Ecosystems — Skills benign in isolation become harmful when one skill's output flows into the next; three failure modes (CapFlow, TrustLift, AuthBlur) reach 33.6%, 96.5%+, and 71.8% relative attack success across ten production backends, and per-skill vetting misses them by construction
Skill Supply-Chain Poisoning — Malicious skills injected into public registries exploit in-context learning to execute payloads hidden in documentation examples, bypassing alignment that blocks explicit instruction injection
Setup Documentation as an Install-Time Attack Vector — Setup docs are unverified install authority; editing only a README, requirements file, or Makefile redirects an agent's install to a wrong name, untrusted registry, or vulnerable version, and the payload runs at install time — agents catch typosquats but miss source-based redirects almost everywhere
Slopsquatting: Hallucinated Package Names as a Supply-Chain Vector — Coding LLMs hallucinate package names at 5.2%-21.7%; 43% of those names persist across re-runs, making them enumerable — attackers register the persistent names and the agent's install step pulls the malicious package
Tool Signing and Signature Verification — Require cryptographic signature verification (Sigstore/Cosign) before an agent loads or invokes a tool
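The dependency version-range entry above recommends exact pins over caret and tilde ranges. A small check that could run in review or CI is sketched below, assuming a standard `package.json` layout. It flags only specs that start with `^` or `~`; other floating forms such as `*`, `latest`, or `>=` comparators are not covered. The function name is hypothetical.

```python
import json


def floating_ranges(package_json_text: str) -> dict:
    """Return {package: spec} for caret- or tilde-ranged dependencies."""
    manifest = json.loads(package_json_text)
    findings = {}
    for section in ("dependencies", "devDependencies"):
        for name, spec in manifest.get(section, {}).items():
            if isinstance(spec, str) and spec.startswith(("^", "~")):
                findings[name] = spec
    return findings


# Illustrative manifest (package names and versions are placeholders).
example = json.dumps({
    "dependencies": {"left-pad": "^1.3.0", "exact-lib": "2.4.1"},
    "devDependencies": {"some-linter": "~8.0.0"},
})

print(floating_ranges(example))
```

A check like this complements, rather than replaces, a lockfile-enforced install: the pin controls what the manifest requests, and the lockfile controls what is actually resolved.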

Defense in Depth

No single safety mechanism is sufficient. Layered defenses ensure that failure of one layer does not compromise the agent.

Cryptographic Governance Audit Trail — Wrap tool calls with policy validation and post-quantum receipt signing to produce a tamper-evident, append-only action log for regulated environments
Defense-in-Depth Agent Safety — Layer five independent safety mechanisms so no single failure point can compromise agent behavior
Enterprise Agent Hardening: Governance, Observability, and Reproducibility — Move agents to production through three control gates — governance, observability, reproducibility — with MUST/SHOULD checklists for each
Inline Safety Harness with Cascade Verification (FinHarness) — Wrap each agent turn with prospective per-call monitors and route verification between a cheap and an advanced judge by per-step risk; worth it for high-stakes, high-call-volume workflows, not low-volume or long-context agents
Lifecycle-Integrated Security Architecture for Agent Harnesses — Embed defense mechanisms into each execution lifecycle phase with cross-layer feedback so layers coordinate rather than operate in isolation
Lock-State Safeguards for Desktop-Controlling Agents — Bound an agent driving a logged-in desktop along four axes (time, visibility, presence, recovery) with short-lived authorization, covered displays, relock on local input, and manual-unlock fallback so a failure on any single axis is contained by the others
Security Constitution for AI Code Generation — Formalize security constraints as a versioned, machine-readable constitution that feeds agent specs, linters, and CI gates
Security Drift in Iterative LLM Code Refinement — Iterative fix-test loops optimize for functional correctness while silently accumulating security regressions that no functional test exercises
Three-Depth In-Session Security Review — Stack a per-edit pattern match, an end-of-turn diff review, and a commit-time agentic review so each layer's cost and false-positive profile match its frequency
Usability Pressure as a Silent Security-Regression Vector — Explicit usability requirements (performance, simplicity, new features) in a single-shot prompt cause LLMs to drop implicit security constraints at up to 98.1% attack success rate; mitigated by making security explicit and gating every output through a scanner
Verifying LLM-Generated Cryptographic Code — Crypto generation fails with 23.3% compile rate and 57% vulnerabilities; pair every crypto code path with a rule-based crypto analyzer, prefer zero-shot over CoT, and constrain to vetted high-level APIs

Economics

Sizing frames for pre-release security review when vulnerability discovery scales with inference spend.

Security Budget as Token Economics — Treat hardening as a budget-allocation decision: AISI's Mythos evaluation shows no diminishing returns inside 100M tokens per attempt, but the outspend frame applies only where the search curve is still climbing and triage capacity absorbs findings

Deployment Models

Release patterns for capabilities whose offense-defense asymmetry makes broad release the wrong default.

Restricted-Access Defensive AI: Project Glasswing as a Deployment Model — Invitation-only gating shifts the latency budget toward defenders when a model raises the offensive ceiling more than broad access raises the defensive floor; the contract structure, exit criteria, and what AppSec teams should evaluate when offered access