Agentic AI Architecture: From Prompt to Goal-Directed
A goal-directed agentic architecture separates cognitive reasoning from execution. This page covers that separation, a taxonomy of multi-agent topologies, and an enterprise hardening checklist layered over the prompt-response baseline.
The Architectural Shift
Stateless prompt-response systems are the simplest LLM deployment pattern. Goal-directed systems extend this into autonomous multi-turn execution: the agent receives an objective, decomposes it into subtasks, executes tools, observes results, and iterates until the goal is met or a stopping condition triggers.
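The loop described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `run_agent`, `AgentState`, `scripted_planner`, and the `max_steps` budget are hypothetical names, and the planner is a scripted stand-in for the LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class AgentState:
    goal: str
    observations: List[str] = field(default_factory=list)
    done: bool = False


def run_agent(
    goal: str,
    plan_next_step: Callable[[AgentState], Optional[str]],
    execute: Callable[[str], str],
    max_steps: int = 10,
) -> AgentState:
    """Iterate plan -> act -> observe until the planner signals completion
    or the step budget (the stopping condition) is exhausted."""
    state = AgentState(goal=goal)
    for _ in range(max_steps):
        step = plan_next_step(state)      # stand-in for the LLM's decision
        if step is None:                  # planner reports the goal is met
            state.done = True
            break
        state.observations.append(execute(step))  # act, then observe
    return state


# Hypothetical stand-ins: a scripted planner and an echoing executor.
def scripted_planner(state: AgentState) -> Optional[str]:
    steps = ["fetch", "analyze", "summarize"]
    i = len(state.observations)
    return steps[i] if i < len(steps) else None


result = run_agent("demo goal", scripted_planner, lambda step: f"{step}: ok")
```

The step budget matters as much as the goal check: without it, a planner that never reports completion would loop indefinitely.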
arXiv:2602.10479 traces this evolution from foundational theory (BDI, reactive, deliberative) through contemporary LLM patterns. The transition is not incremental — it requires structural separation of concerns that prompt-response systems do not need.
Reference Architecture
The core structural principle: separate cognitive reasoning from execution using typed tool interfaces.
graph TD
    subgraph Cognitive Layer
        A[Goal decomposition] --> B[Plan]
        B --> C[Tool selection]
        C --> D[Observation processing]
        D --> B
    end
    subgraph Execution Layer
        E[Tool registry]
        F[Tool executor]
        G[Result formatter]
    end
    C -->|typed tool call| E
    E --> F
    F -->|typed result| D
Cognitive layer: the LLM. Handles goal interpretation, planning, tool selection, and result synthesis. It never modifies external state directly; it only emits typed tool calls.
Typed tool interfaces: the boundary. Calls and results are schema-validated, so a malformed command is rejected before it reaches the executor. This is the primary mechanism for predictable behavior.
Execution layer: deterministic infrastructure. Receives typed calls, executes them, and returns typed results. It contains no reasoning, only execution logic, error handling, and result formatting.
This separation enables independent testing of each layer and explicit auditability at the boundary. It is the separation of knowledge and execution, applied at runtime.
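One way to picture the typed boundary is a registry that validates each call against a declared parameter schema before any handler runs. The sketch below assumes a simple name-to-type mapping rather than a full schema language; `ToolSpec`, `ToolRegistry`, `ToolCallError`, and `read_file_stub` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ToolSpec:
    name: str
    params: Dict[str, type]              # parameter name -> required type
    handler: Callable[..., Dict[str, Any]]


class ToolCallError(ValueError):
    """Raised when a call does not match the registered schema."""


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: Dict[str, ToolSpec] = {}

    def register(self, spec: ToolSpec) -> None:
        self._tools[spec.name] = spec

    def dispatch(self, call: Dict[str, Any]) -> Dict[str, Any]:
        """Validate a call emitted by the cognitive layer, then execute it."""
        spec = self._tools.get(call.get("tool"))
        if spec is None:
            raise ToolCallError(f"unknown tool: {call.get('tool')!r}")
        args = call.get("args", {})
        if set(args) != set(spec.params):
            raise ToolCallError(f"expected params {sorted(spec.params)}")
        for key, expected in spec.params.items():
            if not isinstance(args[key], expected):
                raise ToolCallError(f"{key} must be {expected.__name__}")
        return spec.handler(**args)


# Hypothetical tool: the handler is a stub, not a real file reader.
def read_file_stub(path: str) -> Dict[str, Any]:
    return {"path": path, "content": "<stubbed>"}


registry = ToolRegistry()
registry.register(ToolSpec("read_file", {"path": str}, read_file_stub))
ok = registry.dispatch({"tool": "read_file", "args": {"path": "README.md"}})
```

Because validation happens in `dispatch`, the handler can assume well-formed input, and each layer can be tested on its own: the registry with hand-written calls, the model with a stubbed registry.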
Multi-Agent Topology Taxonomy
There are three coordination topologies, each with distinct failure patterns (see Multi-Agent Topology Taxonomy for a full breakdown; centralized vs. decentralized tradeoffs are also surveyed in arXiv:2601.01743):
Centralized orchestration: one orchestrator manages all workers, which execute assigned tasks and return results.
- Advantage: single point of coordination makes reasoning traceable
- Failure mode: orchestrator becomes a bottleneck; its failure halts the system
Decentralized peer-to-peer: agents communicate directly without a coordinator, making local decisions from shared state or messages.
- Advantage: no single point of failure; scales horizontally
- Failure mode: emergent coordination failures, race conditions, and inconsistent shared state are harder to debug
Hybrid: a lightweight coordinator handles routing and synthesis, while workers communicate directly for sub-task coordination.
- Advantage: reduces coordinator bottleneck while keeping traceability at the routing level
- Failure mode: the boundary between coordinator and peer-to-peer communication must be explicit; implicit crossing creates inconsistent behavior
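As a rough illustration of the first topology, the sketch below routes every task through a single coordinator function. The names `orchestrate` and `workers`, and the two toy workers, are assumptions made for the example; real workers would be agents rather than string functions.

```python
from typing import Callable, Dict, List, Tuple

Worker = Callable[[str], str]


def orchestrate(tasks: List[Tuple[str, str]], workers: Dict[str, Worker]) -> List[str]:
    """Centralized topology: one coordinator routes every task and collects
    every result, so the full trace lives in a single place."""
    trace: List[str] = []
    for worker_name, payload in tasks:
        result = workers[worker_name](payload)   # workers never talk to each other
        trace.append(f"{worker_name}({payload!r}) -> {result!r}")
    return trace


# Toy workers standing in for agents.
workers: Dict[str, Worker] = {
    "upper": str.upper,
    "reverse": lambda s: s[::-1],
}
trace = orchestrate([("upper", "abc"), ("reverse", "abc")], workers)
```

The single loop is what makes the reasoning traceable, and it is also the bottleneck: an exception inside `orchestrate` stops every remaining task.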
Enterprise Hardening Checklist
Production agent deployments require three categories of hardening beyond functional correctness:
Governance
- Audit trails: every agent action is logged with timestamp, agent identity, tool name, arguments, and result (arXiv:2602.10479)
- Access control: agents operate with least-privilege permissions; no agent has broader access than its assigned task requires (arXiv:2602.10479)
- Policy enforcement: organizational constraints (data residency, PII handling, approved models) are enforced at the harness level, not by agent prompt alone (arXiv:2602.10479)
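The governance items above can be enforced in the harness rather than the prompt. The following is a minimal sketch under stated assumptions: `GovernedExecutor` and `PermissionDenied` are hypothetical names, the audit log is an in-memory list of JSON lines, and the allowlist stands in for a real permission system.

```python
import json
import time
from typing import Any, Callable, Dict, List, Set


class PermissionDenied(Exception):
    pass


class GovernedExecutor:
    """Wraps tool execution with an allowlist check and an audit record."""

    def __init__(self, agent_id: str, allowed_tools: Set[str],
                 tools: Dict[str, Callable[..., Any]]) -> None:
        self.agent_id = agent_id
        self.allowed_tools = allowed_tools
        self.tools = tools
        self.audit_log: List[str] = []      # JSON lines; in-memory for the sketch

    def call(self, tool: str, **args: Any) -> Any:
        record = {"ts": time.time(), "agent": self.agent_id,
                  "tool": tool, "args": args}
        if tool not in self.allowed_tools:
            # Denied attempts are logged too, so the trail shows what was tried.
            record["result"] = "DENIED"
            self.audit_log.append(json.dumps(record, default=str))
            raise PermissionDenied(f"{self.agent_id} may not call {tool}")
        result = self.tools[tool](**args)
        record["result"] = result
        self.audit_log.append(json.dumps(record, default=str))
        return result


executor = GovernedExecutor(
    agent_id="reviewer-1",
    allowed_tools={"read"},                  # least privilege: read only
    tools={"read": lambda path: f"contents of {path}",
           "delete": lambda path: None},
)
content = executor.call("read", path="a.txt")
```

The check runs before the tool is looked up, so a prompt that talks the model into requesting `delete` still fails at the harness.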
Observability
- Trajectory logging: full turn-by-turn execution logs for post-hoc analysis and debugging (arXiv:2602.10479)
- Cost tracking: per-session and per-agent token consumption reported in real time (arXiv:2602.10479)
- Anomaly detection: alerts on deviation from expected trajectory length, tool call patterns, or cost bounds (arXiv:2602.10479)
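A small monitor can combine cost tracking with the simplest form of anomaly detection: fixed bounds on trajectory length, total tokens, and repeated calls to one tool. The class name `SessionMonitor` and its thresholds are hypothetical, and a production system would likely use richer signals than static limits.

```python
from collections import Counter, defaultdict
from typing import Dict, List


class SessionMonitor:
    """Tallies tokens per agent and flags sessions that drift past bounds."""

    def __init__(self, max_turns: int, max_tokens: int, max_repeat_calls: int) -> None:
        self.max_turns = max_turns
        self.max_tokens = max_tokens
        self.max_repeat_calls = max_repeat_calls
        self.tokens_by_agent: Dict[str, int] = defaultdict(int)
        self.tool_calls: Counter = Counter()
        self.turns = 0

    def record_turn(self, agent_id: str, tool: str, tokens: int) -> List[str]:
        """Record one turn and return any alerts it triggers."""
        self.turns += 1
        self.tokens_by_agent[agent_id] += tokens
        self.tool_calls[tool] += 1
        alerts: List[str] = []
        if self.turns > self.max_turns:
            alerts.append("trajectory length exceeded")
        if sum(self.tokens_by_agent.values()) > self.max_tokens:
            alerts.append("token budget exceeded")
        if self.tool_calls[tool] > self.max_repeat_calls:
            alerts.append(f"tool {tool!r} called unusually often")
        return alerts
```

Calling `record_turn` once per turn keeps the per-agent totals current, and the returned alerts can be forwarded to whatever paging or dashboard system is already in place.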
Reproducibility
- Deterministic seeding: where randomness affects agent behavior, seeds are captured in logs for replay (arXiv:2602.10479)
- Idempotent operations: agent actions produce the same end state if executed more than once; no compounding side effects on retry (arXiv:2602.10479)
- Snapshot-based rollback: system state is snapshotted before consequential actions; rollback is defined before execution begins (arXiv:2602.10479) — see Rollback-First Design
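Idempotent operations and snapshot-based rollback can be sketched together. The example below assumes an in-memory dictionary as the system state and a caller-supplied operation ID as the idempotency key; `StateStore` and its methods are hypothetical names, and real systems would snapshot external state rather than a Python object.

```python
import copy
from typing import Any, Callable, Dict, Set


class StateStore:
    """In-memory state with idempotency keys and snapshot-based rollback."""

    def __init__(self, initial: Dict[str, Any]) -> None:
        self.state = initial
        self._applied: Set[str] = set()

    def apply(self, op_id: str, action: Callable[[Dict[str, Any]], None]) -> bool:
        """Run action at most once per op_id; restore the snapshot on failure."""
        if op_id in self._applied:             # retry of a completed operation
            return False
        snapshot = copy.deepcopy(self.state)   # taken before the action runs
        try:
            action(self.state)
        except Exception:
            self.state = snapshot              # roll back partial changes
            raise
        self._applied.add(op_id)
        return True


def debit(state: Dict[str, Any]) -> None:
    state["balance"] -= 30


store = StateStore({"balance": 100})
first_attempt = store.apply("op-1", debit)     # applied
retry = store.apply("op-1", debit)             # skipped: same op_id
```

A failed operation is not recorded as applied, so it can be retried safely after the rollback.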
Industry Convergence Pattern
arXiv:2602.10479 observes that the ecosystem is converging on shared infrastructure, much as web services did as they matured: standardized agent loops, tool registries, and auditable control mechanisms. Multiple frameworks now implement the cognitive/execution separation, typed tool interfaces, and governance checklists above (arXiv:2602.10479). Building on these patterns now avoids architectural retrofits later.
When This Backfires
The cognitive/execution separation adds structural overhead. Three conditions where it costs more than it returns:
- Simple single-turn tasks. If the agent calls one tool and terminates (a single turn, not a loop), typed interfaces and a separate execution layer add engineering overhead with no reliability benefit. A direct function call is cheaper and easier to test.
- Rapid prototyping. Strict schema contracts slow iteration. Early-stage agents benefit from fluid coupling; formal separation is a refactoring target once the interface stabilizes.
- Low-throughput, human-supervised workflows. Auditability at the tool boundary (trajectory logging) matters when agents run autonomously at volume. A reviewer inspecting every action replaces much of what formal audit logging provides — adding the full harness before volume justifies it is maintenance cost with no proportionate gain.
Example
A code review agent built on this architecture:
Cognitive layer: the LLM receives "Review PR #42 for security issues". It decomposes the goal: fetch the PR diff, identify changed files, scan each file for known patterns, summarize findings. For each step it emits a typed tool call, e.g. { "tool": "github_get_pr_diff", "pr": 42 }.
Execution layer: github_get_pr_diff fetches the diff and returns a typed result { "files": [...], "additions": 310, "deletions": 45 }. The LLM never calls GitHub directly; it only receives the formatted result and decides the next tool call.
Enterprise hardening applied:
- Every tool call is logged: timestamp, agent ID, tool name, arguments, result.
- The agent runs with a scoped GitHub token (read-only on the target repo).
- A cost guard halts execution if the session exceeds 50k tokens before the agent self-terminates.
This maps each component directly onto the reference architecture above: the LLM stays in the cognitive layer, the GitHub client lives in the execution layer, and the typed tool interface enforces the boundary.
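The cost guard from the hardening list is the easiest piece to show in code. This is a sketch only: `CostGuard` and `TokenBudgetExceeded` are hypothetical names, and the harness is assumed to call `charge` with the token count of each model response.

```python
class TokenBudgetExceeded(RuntimeError):
    pass


class CostGuard:
    """Halts a session once cumulative token usage passes a fixed limit."""

    def __init__(self, limit: int = 50_000) -> None:
        self.limit = limit
        self.used = 0

    def charge(self, tokens: int) -> None:
        self.used += tokens
        if self.used > self.limit:
            raise TokenBudgetExceeded(
                f"{self.used} tokens used, limit {self.limit}"
            )


guard = CostGuard()          # 50k limit, matching the example above
guard.charge(30_000)
guard.charge(20_000)         # exactly at the limit: still allowed
```

Raising an exception, rather than returning a flag, means the agent loop cannot continue by ignoring the result; the harness has to catch it and end the session.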
Key Takeaways
- Goal-directed agents require structural separation of cognitive reasoning from execution — not a prompt-engineering refinement of the request-response model.
- Typed tool interfaces at the cognitive/execution boundary are the primary mechanism that makes agent behavior predictable and auditable.
- Three multi-agent topologies (centralized, decentralized peer-to-peer, and hybrid) each carry distinct failure modes, so the choice of topology must be matched to task shape.
- Enterprise deployment adds three orthogonal concerns to functional correctness: governance, observability, and reproducibility.
- The full harness is overhead until volume justifies it; simple single-turn tasks, prototypes, and human-supervised workflows are cheaper without it.
Related
- Cognitive Reasoning vs Execution: A Two-Layer Agent
- Separation of Knowledge and Execution
- Typed Schemas at Agent Boundaries
- Multi-Agent Topology Taxonomy
- Orchestrator-Worker Pattern
- Agent Composition Patterns: Chains, Fan-Out, Pipelines, Supervisors
- Trajectory Logging and Progress Files
- Blast Radius Containment: Least Privilege for AI Agents