Agentic AI Architecture: From Prompt to Goal-Directed

Goal-directed agentic architecture separates cognitive reasoning from execution, adds a multi-agent topology taxonomy, and layers an enterprise hardening checklist over the prompt-response baseline.

The Architectural Shift

Stateless prompt-response systems are the simplest LLM deployment pattern. Goal-directed systems extend this into autonomous multi-turn execution: the agent receives an objective, decomposes it into subtasks, executes tools, observes results, and iterates until the goal is met or a stopping condition triggers.
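The loop described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `run_agent`, `plan_next`, `execute`, and `scripted_planner` are hypothetical names, and the planner is a scripted stand-in for the LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LoopResult:
    done: bool                      # True if the planner reported the goal as met
    steps: int                      # number of tool executions performed
    observations: list = field(default_factory=list)


def run_agent(
    goal: str,
    plan_next: Callable[[str, list], Optional[str]],
    execute: Callable[[str], str],
    max_steps: int = 10,
) -> LoopResult:
    """Plan, act, observe, and repeat until the planner stops or the budget runs out."""
    observations: list = []
    for step in range(max_steps):
        task = plan_next(goal, observations)    # decompose: pick the next subtask
        if task is None:                        # stopping condition: goal met
            return LoopResult(True, step, observations)
        observations.append(execute(task))      # execute the tool, record the result
    # Stopping condition: step budget exhausted before the planner declared success.
    return LoopResult(False, max_steps, observations)


def scripted_planner(goal: str, observations: list) -> Optional[str]:
    """Stand-in for the LLM: walks a fixed plan, one subtask per observation."""
    plan = ["fetch", "scan", "summarize"]
    return plan[len(observations)] if len(observations) < len(plan) else None


result = run_agent("review", scripted_planner, lambda task: f"{task}: ok")
```

The explicit `max_steps` bound is the design point: the loop always has a stopping condition that does not depend on the model deciding it is finished.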

arXiv:2602.10479 traces this evolution from foundational theory (BDI, reactive, deliberative) through contemporary LLM patterns. The transition is not incremental — it requires structural separation of concerns that prompt-response systems do not need.

Reference Architecture

The core structural principle: separate cognitive reasoning from execution using typed tool interfaces.

graph TD
    subgraph Cognitive Layer
        A[Goal decomposition] --> B[Plan]
        B --> C[Tool selection]
        C --> D[Observation processing]
        D --> B
    end
    subgraph Execution Layer
        E[Tool registry]
        F[Tool executor]
        G[Result formatter]
    end
    C -->|typed tool call| E
    E --> F
    F --> G
    G -->|typed result| D

Cognitive layer: the LLM. Handles goal interpretation, planning, tool selection, and result synthesis. It never modifies external state directly; it only emits typed tool calls (the cognitive/execution split).

Typed tool interfaces: the boundary. Calls and results are schema-validated, so a malformed command from the cognitive layer is rejected before it reaches the executor. Schema validation at this boundary is the primary mechanism for predictable behavior.

Execution layer: deterministic infrastructure. Receives typed calls, executes them, returns typed results. Contains no reasoning, only execution logic, error handling, and result formatting.

This separation enables independent testing of each layer and explicit auditability at the boundary — the separation of knowledge and execution applied to runtime.
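A sketch of what a schema-validated boundary might look like follows. The names `ToolSpec`, `ToolRegistry`, `ToolCallError`, and the `word_count` tool are assumptions made for illustration; a production system would more likely use a schema library than hand-rolled type checks.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class ToolSpec:
    name: str
    params: Dict[str, type]          # parameter name -> required Python type
    handler: Callable[..., Any]      # deterministic execution logic


class ToolCallError(ValueError):
    """Raised when a call does not match the registered schema."""


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: Dict[str, ToolSpec] = {}

    def register(self, spec: ToolSpec) -> None:
        self._tools[spec.name] = spec

    def execute(self, call: Dict[str, Any]) -> Dict[str, Any]:
        """Validate the call against its schema, then run the handler."""
        name = call.get("tool")
        spec = self._tools.get(name)
        if spec is None:
            raise ToolCallError(f"unknown tool: {name!r}")
        args = {k: v for k, v in call.items() if k != "tool"}
        if set(args) != set(spec.params):
            raise ToolCallError(
                f"{name}: expected params {sorted(spec.params)}, got {sorted(args)}"
            )
        for key, expected in spec.params.items():
            if not isinstance(args[key], expected):
                raise ToolCallError(f"{name}: {key!r} must be {expected.__name__}")
        # Only validated calls reach the handler; the result is wrapped uniformly.
        return {"tool": name, "ok": True, "result": spec.handler(**args)}


registry = ToolRegistry()
registry.register(
    ToolSpec("word_count", {"text": str}, lambda text: len(text.split()))
)
reply = registry.execute({"tool": "word_count", "text": "typed calls only"})
```

Because validation happens inside `execute`, each layer can be tested separately: the registry with hand-written calls, and the planner with a fake registry.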

Multi-Agent Topology Taxonomy

Three coordination topologies, each with distinct failure patterns (see Multi-Agent Topology Taxonomy for a full breakdown; centralized vs. decentralized tradeoffs are also surveyed in arXiv:2601.01743):

Centralized orchestration: one orchestrator manages all workers, which execute assigned tasks and return results.

Advantage: single point of coordination makes reasoning traceable
Failure mode: orchestrator becomes a bottleneck; its failure halts the system
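As a rough illustration of the centralized pattern, the sketch below routes every task through one object that also keeps the trace. `Orchestrator` and the two workers are hypothetical; real workers would be agents rather than string functions.

```python
from typing import Callable, Dict, List, Tuple

Worker = Callable[[str], str]


class Orchestrator:
    """Single coordinator: assigns tasks, collects results, records a trace."""

    def __init__(self, workers: Dict[str, Worker]) -> None:
        self.workers = workers
        self.trace: List[Tuple[str, str, str]] = []   # (worker, task, result)

    def run(self, assignments: List[Tuple[str, str]]) -> List[str]:
        results = []
        # Every task passes through this loop, which is both the source of
        # traceability and the bottleneck: if it stops, all work stops.
        for worker_name, task in assignments:
            result = self.workers[worker_name](task)
            self.trace.append((worker_name, task, result))
            results.append(result)
        return results


orchestrator = Orchestrator({"upper": str.upper, "reverse": lambda s: s[::-1]})
outputs = orchestrator.run([("upper", "plan"), ("reverse", "plan")])
```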

Decentralized peer-to-peer: agents communicate directly without a coordinator, making local decisions from shared state or messages.

Advantage: no single point of failure; scales horizontally
Failure mode: emergent coordination failures, race conditions, and inconsistent shared state are harder to debug
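One way to picture the shared-state problem is a task board that peers claim work from. The sketch below is an assumption-laden toy (`TaskBoard` and both claim methods are invented here): an unchecked write lets two agents believe they own the same task, while a locked compare-and-set claim lets only the first succeed.

```python
import threading
from typing import Dict, List, Optional


class TaskBoard:
    """Shared state that peers read and write without a coordinator."""

    def __init__(self, tasks: List[str]) -> None:
        self.owner: Dict[str, Optional[str]] = {task: None for task in tasks}
        self._lock = threading.Lock()

    def claim_unchecked(self, task: str, agent: str) -> bool:
        # Last writer wins: every caller is told it succeeded.
        self.owner[task] = agent
        return True

    def claim_checked(self, task: str, agent: str) -> bool:
        # Compare-and-set under a lock: succeed only if the task is unowned.
        with self._lock:
            if self.owner[task] is None:
                self.owner[task] = agent
                return True
            return False


naive = TaskBoard(["t1"])
naive_claims = [naive.claim_unchecked("t1", "A"), naive.claim_unchecked("t1", "B")]

safe = TaskBoard(["t1"])
safe_claims = [safe.claim_checked("t1", "A"), safe.claim_checked("t1", "B")]
```

In the naive board both agents proceed as owners and duplicate the work; in the checked board the second agent learns it lost the claim and can pick something else.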

Hybrid: a lightweight coordinator handles routing and synthesis; workers communicate directly for sub-task coordination.

Advantage: reduces coordinator bottleneck while keeping traceability at the routing level
Failure mode: the boundary between coordinator and peer-to-peer communication must be explicit; implicit crossing creates inconsistent behavior

Enterprise Hardening Checklist

Production agent deployments require three categories of hardening beyond functional correctness:

Governance

Audit trails: every agent action is logged with timestamp, agent identity, tool name, arguments, and result (arXiv:2602.10479)
Access control: agents operate with least-privilege permissions; no agent has broader access than its assigned task requires (arXiv:2602.10479)
Policy enforcement: organizational constraints (data residency, PII handling, approved models) are enforced at the harness level, not by agent prompt alone (arXiv:2602.10479)
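The governance items above could be enforced in a harness wrapper along these lines. This is a sketch under assumptions: `audited_call`, `PermissionDenied`, the `read_file` tool, and the agent ID are all hypothetical, and the log is an in-memory list rather than durable storage.

```python
import json
import time
from typing import Any, Callable, Dict, List, Set


class PermissionDenied(PermissionError):
    """Raised when an agent requests a tool outside its allow-list."""


def audited_call(
    agent_id: str,
    allowed_tools: Set[str],
    tool: str,
    args: Dict[str, Any],
    handler: Callable[..., Any],
    audit_log: List[str],
) -> Any:
    """Enforce least privilege, run the tool, and append one audit record."""
    record = {"timestamp": time.time(), "agent": agent_id, "tool": tool, "args": args}
    if tool not in allowed_tools:
        # The harness refuses the call; the prompt is never the only safeguard.
        record["result"] = "DENIED"
        audit_log.append(json.dumps(record, default=str))
        raise PermissionDenied(f"{agent_id} may not call {tool}")
    result = handler(**args)
    record["result"] = result
    audit_log.append(json.dumps(record, default=str))
    return result


log: List[str] = []
content = audited_call(
    "reviewer-1", {"read_file"}, "read_file",
    {"path": "README.md"}, lambda path: f"contents of {path}", log,
)
```

Denied calls are logged as well as successful ones, so the audit trail shows what an agent attempted and not only what it achieved.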

Observability

Trajectory logging: full turn-by-turn execution logs for post-hoc analysis and debugging (arXiv:2602.10479)
Cost tracking: per-session and per-agent token consumption reported in real time (arXiv:2602.10479)
Anomaly detection: alerts on deviation from expected trajectory length, tool call patterns, or cost bounds (arXiv:2602.10479)
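A minimal monitor covering cost tracking and the three anomaly signals might look like the following. `SessionMonitor`, its thresholds, and the alert strings are illustrative assumptions, not values taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SessionMonitor:
    max_tokens: int                 # cost bound for the session
    max_turns: int                  # expected upper bound on trajectory length
    max_repeats: int = 3            # identical consecutive tool calls that look like a loop
    tokens: int = 0
    tool_calls: List[str] = field(default_factory=list)

    def record(self, tool: str, tokens_used: int) -> List[str]:
        """Log one turn and return any alerts it triggers."""
        self.tokens += tokens_used
        self.tool_calls.append(tool)
        alerts = []
        if self.tokens > self.max_tokens:
            alerts.append("cost bound exceeded")
        if len(self.tool_calls) > self.max_turns:
            alerts.append("trajectory longer than expected")
        tail = self.tool_calls[-self.max_repeats:]
        if len(tail) == self.max_repeats and len(set(tail)) == 1:
            alerts.append(f"tool {tool!r} repeated {self.max_repeats} times in a row")
        return alerts


monitor = SessionMonitor(max_tokens=100, max_turns=5)
first = monitor.record("search", 40)
second = monitor.record("search", 40)
third = monitor.record("search", 40)
```

Here the third call trips two alerts at once: the running total passes the token bound, and the same tool has been called three times in succession.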

Reproducibility

Deterministic seeding: where randomness affects agent behavior, seeds are captured in logs for replay (arXiv:2602.10479)
Idempotent operations: agent actions produce the same end state if executed more than once; no compounding side effects on retry (arXiv:2602.10479)
Snapshot-based rollback: system state is snapshotted before consequential actions; rollback is defined before execution begins (arXiv:2602.10479) — see Rollback-First Design
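The three reproducibility items can be illustrated together. In this sketch `StateStore`, its idempotency keys, and the example state are assumptions; the snapshot is an in-memory deep copy, which stands in for whatever snapshot mechanism the real system provides.

```python
import copy
import random
from typing import Any, Callable, Dict, Set


class StateStore:
    """Applies actions at most once per key and restores state on failure."""

    def __init__(self, state: Dict[str, Any]) -> None:
        self.state = state
        self._applied: Set[str] = set()     # idempotency keys already executed

    def apply(self, key: str, action: Callable[[Dict[str, Any]], None]) -> bool:
        if key in self._applied:
            return False                    # retry is a no-op: same end state
        snapshot = copy.deepcopy(self.state)    # rollback point taken before acting
        try:
            action(self.state)
        except Exception:
            self.state = snapshot           # undo partial side effects
            raise
        self._applied.add(key)
        return True


store = StateStore({"comments": []})
applied_once = store.apply("comment-1", lambda s: s["comments"].append("LGTM"))
applied_twice = store.apply("comment-1", lambda s: s["comments"].append("LGTM"))

# Deterministic seeding: log the seed, and a replay draws the same values.
seed = 1234
original_draw = random.Random(seed).random()
replayed_draw = random.Random(seed).random()
```

Using a dedicated `random.Random(seed)` instance rather than the module-level generator keeps the agent's randomness isolated from other code that may also draw random numbers.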

Industry Convergence Pattern

The paper (arXiv:2602.10479) observes that the ecosystem is converging on shared infrastructure, in a way that parallels the maturation of web services: standardized agent loops, tool registries, and auditable control mechanisms. Multiple frameworks now implement the cognitive/execution separation, typed tool interfaces, and governance checklists above (arXiv:2602.10479). Building on these patterns now avoids architectural retrofits later.

When This Backfires

The cognitive/execution separation adds structural overhead. Three conditions where it costs more than it returns:

Simple single-turn tasks. If the agent calls one tool and terminates (a single turn, not a loop), typed interfaces and a separate execution layer add engineering overhead with no reliability benefit. A direct function call is cheaper and easier to test.
Rapid prototyping. Strict schema contracts slow iteration. Early-stage agents benefit from fluid coupling; formal separation is a refactoring target once the interface stabilizes.
Low-throughput, human-supervised workflows. Auditability at the tool boundary (trajectory logging) matters when agents run autonomously at volume. A reviewer inspecting every action replaces much of what formal audit logging provides — adding the full harness before volume justifies it is maintenance cost with no proportionate gain.

Example

A code review agent built on this architecture:

Cognitive layer: the LLM receives "Review PR #42 for security issues". It decomposes the goal: fetch the PR diff, identify changed files, scan each file for known patterns, summarize findings. For each step it emits a typed tool call, e.g. { "tool": "github_get_pr_diff", "pr": 42 }.

Execution layer: github_get_pr_diff fetches the diff and returns a typed result { "files": [...], "additions": 310, "deletions": 45 }. The LLM never calls GitHub directly; it only receives the formatted result and decides the next tool call.

Enterprise hardening applied:

Every tool call is logged: timestamp, agent ID, tool name, arguments, result.
The agent runs with a scoped GitHub token (read-only on the target repo).
A cost guard halts execution if the session exceeds 50k tokens before the agent terminates on its own.

This maps each component directly onto the reference architecture above: the LLM stays in the cognitive layer, the GitHub client lives in the execution layer, and the typed tool interface enforces the boundary.
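The example can be sketched with typed call and result objects plus the cost guard. Everything here is hypothetical scaffolding: `github_get_pr_diff` is a stub that makes no network request, the file name is a placeholder, and the per-step token costs are invented inputs; only the 310/45 counts and the 50k budget come from the example above.

```python
from dataclasses import dataclass
from typing import List, Tuple

TOKEN_BUDGET = 50_000   # session limit from the example


@dataclass(frozen=True)
class GetPrDiffCall:
    pr: int


@dataclass(frozen=True)
class PrDiffResult:
    files: Tuple[str, ...]
    additions: int
    deletions: int


class BudgetExceeded(RuntimeError):
    """Raised by the cost guard when the session passes TOKEN_BUDGET."""


def github_get_pr_diff(call: GetPrDiffCall) -> PrDiffResult:
    # Stub for the execution layer. A real executor would query GitHub here
    # using the scoped read-only token; the file name below is a placeholder.
    return PrDiffResult(files=("placeholder.py",), additions=310, deletions=45)


def run_review(pr: int, tokens_per_step: List[int]) -> PrDiffResult:
    """Fetch the diff, then charge each reasoning step against the budget."""
    spent = 0
    diff = github_get_pr_diff(GetPrDiffCall(pr=pr))
    for cost in tokens_per_step:
        spent += cost
        if spent > TOKEN_BUDGET:
            raise BudgetExceeded(f"session used {spent} tokens")
    return diff


diff = run_review(42, [20_000, 20_000])
```

The guard lives in the harness rather than in the prompt, so it halts the session even if the model never decides to stop.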

Key Takeaways

Goal-directed agents require structural separation of cognitive reasoning from execution — not a prompt-engineering refinement of the request-response model.
Typed tool interfaces at the cognitive/execution boundary are the primary mechanism that makes agent behavior predictable and auditable.
Three multi-agent topologies (centralized, decentralized peer-to-peer, and hybrid) each carry distinct failure modes that must be matched to task shape.
Enterprise deployment adds three orthogonal concerns to functional correctness: governance, observability, and reproducibility.
The full harness is overhead until volume justifies it; simple single-turn tasks, prototypes, and human-supervised workflows are cheaper without it.