Solver-Externalized Constraint Reasoning (MaxSAT/SMT Encoding)¶
Have the agent emit a formal encoding for z3, python-sat, or OR-Tools instead of reasoning through constraints in prose — then verify the solver's output.
Solver-externalized constraint reasoning is a three-step pattern: the agent translates a multi-constraint problem into a formal encoding (MaxSAT, SMT, or CP), an exact off-the-shelf solver returns an optimum, and an independent check confirms the result satisfies the original intent. It generalises the Program-of-Thought line — "let the model write code to compute, not reason in prose" — to constraint satisfaction and preference optimisation specifically (Wolfe — Program-Aided Language Models).
When This Pattern Applies¶
The pattern is conditional, not universal. Apply it only when all three hold:
- Crisp objective — preferences can be encoded as numeric weights or hard/soft clauses without inventing arbitrary trade-offs the user did not state
- Decidable, tractable fragment — the constraint theory is in a fragment the chosen solver handles (linear arithmetic, bitvectors, finite-domain CSP). Nonlinear integer arithmetic and similar undecidable fragments are out of scope (John D. Cook on Z3 limits)
- Verifiable solution — you can independently check that the solver's answer satisfies the original constraints, not just the encoded ones (Orvalho et al. — arXiv:2605.29687)
Skip it when the problem is small enough that prose reasoning suffices, when preferences are fuzzy enough that any numeric encoding imposes false precision, or when the constraints depend on real-world signals the encoding cannot ingest.
The Three-Step Pattern¶
graph LR
A[Natural-language<br>problem] --> B[Agent emits<br>solver code]
B --> C[Off-the-shelf<br>solver runs]
C --> D[Verifier checks<br>original intent]
D -->|Pass| E[Return solution]
D -->|Fail| B
1. Encode. The agent reads the problem statement and writes Python that constructs the formal encoding — z3.Solver() calls for SMT, pysat clauses for MaxSAT, ortools.sat.python.cp_model for CP-SAT. The encoding step is where natural-language preferences become hard constraints (must-haves), soft constraints with weights (preferences), and an objective.
2. Solve. An exact solver computes a satisfying or optimal assignment. SAT, SMT, and MaxSAT solvers come with formal correctness guarantees on the encoded problem — that is the only layer of the stack with that property.
3. Verify. A separate check confirms the solver's output satisfies the original constraints stated in the prompt — not just the ones the encoding captured. The cited paper does this with a dual-encoding canonicalisation that accommodates multiple optima (Orvalho et al. — arXiv:2605.29687). Practical setups can verify by re-running the constraints against the candidate solution and printing the proof.
The verification step is not optional. LLM-generated combinatorial-optimisation output is not feasibility-safe by default; constraint violations are routine without an explicit repair or check layer (Yan et al. — arXiv:2602.01090).
Why It Works¶
The pattern moves the load-bearing step out of the model and onto a tool with formal correctness guarantees. LLMs are reliable at natural-language understanding and code synthesis; they are unreliable on multi-step constraint satisfaction, where each added constraint compounds silent-mistake risk. Generating the formal encoding and handing it to an exact solver converts an unreliable competency into a reliable one — the same mechanism that makes Program-of-Thought work for arithmetic: separate code generation from code execution so the verifier certifies the result the model could not (Orvalho et al. — arXiv:2605.29687; Wolfe — Program-Aided Language Models). The cited paper reports the externalised pipeline exceeding 80% acceptance on three reasoning families where direct, chain-of-thought, and program-of-thought baselines rarely produce a feasible solution at all. Independent replication on operations-research problems shows the same shape: a three-stage decomposition (model → solver code → debug) outperforms prompt-engineering LLM-only baselines by 7% accuracy (Zhang et al. — OR-LLM-Agent, arXiv:2503.10009).
When This Backfires¶
- Soft preferences without a crisp objective. When preferences cannot be encoded as numeric weights without arbitrary choices, the solver returns precision the input does not justify — the encoding step becomes the source of error (Cook on Z3 limits).
- Undecidable or intractable fragments. Nonlinear integer arithmetic is undecidable; SMT solvers return
unknownor time out. The pattern fails silently when constraints drift into undecidable fragments (Cook on Z3 limits). - Natural-language mis-translation. The model produces syntactically valid solver code that encodes the wrong problem. Without verification, the agent confidently reports an "optimal" answer to a different question (Yan et al. — arXiv:2602.01090; Wang et al. — arXiv:2409.09271).
- Tiny problems and low stakes. The solver dependency is not justified when prose reasoning suffices, and a human spots a mistake in prose more easily than in MaxSAT clauses.
- Constraints that depend on signals the encoding cannot ingest. When real-world state (current load, qualitative judgement) belongs in the constraint set but cannot be made formal, the solver returns a mathematically optimal but operationally wrong answer.
Off-the-Shelf Solvers¶
Reach for an established solver rather than building a paper-specific one. The pattern is about externalising to a verified routine, not about a particular SAT or SMT engine.
| Solver | Theory | Use for |
|---|---|---|
| z3 | SMT (LIA, BV, arrays, strings) | Configuration validation, dependency resolution, type-system encodings |
| python-sat | SAT + MaxSAT | Preference-weighted constraint problems with discrete choices |
| OR-Tools CP-SAT | CP + scheduling | Scheduling, timetabling, resource allocation, sequencing |
Practitioner reports already use this triad with LLM-generated encodings — translating natural-language scheduling requirements into CP-SAT code for personal logistics is a documented case (Lobsters discussion; Towards Data Science: Constraint Programming in Python).
Example¶
A team-scheduling problem: assign five engineers to five on-call shifts so each engineer covers at most two shifts, each shift has one engineer, no engineer covers consecutive shifts, and preferences (engineer X prefers weekends, engineer Y prefers weekdays) are honoured where possible.
Instead of asking the agent to reason through the assignment in prose, prompt it to emit CP-SAT code:
from ortools.sat.python import cp_model
model = cp_model.CpModel()
engineers, shifts = range(5), range(5)
x = {(e, s): model.NewBoolVar(f"x_{e}_{s}")
for e in engineers for s in shifts}
# Each shift covered by exactly one engineer
for s in shifts:
model.Add(sum(x[e, s] for e in engineers) == 1)
# Each engineer covers at most two shifts
for e in engineers:
model.Add(sum(x[e, s] for s in shifts) <= 2)
# No engineer covers consecutive shifts
for e in engineers:
for s in range(len(shifts) - 1):
model.Add(x[e, s] + x[e, s + 1] <= 1)
# Soft preferences as objective
preferences = {(0, 4): 1, (0, 3): 1, (1, 0): 1, (1, 1): 1}
model.Maximize(sum(w * x[e, s] for (e, s), w in preferences.items()))
solver = cp_model.CpSolver()
status = solver.Solve(model)
The solver returns an assignment; a verification pass checks each hard constraint against the candidate before the agent reports the result. If verification fails, the encoding step gets repaired — not the solver's output (Yan et al. — arXiv:2602.01090).
Key Takeaways¶
- Apply only when constraints have a crisp objective, the theory is decidable, and you can independently verify the solver's output against the original intent.
- The encoding step is where mistakes hide — the solver is correct on the problem it was given, not on the problem the user asked.
- Pick an off-the-shelf solver (z3, python-sat, OR-Tools CP-SAT) instead of building a paper-specific system; the pattern is about externalising, not about a particular engine.
- A verification pass that checks the solver's answer against the original constraints is part of the pattern, not optional.
Related¶
- Externalization in LLM Agents: Memory, Skills, Protocols, and Harness — Broader framework for moving cognitive burdens out of the model; this pattern is the constraint-reasoning instance of that shift.
- Cognitive Reasoning vs Execution: A Two-Layer Agent Architecture — The architectural split this pattern depends on — reasoning generates the encoding, execution runs the solver.
- RubricRefine: Pre-Execution Rubric Refinement for Code-Mode Tool Use — Pre-execution check on agent-emitted tool-use code; the verification step in this pattern plays the same role for solver code.
- Critic Agent Pattern — A second model reviewing the encoding before solver invocation is the practical form of this pattern's verification step.
- Deterministic Guardrails Around Probabilistic Agents — The exact solver and the verification pass are deterministic guardrails wrapped around the probabilistic encoding step.