What "Agentic" Actually Costs to Wire Together
Lessons on context discipline, state validation, and closing the feedback loop from building a purely social LLM game.
This issue is brought to you by MarketBeat
Building autonomous systems is less about model intelligence and more about context engineering. This deep-dive breaks down the architecture of Unwritten Realms, a text-only game where every NPC is an LLM agent.
The core takeaways:
Context over Model: Believability is a state and grounding problem. A mid-sized model with strict context discipline outperforms a giant model with sloppy data.
The Validate-and-Repair Loop: Agents should never touch state directly. Instead, feed engine rejections back to the model as natural language observations so it can self-correct in character.
Dual-Call Planning: Separate private tactical reasoning from public dialogue performance to keep agents strategic without ruining the illusion.
I build things to understand them. The latest is a small game called Unwritten Realms, and I built it to find out what “agentic” actually costs once you stop writing about it and have to wire it together yourself.
The setup: you’re dropped into a small world with a goal, find a secret, recover an item. Every other character is an LLM agent with a personality, private facts it knows and you don’t, its own goals, and an inventory. You talk to them. That’s the entire interface. No inventory screen, no “take” button. If you need the treasure map to win, there’s exactly one path: persuade a character who has it to hand it over.
That constraint is the whole reason the project taught me anything. The second the game is purely social, there’s nowhere to hide, no UI doing the work, no scripted dialogue tree. A character decides to help you or it doesn’t, and everything underneath that decision is real AI engineering: what the model sees, what it’s allowed to do, and how much of what it produces you’re willing to trust.
The short version of what I learned: a believable agent is almost never a model problem. It’s a context problem and a grounding problem. Get those two right and a mid-sized model is plenty. Get them wrong and no amount of model will save you.
What I’ll walk through
The Architecture of Reality: How to assemble a character’s worldview every single turn—and why the model is the least interesting part of it.
The Guardrail Loop: The tool-calling layer and the validate-and-repair loop that keeps a creative model from quietly corrupting state.
Evolving State: Where the project is headed next—building characters that notice a world changing around them.
A character is just context
Each character is an object holding six things: a persona (free-text prose for who they are), a set of traits with intensities (cautious 0.7, greedy 0.4), private facts, goals, a memory of what’s happened to it, and an inventory.
The model behind all of them is the same model, I’m running gpt-5.2 against an OpenAI-compatible endpoint. No fine-tuning, no per-character weights. The difference between a paranoid hoarder and a chatty merchant is entirely in the text I assemble.
That’s the first thing that surprised me. I kept reaching for “smarter model” as the lever, and the lever that actually moved believability was context discipline, what I put in front of the model, and what I deliberately leave out.
Rebuilding reality every turn
Here’s the uncomfortable fact at the center of any agent like this: the model is stateless and it has no senses. It can’t see the room. It doesn’t remember the last thing it said. Between one message and the next it knows nothing. So every turn, I reconstruct the character’s entire reality from the engine, which is the actual source of truth, and serialize it into the prompt.
A turn’s context gets assembled in layers:
The persona and traits, as a stable system preamble, who you are, how you behave.
The character’s private facts and its current goals.
A worldview snapshot pulled fresh from the engine: which room you’re in, who else is present, which exits exist, what you’re carrying.
Relevant memories, newest first.
The recent dialogue with this player.
The craft is entirely in the selection. Dump everything and two things break at once: you blow the token budget, and you bury the handful of facts that matter under a transcript the model stops attending to. So memory gets compressed, a rolling summary of older events plus the last few turns verbatim, and facts get scoped: a character is only ever handed what it would plausibly know. The model’s job is to be fluent and in-character. My job is to make sure the only reality it can reason from is the one I chose to show it.
This is also my honest answer to “how do you keep secrets in the game.” The soft layer is the persona, I instruct a character to guard what it knows. But prompt-level secrecy is exactly that, soft: a determined player can sometimes talk a character out of a fact it was told to protect, which, in a game about social engineering, is arguably the point. The hard guarantee is a different thing entirely, and it lives in the next section.
The validate-and-repair loop
The model never touches game state. It proposes. Every consequential thing a character can do is a tool, move to a connected location, transfer an item, record something to memory, speak a line. I expose these as function schemas and let the model emit native tool calls: structured arguments, not free text I have to scrape out of prose. That alone removes a whole category of brittleness.
But “structured” is not the same as “correct.” A creative model will cheerfully propose a transfer_item for a map it doesn’t hold, invent a char_id that was never in the scene, or try to walk to a room that isn’t connected to this one. If any of those mutated the world, the game would rot within a few turns. So between the proposal and the world sits a validator, and it’s the most important code in the project:
def validate(self, world):
giver = world.get_character(self.from_char_id)
world.get_character(self.to_char_id) # unknown recipient -> reject
giver.inventory.get_item(self.item_id) # giver lacks the item -> reject
State lives in the world — locations, inventories, characters — never in the model. An action runs only if it passes validation; otherwise it’s rejected and the world stays exactly as it was.
The part I care about most, and the part that took longest to get right, is what happens after a rejection. My first version was fail-fast: an illegal call threw, I logged it, the turn moved on. Stable, but dumb, the character had no idea its action didn’t land, so it would narrate as if it had. The version the architecture is built around now closes the loop: a rejected action becomes an observation that goes back to the model, “you tried to give the map; you don’t have it”, and the model gets to revise within the same turn. It’s the same shape as retrying a failed function call, except the feedback is natural language and the model is expected to reason about it: apologize in character, try something legal instead, change tack. Grounding isn’t a one-shot filter you bolt in front of the model. It’s a short conversation between the model and the engine, and the engine always gets the last word.
That reframed how I think about agent design in general. We spend most of our guardrail effort making sure an agent can’t perform an unauthorized action, schemas, permissions, validation. Necessary. But the more useful half is the feedback: an agent told precisely why an action was refused recovers gracefully, where one that just hits a wall stalls or hallucinates around it. The rejection isn’t a dead end. It’s signal, and it belongs back in the context.
2026’s Hidden Breakout Stocks (Free Report)
Our analysts just went through every sector, chart, and earnings trend to find the 10 stocks most likely to dominate the rest of 2025 - and they compiled them all in the Top 10 Best Stocks to Own in 2026.
In this free report, you’ll discover companies primed for a Fed-fueled rally - from AI powerhouses to dividend machines. These names are already attracting billions in institutional inflows, so this is your chance to position early.
Get the Top 10 Best Stocks to Own in 2026 now - before it’s gone for good.
❤️ The best way to keep our mission going - at zero cost to you - is by checking out this free guide from our sponsor.
Planning without breaking character
For a long time I didn’t give characters a planner, I let the model improvise turn to turn, and I feared that any explicit strategy loop would make them robotic, announcing plans instead of being the person. Improvisation gets you a long way, but it has a ceiling. A character with a real goal, get the map, protect a secret, turn the player against someone, needs to behave consistently across many turns, and pure in-the-moment generation drifts: it forgets what it was angling for two exchanges ago.
So I built one, and the trick that made it work was splitting the turn into two model calls.
The first call is a private planning step: a cheaper, structured call that takes the character’s goals, current worldview, and memory and returns a short tactical intention, something like stall; find out why they want the map before giving anything up. It looks ahead only a step or two and re-plans every turn, so the strategy bends with the conversation instead of locking the character into a script reality has already broken. The second call performs that intention in character and it never sees the word “plan,” only the intention, handed over as motivation. The strategy is the model’s private reasoning; the dialogue is the performance. The player never hears the gears turn.
The validate-and-repair loop feeds straight into this. A rejected action isn’t only an in-character recovery; it’s an observation the planner consumes on the next turn, the give failed, I don’t actually have it, so the strategy corrects itself instead of stubbornly retrying. That’s the real payoff of separating the two calls: the planner gives a character intent that persists across time, the performance layer keeps it human, and the grounding loop keeps the whole thing honest about what’s actually possible in the world.
Underneath all of it, smaller dials do a surprising amount of work. The traits-with-intensities trick is the clearest: “greedy 0.8” reliably shifts how hard a character bargains before parting with an item; drop it to 0.3 and the same character gives things away. A cheap, legible knob, no fine-tuning, just a number the model interprets consistently across turns. I run slightly warmer sampling on the dialogue so characters don’t collapse into one flat voice, while the planning and tool layers stay tight, because correctness there isn’t a stylistic choice.
Where this is going
The architecture is built around an event bus. Every action emits a typed event, someone moved, an item changed hands, a line was spoken and those events are how characters become aware of a world shifting around them instead of only reacting when the player pokes them directly. Wiring each character to observe the events it should plausibly notice is what turns a room of reactive NPCs into something that feels alive, and it’s most of what I’m building now. The same channel carries rejections back into reasoning, so a character’s picture of the world and the world’s actual state keep converging instead of drifting apart.
That’s the Unwritten Realms I want to play: characters whose sense of reality is rebuilt, every turn, from a world they can act on but never quietly rewrite.
Your turn
I’m curious where other builders land. When you ship an agent, how much of your effort goes into the model versus the context you assemble around it and the loop that grounds its actions? And have you found feeding failures back to the model worth the complexity or do you just fail fast and move on?
Tell me in the comments, or find me on LinkedIn. I want to know where you draw these lines.
About the author:
I’m AmirAli. I spend my time building, testing, and breaking autonomous systems to figure out how they actually work under the hood.
If you enjoyed this deep-dive and want to follow along with more architectural experiments, let’s connect. You can find me sharing updates and behind-the-scenes engineering notes over on LinkedIn.











This diagram is the honest version. “Agentic” is mostly deciding what to stuff back into the prompt every single turn, because the model forgets everything between calls. The intelligence is cheap. The expensive part is the plumbing: what to load, what to drop, what to keep around. Most agent demos quietly skip that bill.
The “rebuilt every turn” box is where the bill actually lives. You re-send the whole character each turn, so memory and inventory quietly eat the context window until the agent forgets what it was doing. Most people budget for the clever part and get surprised by the bookkeeping. Wiring it is cheap. Keeping it coherent over 40 turns is the cost.