This is a technical piece for engineering teams evaluating stateful agent tooling.
Every developer who has used an AI coding assistant knows the ritual. You open a fresh session. You explain your codebase, your conventions, the strange thing about your build pipeline, the reason that one module is structured the way it is. The model is genuinely brilliant for an hour. Then you close the window, and all of it is gone. Tomorrow you start over, re-briefing a stranger who happens to be a genius. It is like working alongside a savant with anterograde amnesia: dazzling in the moment, incapable of building on yesterday.
That ritual is not a quirk of any one product. It is the default condition of large language models. They are, in the technical sense, stateless: each request is processed in isolation, and nothing is carried forward except whatever you manually stuff back into the prompt [1]. This is as true of the best tools on the market, Claude Code and OpenAI's Codex, as it is of a raw API call. They are extraordinary within a session and forgetful between them. The model that helped you debug a race condition last Tuesday remembers nothing about it on Wednesday.
There is a category of tool built on the opposite premise, and it is the one we have come to trust most. Letta Code is a stateful coding agent, and the word doing all the work in that sentence is stateful. It does not begin every session at zero. It carries forward what it has learned, across sessions, across machines, and even across model generations. After living with it in production, our view is straightforward: for teams that want a coding agent they can grow with over months and years, it is the tool we chose to build our practice on, and the one we recommend.
This is a thorough, sourced look at why. We will cover what Letta is and the research it stands on, what makes Letta Code distinct, how it stacks up head to head against Claude Code and Codex, and how we deploy and operate it for the businesses we work with. If you are evaluating agent tooling, this is the analysis we wish someone had handed us.
First, what Letta actually is
To understand Letta Code, you have to understand Letta, and to understand Letta, you have to go back to a research project.
In 2023, researchers at UC Berkeley published a paper called MemGPT: Towards LLMs as Operating Systems [2]. The core idea was elegant and slightly subversive. A language model has a fixed context window, a finite amount of text it can "see" at once. That window is, functionally, the model's working memory, and it is small. The MemGPT insight was to treat that limitation the way an operating system treats limited physical RAM: build a memory hierarchy, and let the system page information in and out of the context window the way an OS pages data between RAM and disk [2][3].
In that analogy, the context window is RAM: fast, precious, and tiny. External storage (a database, a set of files, an archive of past conversations) is the disk: vast, slower to reach, but durable. The agent itself is given tools to move information between the two, deciding what deserves to stay resident in working memory and what can be written out to long-term storage and recalled later. The model, in other words, manages its own memory through function calls, just as a program asks the OS for more memory [2].
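To make the paging analogy concrete, here is a minimal sketch in Python. It is our own illustration, not MemGPT's or Letta's code: the class and method names are hypothetical, and a simple least-recently-used eviction rule stands in for the model's own decisions about what to page out.

```python
from collections import OrderedDict
from typing import Optional


class PagedMemory:
    """Toy memory hierarchy: a small 'context' (RAM) backed by an 'archive' (disk)."""

    def __init__(self, context_limit: int) -> None:
        self.context_limit = context_limit  # max items resident in context
        self.context = OrderedDict()        # fast, small working memory
        self.archive = {}                   # large, durable external store

    def remember(self, key: str, text: str) -> None:
        """Place an item in context, paging out the least recently used if full."""
        self.context[key] = text
        self.context.move_to_end(key)
        while len(self.context) > self.context_limit:
            old_key, old_text = self.context.popitem(last=False)
            self.archive[old_key] = old_text  # written out, not discarded

    def recall(self, key: str) -> Optional[str]:
        """Return an item, paging it back into context from the archive if needed."""
        if key in self.context:
            self.context.move_to_end(key)
            return self.context[key]
        if key in self.archive:
            self.remember(key, self.archive.pop(key))
            return self.context[key]
        return None
```

In the real system the agent chooses what to evict and what to recall by calling tools; the sketch only shows the shape of the hierarchy, where nothing that leaves the context window is lost.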
MemGPT grew into Letta, now a company and an open-source framework, and the operating-system framing remains the spine of the whole system [4]. The headline difference from a normal LLM call is this: a Letta agent is a persistent entity with state that lives in a database, not a transient prompt that evaporates when the request returns. You do not "send a message to a model." You message an agent that exists continuously, that remembers your last conversation, and that has been quietly editing its own memory in between [5].
Memory blocks: the abstraction that makes it work
The piece of Letta worth understanding in detail, because it is the piece that does the most work, is the memory block [6].
A memory block is a structured, labeled chunk of text that lives inside the agent's context window and is always visible to the model. It is not retrieved on demand like a search result; it sits permanently in the system prompt, so the agent never has to go looking for it [6][7]. Each block has a label (say, `human` or `persona` or `codebase_conventions`), a description that tells the agent what the block is for, a value (the actual content), and a size limit so it cannot grow without bound [6].
The crucial property is that blocks are read-write by the agent itself. Letta gives the agent memory tools, with names like `core_memory_append` and `core_memory_replace`, so it can update its own persistent state during a conversation [6][7]. When you tell the agent "we always use snake_case for database columns," it does not just nod and forget. It can write that fact into a memory block, and from then on the rule is part of who the agent is, surfaced in every future session automatically.
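A small sketch helps show how little machinery the idea needs. The class below is a hypothetical stand-in written for this article, not Letta's implementation or API: the field names follow the description above, the `append` and `replace` methods are loosely modeled on the `core_memory_append` and `core_memory_replace` tools, and the `render` format is our assumption about how a block might be surfaced in a prompt.

```python
from dataclasses import dataclass


@dataclass
class MemoryBlock:
    """Hypothetical memory block: labeled, described, size-limited text."""

    label: str
    description: str
    value: str = ""
    limit: int = 2000  # maximum number of characters the block may hold

    def append(self, text: str) -> None:
        """Add a new line of content to the block."""
        self._set(f"{self.value}\n{text}" if self.value else text)

    def replace(self, old: str, new: str) -> None:
        """Swap existing content for corrected content."""
        if old not in self.value:
            raise ValueError(f"{old!r} not found in block {self.label!r}")
        self._set(self.value.replace(old, new))

    def _set(self, new_value: str) -> None:
        # Enforce the size limit so the block cannot grow without bound.
        if len(new_value) > self.limit:
            raise ValueError(f"block {self.label!r} would exceed {self.limit} chars")
        self.value = new_value

    def render(self) -> str:
        """Format the block for inclusion in a system prompt (assumed layout)."""
        return f"<{self.label}>\n{self.value}\n</{self.label}>"


conventions = MemoryBlock(
    label="codebase_conventions",
    description="Rules the team follows in this repository.",
)
conventions.append("We always use snake_case for database columns.")
print(conventions.render())
```

Because the rendered block is included in every prompt, a rule written once is in front of the model on every later turn without any retrieval step.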
Two more details matter for what comes later. First, blocks can be marked read-only, so some knowledge is fixed and some is mutable. Second, blocks can be shared across multiple agents, giving several agents a single synchronized piece of memory [6][8]. That single capability is what lets a fleet of agents cooperate as a coherent system rather than a pile of disconnected bots, and it matters enormously once you move past a single coding agent toward an organization of them.
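The sketch below illustrates both properties under simplifying assumptions. The `Block` and `Agent` classes are hypothetical and in-memory only; in Letta the shared state lives in a database rather than in a Python object reference, but the effect we care about is the same: one write, visible to every agent attached to the block.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Block:
    """Hypothetical memory block that may be locked against agent edits."""

    label: str
    value: str
    read_only: bool = False

    def write(self, new_value: str) -> None:
        if self.read_only:
            raise PermissionError(f"block {self.label!r} is read-only")
        self.value = new_value


@dataclass
class Agent:
    """Hypothetical agent holding references to its attached blocks."""

    name: str
    blocks: Dict[str, Block] = field(default_factory=dict)

    def attach(self, block: Block) -> None:
        self.blocks[block.label] = block  # a reference, not a copy


policy = Block("org_policy", "Never commit secrets.", read_only=True)
status = Block("project_status", "Migration not started.")

builder = Agent("builder")
reviewer = Agent("reviewer")
for agent in (builder, reviewer):
    agent.attach(policy)
    agent.attach(status)

# One agent updates the shared block; the other sees the change immediately.
builder.blocks["project_status"].write("Migration complete.")
print(reviewer.blocks["project_status"].value)  # Migration complete.
```

Attempting `builder.blocks["org_policy"].write(...)` raises `PermissionError`, which is the fixed-versus-mutable distinction in miniature.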
Around the memory blocks sits a fuller hierarchy: recall memory (searchable conversation history) and archival memory (a long-term vector and text store the agent can query) [2][3]. Letta also added a filesystem, MemFS, that lets agents organize and reference documents, PDFs, and transcripts the way a person organizes files in folders [9]. In Letta's own benchmarking, a simple filesystem approach scored 74 percent on the LoCoMo long-conversation memory benchmark, beating several specialized memory libraries, which suggests the OS analogy is not just a metaphor but a sound engineering strategy [9].
The reframing that took us a while to internalize: with Letta, memory is not a feature bolted onto a chatbot. Memory is the agent. The model is just the engine that reads and rewrites it.
So what is Letta Code?
Letta Code, launched in December 2025, is what you get when you take that stateful-agent foundation and aim it squarely at software engineering [10]. On the surface it looks like the coding agents you already know: a terminal harness, and now a desktop app, that can read your repo, run commands, edit files, and work through tasks. Underneath, it is built around long-lived agents that persist across sessions and improve with use, rather than the independent, throwaway sessions most other coding assistants give you [10][11].
The practical workflow looks like this. When you start, you run an `/init` command, and the agent does deep research on your codebase: reading through it, forming memories, and rewriting its own system prompt, via memory blocks, as it learns what your project is and how it is built [10]. From then on it keeps learning automatically, but you can also explicitly tell it to reflect and consolidate what it has learned with a `/remember` command. And because every session is tied to a persistent agent, you can use `/search` to query the full history of your past conversations, with vector, full-text, or hybrid search over everything you have ever worked on together [10][11].
But the feature that genuinely changed how we work is skill learning [12].
Skills: agents that write their own playbook
A huge fraction of real engineering work is repetition with variation. Generating a database migration when a schema changes. Setting up a new dashboard. Following the team's specific, slightly idiosyncratic process for an API change. You coach the agent through one of these gnarly tasks once, and normally that coaching is wasted the moment the session ends.
Letta Code lets the agent learn a skill from that experience [12]. After you have walked it through a complex task, you can have it distill what it learned into a reusable skill that it, or another agent, can reference the next time a similar task comes up. Letta's own team reports skills their agents have contributed with human help: generating DB migrations on schema changes, creating PostHog dashboards via the CLI, and codifying best practices for API changes [10]. Their research reports skill learning measurably improving performance on future similar tasks, rather than the usual degradation you see as context fills up [12][13].
The detail we love, as engineers, is that skills are just markdown files [10]. That means they live in git like any other code: versioned, reviewed in pull requests, diffed, rolled back. It also means a skill one agent learns can be picked up by another agent, even a different coding tool that understands the same skill format. The agent's accumulated competence becomes a portable, inspectable asset instead of a black box.
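Since a skill is just a text file, any tool that can read a directory can pick it up. The sketch below is a hypothetical loader we wrote for illustration: the one-file-per-skill layout, the file name, and the skill contents are our assumptions, not Letta Code's actual skill format.

```python
import tempfile
from pathlib import Path
from typing import Dict


def load_skills(skills_dir: Path) -> Dict[str, str]:
    """Map each skill name (the file stem) to the markdown text of that skill."""
    return {
        path.stem: path.read_text(encoding="utf-8")
        for path in sorted(skills_dir.glob("*.md"))
    }


# Demo: one agent writes a skill to disk, and any other reader can load it.
with tempfile.TemporaryDirectory() as tmp:
    skills_dir = Path(tmp)
    (skills_dir / "db-migration.md").write_text(
        "# Generate a DB migration\n\n1. Diff the schema.\n2. Write the migration.\n",
        encoding="utf-8",
    )
    skills = load_skills(skills_dir)

print(sorted(skills))  # ['db-migration']
```

Nothing here is specific to one agent or one model, which is the point: the file is the interface, and git supplies the history.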
Letta took this even further with Context Repositories, a rebuild of Letta Code's memory on top of git-based versioning, so that an agent's entire memory is a versioned repository of markdown files [14]. And with sleep-time compute, sometimes called "dreaming," background subagents review and reorganize the agent's memory during downtime, running in isolated git worktrees so they can tidy up long-term memory without blocking the agent that is actively working [15][16]. The agent literally gets better while it is idle.
The part that wins on the benchmarks: it is model-agnostic, and it still leads
Here is the fact that moves Letta Code from "interesting" to "best in class."
Letta Code is the number one model-agnostic open-source harness on Terminal-Bench, the standard benchmark for terminal-using coding agents [10][17]. "Model-agnostic" means the harness is not welded to any single model provider. You can run it on Claude, on a GPT model, on Gemini, or on an open-weight model you host yourself, and the harness stays the same. "Number one" means it does this without paying a performance tax: Letta Code's results are comparable to the provider-specific harnesses (Claude Code, Gemini CLI, and Codex CLI) running on those providers' own models, and it significantly outperforms the previous leading model-agnostic harness [10]. Even with memory set aside entirely, a Letta Code agent works about as well with a frontier model as the model maker's own bespoke tool does.
That single property has a profound consequence. We wrote recently about the great LLM repricing: how the cost of "good enough to run a business on" is collapsing, and how the smart move is no longer picking one model but routing each task to the cheapest model that can do it well. A model-agnostic harness is what makes that strategy possible. You can move a workload from an expensive frontier model to a cheaper open-weight one without rebuilding the agent, because the agent (its memory, its skills, its identity) lives in Letta, not in the model. The engine is swappable. The agent is not.
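The routing rule itself is simple enough to sketch. Everything in the example below is hypothetical: the model names, the prices, and the 1-to-5 capability scores are made up for illustration, and a real router would derive them from evaluations and current price lists.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ModelOption:
    name: str                       # hypothetical model identifier
    cost_per_million_tokens: float  # made-up price, for illustration only
    capability: int                 # hypothetical score from 1 (weak) to 5 (strong)


def route(task_difficulty: int, options: Sequence[ModelOption]) -> ModelOption:
    """Pick the cheapest model whose capability meets the task's difficulty."""
    capable = [m for m in options if m.capability >= task_difficulty]
    if not capable:
        raise ValueError(f"no model can handle difficulty {task_difficulty}")
    return min(capable, key=lambda m: m.cost_per_million_tokens)


MODELS = [
    ModelOption("frontier-large", 15.0, 5),
    ModelOption("mid-tier", 3.0, 3),
    ModelOption("open-weight-small", 0.5, 2),
]

print(route(2, MODELS).name)  # open-weight-small
print(route(5, MODELS).name)  # frontier-large
```

The agent's memory and skills never appear in this function, which is why the choice of model can change per task without the agent itself changing.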
Letta's own research articulates this well: they describe continual learning "in token space," and make the point that an agent which can carry its memories across model generations will outlast any single foundation model [18]. Models change, seemingly monthly. The work an agent has accumulated should not be thrown away every time a new one ships. Letta lets the memory persist while the models churn underneath it.
Letta Code vs Claude Code vs Codex: the honest comparison
Claude Code and OpenAI's Codex are excellent tools, and we say that without hedging. Claude Code is a polished, proprietary harness tuned to get the most out of Anthropic's models, with strong reasoning, large context, and a mature ecosystem of subagents and MCP integrations. Codex is OpenAI's fast, efficient, open-source CLI, well known for token-frugal execution and tight DevOps workflows. If you are committed to a single provider and your work fits inside a session, either is a fine choice.
The difference is structural, and it comes down to two questions: does the tool remember, and is it locked to one vendor? On both, Letta Code is built differently.
| Capability | Claude Code | Codex CLI | Letta Code |
|---|---|---|---|
| Memory across sessions | Per-session context (configurable via CLAUDE.md) | Per-session context | Stateful agent, persistent by default |
| Learns reusable skills from experience | Limited | Limited | Built-in skill learning |
| Memory portable across model generations | No | No | Yes, memory lives in token space |
| Model provider | Anthropic only | OpenAI only | Any model, fully model-agnostic |
| Open-source harness | Proprietary | Open source | Open source |
| Memory model | Context + tools per session | Context per session | OS-style memory hierarchy with shared blocks |
| Independent benchmark standing | Provider harness | Provider harness | #1 model-agnostic on Terminal-Bench |
Read down the memory rows, because that is the real divide. Claude Code and Codex both treat each session as a fresh start; whatever the agent learned about your codebase yesterday has to be re-established today, usually through configuration files you maintain by hand. Letta Code treats memory as the default state of the agent. It learns once and keeps it [10][18]. Add the provider row, and the strategic picture is complete: with Claude Code you are betting on Anthropic, with Codex you are betting on OpenAI, and with Letta Code you are betting on no one, free to follow the best price and performance wherever it goes.
Our take, after running all three, is simple. For a quick, single-session task on a model you are already committed to, the provider tools are great. For an agent you intend to keep, one that should know your systems better next quarter than it does today, Letta Code is the one we reach for, and the one we recommend.
Where Intueo comes in
A stateful agent is only as valuable as its deployment. The gap between a powerful open-source tool and a production-grade system your business can rely on is real: security boundaries, model routing, monitoring, skill curation, and operational reliability. That engineering work is what we do. We don't reinvent the memory layer. We make it work for teams that need it to be dependable, auditable, and quietly improving every week.
Our deployment methodology
The benchmarks and the architecture are the theory. What matters to a business is what happens when you run it properly, at production grade, instead of as a personal experiment on a laptop. That gap, between a powerful tool and a powerful tool you can actually trust with real work, is where we operate.
One agent, many machines
A stateful agent is not tied to a device. We run Letta Code across both Windows and macOS, and it behaves as one agent with one memory, reachable from either. What it learns on a Mac, it knows on a Windows box, because the state lives with the agent, not the hardware [5][19]. Letta's remote-environment support extends this further: you can reach an agent working on one machine from your phone [19]. Cross-platform stops being a compatibility headache and becomes a non-issue, because there is only one agent and the operating systems are simply doors into the same room.
Grounded in the real work
A coding agent is far more useful when it is aware of the work it supports, not just the code. Letta's filesystem and memory tooling are built to ingest and organize exactly the material a business runs on (documents, transcripts, specifications, ongoing context), so the agent reasons from a grounded picture of how the software is actually used rather than from the codebase in isolation [9]. This connection helps close the perennial gap between what a team thinks its product does and what its users actually experience.
Secure, isolated, and under your control
This is the part that turns an impressive demo into something a serious organization can adopt. Because the Letta Code harness is open source and model-agnostic, it can be deployed on infrastructure you control rather than handed to a third party [10][14]. Every deployment is isolated per-client, with data and memory kept inside that boundary, and routed to whichever model meets the customer's cost, performance, and data-residency requirements, including self-hosted open-weight models where regulation demands it. Git-backed memory and markdown skills mean every change to an agent's knowledge is inspectable, versioned, and reviewable [14], which is exactly the auditability that compliance-bound teams need and that off-the-shelf, session-based tools do not provide.
Who this is for
There is a specific kind of team we built this practice for, and you may recognize yourself in it. They have already discovered Claude Code or Codex, they love the agent-driven workflow, and they have started cobbling together something more permanent: often quite literally a Mac mini or two in the corner running a coding agent around the clock, scripts to feed it context, and a folder of hand-maintained instructions that the tool forgets every session anyway.
It is a smart instinct, and it runs into the same wall every time: the tools were not designed to remember, to be secured, to run reliably unattended, or to be shared across a team. Our approach delivers that same ambition the way it should be delivered: on a stateful foundation that actually learns, hosted securely, integrated with the systems the business already uses, and operated so it stays up and keeps improving. The teams buying Mac minis to build their own agent are right about where this is going. Our job is to get them there without the duct tape.
Our deployment model productionizes Letta Code for engineering teams, and extends to the broader stateful agents (the digital employees and customer-facing agents) that the same Letta foundation supports [10]. For the businesses that want a true agent of their own rather than a chatbot with their logo on it, this is the machinery underneath that promise: a stateful agent on a memory architecture designed, from a research paper up, to learn from experience [2][20].
A brief note on our own stack, since people ask: we build on the same best-in-class foundation we recommend. We are not interested in reinventing a memory layer that a strong research team has already proven and open-sourced; we are interested in deploying it exceptionally well. That is the value we add: the engineering discipline that turns a strong open-source foundation into a system a business can depend on.
Where this goes next
The direction could not be clearer, and it is the one the whole field is moving toward: agents that remember, agents that learn from the work they actually do, and a memory layer durable enough to outlive whichever model is fashionable this quarter. Letta Code is the most complete expression of that we have found, and operating it well (securely, reliably, and integrated into a real business) is the work we are most excited about.
If you want an agent like this working inside your business, on infrastructure you can trust, remembering your context and improving every week, come talk to us. And if your team is already running Claude Code or Codex and wondering how to make it permanent, this is exactly the conversation to have.
Intueo Labs is an AI automation and agent engineering practice. We help teams deploy and operate stateful AI agents in production.
References
- [1] Letta. Stateful Agents: The Missing Link in LLM Intelligence (2025). https://www.letta.com/blog/stateful-agents
- [2] Packer et al. MemGPT: Towards LLMs as Operating Systems, UC Berkeley (2023). https://arxiv.org/abs/2310.08560
- [3] Letta. Anatomy of a Context Window: A Guide to Context Engineering (2025). https://www.letta.com/blog/anatomy-of-a-context-window
- [4] Letta. Announcing Letta / MemGPT is now part of Letta (2024). https://www.letta.com/blog/announcing-letta
- [5] Letta. Agent Memory: How to Build Agents that Learn and Remember (2025). https://www.letta.com/blog/agent-memory
- [6] Letta. Memory Blocks: The Key to Agentic Context Management (2025). https://www.letta.com/blog/memory-blocks
- [7] Letta Docs. Memory and Memory Blocks. https://docs.letta.com/memory
- [8] Letta. Conversations: Shared Agent Memory across Concurrent Experiences (2026). https://www.letta.com/blog/conversations
- [9] Letta. Letta Filesystem & Benchmarking AI Agent Memory: Is a Filesystem All You Need? (2025). https://www.letta.com/blog/benchmarking-ai-agent-memory
- [10] Letta. Letta Code: A Memory-First Coding Agent (December 2025). https://www.letta.com/blog/letta-code
- [11] Letta Docs. Letta Code. https://docs.letta.com/letta-code
- [12] Letta. Skill Learning: Bringing Continual Learning to CLI Agents (December 2025). https://www.letta.com/blog/skill-learning
- [13] Letta. Can Any Model Use Skills? Adding Skills to Context-Bench (2025). https://www.letta.com/blog/skills-context-bench
- [14] Letta. Introducing Context Repositories: Git-based Memory for Coding Agents (February 2026). https://www.letta.com/blog/context-repositories
- [15] Letta. Sleep-time Compute (2025). https://www.letta.com/blog/sleep-time-compute
- [16] Lin et al. Sleep-time Compute: Beyond Inference Scaling at Test-time, arXiv (2025). https://arxiv.org/abs/2504.13171
- [17] Letta. Building the #1 Open-Source Terminal-Use Agent using Letta (2025). https://www.letta.com/blog/terminal-bench
- [18] Letta. Continual Learning in Token Space (December 2025). https://www.letta.com/blog/continual-learning-in-token-space
- [19] Letta. Remote Environments for Letta Code (2026). https://www.letta.com/blog/remote-environments
- [20] Letta. RAG is not Agent Memory (2025). https://www.letta.com/blog/rag-is-not-agent-memory




