2026-04-28
Context window vs memory vs persistence: three things people mean when they say "AI memory"
When someone says "this AI has good memory", they could mean three completely different things. The terms get used interchangeably and the conflation costs you when you're picking a tool.
This post draws the lines clearly. By the end you'll be able to look at any AI product's memory claim and know which of the three it actually means.
1. Context window
What it is: the number of tokens the model can read in a single request.
Modern frontier models have very large context windows: Claude Sonnet at 200K tokens, GPT-5 at 400K, Gemini Pro at 2M. The number is impressive and gets quoted constantly. And big context windows ARE genuinely useful for things that fit inside one session: summarising a 200-page document, reviewing an entire codebase in one shot, following a long technical discussion without losing the thread. That's what the window does well.
What it's NOT: memory. The context window is the model's working desk during a single conversation. When the conversation ends or when the desk fills up, content rolls off. Nothing persists. The 200-page document you just summarised is gone the moment you close the tab.
Bigger context windows let you feed more of your conversation history (or more of a document) into a single turn. That's useful. It's also the only memory many products have: when ChatGPT "remembers" your earlier turns in the same chat, that's just the context window at work. Start a new chat and none of it carries over.
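To make the "desk fills up, content rolls off" behaviour concrete, here is a minimal sketch of a sliding window over conversation turns. The names `count_tokens` and `fit_to_window` are hypothetical, and counting whitespace-separated words is only a crude stand-in for a real tokenizer; actual products may truncate or summarise differently.

```python
from collections import deque


def count_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word (real tokenizers differ).
    return len(text.split())


def fit_to_window(turns: list[str], window_tokens: int) -> list[str]:
    """Keep the most recent turns whose combined token count fits the window."""
    kept: deque[str] = deque()
    used = 0
    for turn in reversed(turns):  # walk from newest to oldest
        cost = count_tokens(turn)
        if used + cost > window_tokens:
            break  # this turn and everything older rolls off the desk
        kept.appendleft(turn)
        used += cost
    return list(kept)


turns = [
    "my stack is Postgres and Go",
    "we ship on Fridays",
    "what should I test first",
    "and how do I roll back",
]
visible = fit_to_window(turns, window_tokens=12)
print(visible)
```

With a 12-token window only the two newest turns survive, so the model no longer sees which stack you said you were using.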
Sign you're talking about context window: anything quoted in tokens. "200K context", "1M context", "2M context".
2. Memory (the product feature)
What it is: a structured store the product maintains about you that gets prepended into the context window on each turn.
Examples:
- ChatGPT memory: a list of short facts ("user is vegetarian", "user is building a SaaS in Cairns"). Capped at maybe a few hundred entries. Auto-extracted from chats with user override.
- Claude project memory: per-project system prompt, pinned files, conversation thread. Bounded by project size.
- Gemini saved info: similar to ChatGPT memory but with Google account integration.
These are real and they're useful. They're also bounded by token economics — the prepend can't be huge or every turn becomes slow and expensive. So the memory is small, curated tightly, and skews toward "stable facts" (your name, preferences, ongoing projects). Crucially, most of these features expect manual curation to keep the memory relevant — you're nudged to review what got saved, prune what's wrong, add what was missed. That works for a few dozen facts. It doesn't scale to a year of working memory.
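A rough sketch of the prepend mechanism described above, assuming a flat list of saved facts and a fixed token budget. The function `build_prompt`, the budget value, and the third fact are hypothetical illustrations, not any vendor's actual implementation.

```python
def build_prompt(saved_facts: list[str], user_message: str, fact_budget_tokens: int) -> str:
    """Prepend saved facts to the user's message, stopping once the budget is spent."""
    included = []
    used = 0
    for fact in saved_facts:
        cost = len(fact.split())  # crude word count standing in for tokens
        if used + cost > fact_budget_tokens:
            break  # remaining facts do not fit in the prepend
        included.append(fact)
        used += cost
    header = "Known facts about the user:\n" + "\n".join(f"- {f}" for f in included)
    return f"{header}\n\nUser: {user_message}"


facts = [
    "user is vegetarian",
    "user is building a SaaS in Cairns",
    "user prefers short answers",  # hypothetical extra fact
]
prompt_a = build_prompt(facts, "Suggest a dinner recipe", fact_budget_tokens=10)
prompt_b = build_prompt(facts, "Review my database schema", fact_budget_tokens=10)
print(prompt_a)
```

Both prompts carry the same fact header regardless of topic, and the third fact is dropped because the budget ran out. That is the trade-off in miniature: every fact costs tokens on every turn, so the list has to stay short.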
What memory does NOT cover well:
- Things that happened in past sessions. A decision you made in chat #14 isn't in your facts list, so the product doesn't have it.
- Anything time-sensitive or evolving. "I said X three weeks ago, then changed my mind to Y last week" — memory features struggle here.
- Cross-context recall. When you change topic, memory features can't help; they just dump the same facts into every conversation regardless of relevance.
Sign you're talking about memory: the product calls it "memory" or "saved info" or "projects", AND it survives across sessions, AND it's bounded by what the product chose to capture (not your full history).
3. Persistence
What it is: every conversation captured in full, indexed in a way that lets the relevant past be retrieved on demand, scaled past what would fit in context.
This is what "persistent AI memory" should mean and rarely does. The user doesn't manually save facts. The product captures everything, organises it, and surfaces the right slice when it's needed — even when "everything" is far larger than any model's context window.
Persistence requires:
- Capture infrastructure. A scribe agent that extracts structured semantic units from every conversation.
- Storage at scale. A personal knowledge graph indexed by embedding, entity, time, and relationship.
- Smart retrieval. Query → relevant slice in milliseconds. Not "all your facts, every turn" but "the units that match what you're saying right now".
- Curation. A loop that resolves contradictions, prunes noise, decays stale content, surfaces patterns.
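The retrieval requirement in the list above can be sketched in a few lines. This is a toy under stated assumptions: bag-of-words cosine similarity stands in for a real embedding model, a plain Python list stands in for a knowledge graph, and `Unit`, `vectorize`, and `retrieve` are hypothetical names. It is not a description of how Moss or any other product works.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Unit:
    text: str
    day: int  # hypothetical capture timestamp, as a day number


def vectorize(text: str) -> Counter:
    # Word counts stand in for an embedding vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(units: list[Unit], query: str, k: int = 2) -> list[Unit]:
    """Return up to k units most similar to the query, skipping zero-score units."""
    q = vectorize(query)
    scored = [(cosine(vectorize(u.text), q), u) for u in units]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [u for score, u in ranked[:k] if score > 0]


history = [
    Unit("decided to price the product per seat", day=14),
    Unit("user is vegetarian", day=2),
    Unit("the database is Postgres with nightly backups", day=30),
]
hits = retrieve(history, "what did we decide on product price per seat", k=1)
print(hits)
```

The point of the sketch is the shape of the call: the query selects the slice, so a question about pricing pulls back the pricing decision and leaves the dietary fact out of the prompt.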
The output behaves nothing like memory features. It isn't primarily a list you browse and edit (though you can, if you want). It's an active layer that catches the past you need at the moment you need it.
Sign you're talking about persistence: the product handles arbitrarily large history (millions of tokens) without degrading recall, AND retrieval adapts to the current query (not always the same dump of facts).
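The curation loop listed among the requirements can be illustrated with the "said X three weeks ago, changed to Y last week" case. The sketch below assumes each captured statement carries a topic key and a day number; `Statement`, `curate`, and the age cutoff are hypothetical, and a real curation loop would need far more than newest-wins and a hard expiry.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    topic: str  # hypothetical key grouping statements about the same thing
    value: str
    day: int    # day the statement was captured


def curate(statements: list[Statement], today: int, max_age_days: int) -> dict[str, Statement]:
    """Keep the newest statement per topic, then drop anything older than the cutoff."""
    latest: dict[str, Statement] = {}
    for s in statements:
        current = latest.get(s.topic)
        if current is None or s.day > current.day:
            latest[s.topic] = s  # a newer statement supersedes the older one
    return {t: s for t, s in latest.items() if today - s.day <= max_age_days}


log = [
    Statement("launch plan", "X", day=0),    # three weeks ago
    Statement("launch plan", "Y", day=14),   # last week
    Statement("office snack order", "pretzels", day=1),
]
current = curate(log, today=21, max_age_days=15)
print(current)
```

Here the later "Y" replaces "X" for the same topic, and the stale snack order ages out. A flat facts list has no timestamps to make either decision.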
Why the conflation matters
When OpenAI, Anthropic, and Google use the word "memory", they almost always mean the second category. The marketing copy is the same as it would be for the third category. So a user who reads "Claude has memory" or "ChatGPT remembers across chats" assumes persistence and gets a saved-facts list.
This shows up as quiet frustration:
- "ChatGPT remembers I'm vegetarian but not the conversation we had about my product strategy."
- "Claude memory is great inside a project but useless when I'm moving between projects."
- "I had to re-explain my whole stack again in this new chat."
In each case the product is doing what its memory feature was built for. The user expected persistence and got memory.
What Moss is
Moss is in the third category. We capture everything automatically (no manual curation tax — the scribe handles it), index it into a structured knowledge graph, and run retrieval on every turn against your full history. Our benchmark shows Moss holding accurate recall at 7M+ tokens — a scale where any context-window-only or memory-feature-only approach has long since collapsed.
We don't think the labs are wrong to skip persistence. It's a different shape of product than what they're building. The frontier labs build models. Persistence requires building the layer between the model and the user — closer to a search engine + database than a model-architecture problem.
If you've been frustrated by the workarounds and want the third category, try Moss. Free tier, no card. Or read "What persistent memory actually requires" for the architecture detail.
Quick reference
| Term | What it is | Lives where | Survives session? | Scales past context window? |
|---|---|---|---|---|
| Context window | Tokens the model can read at once | Inside one request | No | No (it IS the limit) |
| Memory | Curated facts/files prepended each turn | Product database | Yes | No (must fit in context) |
| Persistence | Full history captured, indexed, retrieved on demand | External knowledge graph | Yes | Yes |
The next time you read a memory claim on an AI product page, this is the question to ask: which of the three did they actually mean?