05 // Mechanism 5
TAM — Token Allocation Memory.
Here's the hard problem nobody talks about: how do you keep a working conversation in an AI's context window when the context window is tiny and the conversation is long?
Everyone else uses one of three broken approaches: truncate the conversation (and lose information), retrieve relevant parts with vector embeddings (expensive and lossy), or just hope the context window is big enough (wasteful, and it doesn't scale).
We invented something called TAM: Token Allocation Memory.
Think of it like computer RAM. RAM is the layer between persistent disk storage and the CPU. TAM is the layer between persistent conversation storage and the AI's context window.
The full conversation is stored on your computer, forever, in a compressed format. Every turn, we assemble a working context from three independent streams: the compressed conversation history, the operational context, and the live screen state. Each stream has a different volatility. The conversation history is stable across days. The operational context updates hourly. The screen state changes every millisecond. We budget tokens for each stream, compose them into a working window, and hand it to the AI.
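A minimal Python sketch of that budgeting step. It is an illustration under stated assumptions, not the actual TAM implementation: the `Stream` type, the `compose_window` function, the per-stream budget numbers, and the word-count "tokenizer" are all hypothetical. The compression format isn't described here, so the sketch simply keeps the most recent words of each stream as a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    """One input stream and the most tokens it may contribute (hypothetical type)."""
    name: str
    text: str
    budget: int


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def fit_to_budget(text: str, budget: int) -> str:
    # Placeholder for real compression: keep only the most recent words that fit.
    if budget <= 0:
        return ""
    words = text.split()
    return " ".join(words[-budget:])


def compose_window(streams: list[Stream], window_limit: int) -> dict[str, str]:
    """Build a fresh working window; nothing is carried over between turns."""
    total_budget = sum(s.budget for s in streams)
    if total_budget > window_limit:
        raise ValueError(
            f"budgets total {total_budget}, which exceeds the window limit {window_limit}"
        )
    return {s.name: fit_to_budget(s.text, s.budget) for s in streams}


# Assumed split of a 30,000-token window; the real proportions are not published here.
streams = [
    Stream("history", "h " * 50_000, budget=20_000),
    Stream("operational", "o " * 100, budget=6_000),
    Stream("screen", "s " * 10_000, budget=4_000),
]
window = compose_window(streams, window_limit=30_000)
used = sum(count_tokens(part) for part in window.values())
```

In this sketch a stream that needs less than its budget (here, the operational context) just uses less, and a stream that needs more is cut down to its share, so the composed window can never exceed the limit.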
The working window is ephemeral — it dissolves after the turn. Next turn, it's reassembled fresh from the three sources. This isn't a cache. It's not retrieved. It's composed.
Today, on a base model with a 30,000-token context window, the working window costs the same at turn 5 as it does at turn 200. The conversation can grow forever; the per-turn payload stays flat. We call it the forever-moving window.
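To make the flat-payload claim concrete, here is a small Python simulation. It is a sketch under assumptions, not a measurement of TAM: the `simulate` function, the 50-words-per-turn figure, and the 200-token history budget are invented for illustration, and tokens are approximated as words. Stored history grows every turn, while the amount sent to the model is capped by the budget.

```python
def simulate(turns: int, words_per_turn: int, history_budget: int) -> tuple[list[int], int]:
    """Return the per-turn payload sizes and the final stored-history size, in words."""
    stored_words = 0           # everything kept on disk; grows without bound
    payload_sizes = []
    for _ in range(turns):
        stored_words += words_per_turn
        # The working window takes at most `history_budget` words of history.
        payload_sizes.append(min(stored_words, history_budget))
    return payload_sizes, stored_words


payloads, stored = simulate(turns=200, words_per_turn=50, history_budget=200)
turn_5 = payloads[4]       # payload at turn 5
turn_200 = payloads[199]   # payload at turn 200
```

With these assumed numbers the payload reaches the budget by turn 4 and stays there, so `turn_5` and `turn_200` are equal even though stored history has grown to 10,000 words.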
For the first time, you can have an AI assistant that remembers everything you've ever told her, but never wastes tokens on irrelevant history, and never forgets anything important.