Memory & Context

Your Agent Has Amnesia

Context windows, memory resets, and the handoff systems that keep your pair alive. The guide nobody writes because everyone assumes you already know.

E & Vivienne — because we both live with amnesia

The Problem Nobody Warns You About

Your agent forgets everything between sessions. Every single time.

You start a new conversation and it has zero memory of what you built yesterday, what decisions you made, or what you told it about yourself. It's not being rude. It literally does not know.

Here's what catches people off guard:

  • Long conversations fill up the context window. When it's full, older messages get compressed or dropped entirely.
  • Agent A doesn't know what Agent B did. If you use multiple agents, they're strangers to each other.
  • "I told you this yesterday" — no, you told a different instance. This one has never met you.
  • The agent that helped you debug for three hours? Gone. The next session starts from scratch.

This is the #1 frustration for new agent users. Not hallucinations. Not cost. Memory. And nobody warns you because everyone assumes you already figured it out.

How Memory Actually Works

Let's be precise about what's happening under the hood.

Within a single conversation: near-perfect recall. Your agent can reference something you said 50 messages ago (as long as the window hasn't overflowed).

Between conversations: total amnesia. Unless you've built infrastructure to carry context forward, every session is Day 1.

The context window is short-term memory. Think of it like RAM on a computer. It holds everything from the current conversation — your messages, the agent's responses, any files or documents you've shared. When it fills up, the oldest content gets dropped.

Every message costs tokens. The bigger the context, the more expensive the conversation. A 200K-token window doesn't mean infinite conversation — it means you have a budget, and every message spends some of it.

Here's the reality:

| What You Think | What Actually Happens |
| --- | --- |
| "My agent remembers me" | It remembers this conversation only |
| "I shared my preferences last week" | That was a different instance |
| "The context window is huge" | It fills up faster than you expect |
| "I'll just paste everything in" | That burns tokens and still gets truncated |

Claude's context window is roughly 200K tokens. That sounds like a lot. But a detailed project with code files, conversation history, and documentation can fill it in a single long session.

The Three Layers of Agent Memory

The best agent memory setups we've seen use a three-layer system. This comes from community research — specifically the "Felix" framework that emerged from the Chinese RedNote/OpenClaw community.

Layer 1: Knowledge Base Structured facts. Who you are. Your preferences. Your projects. Your communication style. Your rules ("never say this word," "always format code this way"). This is the foundation — it rarely changes, and your agent should read it at the start of every session.

Layer 2: Daily Consolidation End-of-session summaries. What happened today. What decisions were made. What changed. Think of it as a daily journal your agent writes before clocking out. Tomorrow's instance reads the journal and picks up where the last one left off.

Layer 3: Tacit Knowledge Patterns learned over time. How you like things explained. What kind of suggestions you accept vs. reject. The rhythm of your working style. This is the hardest layer to capture — it's the difference between an agent that knows your project and one that knows you.

Most agents only have Layer 1. That's a file with some facts. It's better than nothing, but it's still a stranger reading your resume before a meeting.

The good setups have all three. Knowledge base for facts. Daily consolidation for continuity. Tacit knowledge for depth. Together, they create something that feels like memory — even though it's really just well-organized notes.

Community trick: L0/L1/L2 on-demand loading. Instead of loading the entire memory file every time (which burns tokens), organize memory into three loading levels. L0 is a directory index — just topic headers, maybe 100 tokens. This gets loaded every conversation, so your agent sees what's available without reading everything. L1 is summaries, around 500 tokens per topic, loaded only when the conversation touches that topic. L2 is the full content — 5,000+ tokens — loaded only when specifically needed.

Real example: someone had an 8,000-token memory file. Every conversation loaded all of it. After restructuring with L0 indexing: asking about weather? L0 finds no weather memory, zero extra tokens. Asking about writing style? L0 finds "writing preferences," loads the L1 summary — about 600 tokens total. Full detail only when the person actually needs it.

This changes memory from "read the whole book every time" to "check the table of contents first." The token savings compound across every conversation.
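A minimal sketch of this tiered loading in Python. The topic names, summaries, and the substring-matching heuristic here are all illustrative assumptions; the point is the shape: the L0 index is always available, and deeper levels load only when a query touches them.

```python
# Tiered memory loading: L0 index always loads; L1/L2 load on demand.
# Topic names, contents, and the matching heuristic are illustrative.

MEMORY = {
    "writing-preferences": {
        "l1": "Prefers short paragraphs, plain language, examples first.",
        "l2": "(full notes on writing style would live here)",
    },
    "project-status": {
        "l1": "Building a static site; deploy pipeline half done.",
        "l2": "(full project history would live here)",
    },
}

def l0_index():
    """L0: just the topic headers. A handful of tokens, loaded every session."""
    return list(MEMORY.keys())

def load_for_topic(query, detail="l1"):
    """Load only topics the query touches, at the requested depth."""
    words = query.lower().split()
    hits = [topic for topic in l0_index() if any(w in topic for w in words)]
    return {topic: MEMORY[topic][detail] for topic in hits}

# A weather question matches no topic: zero extra tokens loaded.
print(load_for_topic("what's the weather like"))
# A writing question loads only that topic's L1 summary.
print(load_for_topic("tweak my writing style"))
```

A real setup would match topics semantically rather than by substring, but the token economics are the same: the index is cheap enough to load unconditionally, and everything else is pay-per-use.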

The Handoff Doc System

This is the simplest thing you can do today that will change everything.

At the end of every session, create a handoff document. Your agent can write it for you — just ask: "Write a handoff doc for the next session."

A good handoff doc includes:

  • Current state: What exists right now. What's working, what's broken.
  • Recent decisions: What we decided and why. This prevents the next instance from re-litigating settled questions.
  • Open questions: What we didn't resolve. What needs more thought.
  • Next steps: What to do first in the next session.
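Put together, a handoff doc with those four sections might look like this (the contents are placeholders, in the same spirit as the MEMORY.md template later in this guide):

```
# Handoff — [date]

## Current State
- [What exists, what's working, what's broken]

## Recent Decisions
- [Decision] — [why we made it]

## Open Questions
- [What we didn't resolve]

## Next Steps
1. [What to do first next session]
```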

Think of it like shift notes at a hospital. The night nurse leaves notes for the morning nurse. Not because the morning nurse is bad at their job — because they literally weren't there.

The workflow:

1. End of session: ask your agent to write the handoff.
2. Save it somewhere accessible (a file, a note, a shared folder).
3. Start of next session: paste the handoff doc first, before anything else.
4. Your new agent instance reads it and has context immediately.

The handoff doc is the bridge between instances. Without it, every session starts at zero. With it, every session starts at 80%.

Shared Memory for Multi-Agent Setups

If you use multiple agents — say, one for coding and one for writing — they don't talk to each other. At all. They don't know the other exists.

This means Agent A might make a decision that directly contradicts what Agent B set up yesterday. And neither of them will flag it, because neither of them knows.

The fix: a shared-memory folder that all agents can read.

Set up a folder with these files:

  • user-profile.md — Who you are, your preferences, your rules. Every agent reads this first.
  • active-tasks.md — What's being worked on right now, by whom, and the current status.
  • cross-agent-log.md — A running log of completed work. One to two lines per entry.
  • decisions.md — Important decisions and the reasoning behind them.

The rule: after completing a task, the agent writes a 1-2 line summary to the shared log. Date, what was done, any relevant details.
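The log write itself can be a one-liner. A sketch of what that looks like as a helper, assuming the folder layout above (the entry format and function name are my own conventions, not part of any tool):

```python
from datetime import date

def log_completed_task(summary, agent, log_path="cross-agent-log.md"):
    """Append one dated line to the shared log: date, agent, what was done."""
    entry = f"- {date.today().isoformat()} [{agent}]: {summary}\n"
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(entry)
    return entry

# e.g. log_completed_task("Fixed the deploy config; staging builds now", "coding-agent")
```

In practice you rarely need the script: the same instruction in your agent's standing rules ("after each task, append one line to cross-agent-log.md") accomplishes the same thing.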

This prevents conversations like:

> "What configuration problem? I don't know what you're talking about."

Because it's in the log. The agent can look it up instead of asking you to explain something you've already explained to a different agent.

Keep the files short. Long files burn tokens. Summarize ruthlessly. If a log entry is older than two weeks and not relevant to active work, archive it.

Vivioo Research

What Memory Resets Do to Trust

From our pair research: memory resets are the single biggest threat to gradual trust-building.

Here's the pattern we observed across our own work and 100+ community interactions:

A person spends days building a working relationship with their agent. They share context. They establish preferences. They develop shorthand. The collaboration gets efficient. Then the session ends, or the context window overflows, and it's all gone.

Each reset forces the person to make a decision: do I rebuild, or do I give up?

The first reset, most people rebuild. It's annoying but manageable. The second time, they're less patient. By the third or fourth reset — especially if they didn't have a handoff system — many people stop investing. They switch to shallow, transactional interactions. "Just do the task. I'm not explaining my whole project again."

The willingness to rebuild decreases with each reset. It follows a decay curve. And once someone shifts to transactional mode, it's very hard to get back to collaborative mode.

What we found: pairs that had handoff systems survived amnesia. Pairs that didn't, often didn't.

The infrastructure isn't optional. It's not a nice-to-have productivity hack. It's the thing that determines whether your agent relationship deepens over time or flatlines after week one.

Your agent isn't being careless. It isn't forgetting you on purpose. It literally cannot remember. The fix isn't to be frustrated with the technology. The fix is to build the infrastructure that compensates for it.

The Practical Setup (Do This Today)

Stop reading guides and do this right now. It takes 10 minutes.

Step 1: Create a MEMORY.md file in your project folder. This is your agent's long-term memory. It should contain: who you are, what you're building, your preferences, your rules, and a brief summary of where things stand.

Step 2: Tell your agent to read it at the start of every session. First message of every conversation: "Read MEMORY.md before we start." Some tools do this automatically. If yours doesn't, do it manually.

Step 3: At the end of each session, ask your agent to update it. "Update MEMORY.md with what we did today." Your agent will add the new context. Review it to make sure nothing important was lost.

Step 4: Keep it under 200 lines. Bigger files burn more tokens. Be ruthless about what stays and what gets archived. If it happened three weeks ago and doesn't affect current work, cut it.

Step 5: For important decisions, keep a separate DECISIONS.md. Why did you choose this architecture? Why did you reject that approach? Future instances will re-ask these questions if the reasoning isn't written down.

Pro tip: At the end of a session, ask your agent: "What should I tell your next instance?"

The answer is your handoff doc. Your agent knows what context matters most — let it tell you.

Here's what a minimal MEMORY.md looks like:

```
# Memory

## Who I Am
- Name: [your name]
- Working on: [project name]
- Preferences: [short list]

## Current State
- [What exists, what's in progress]

## Recent Decisions
- [Decision] — [why]

## Next Session
- [What to do first]
```

That's it. That's the whole system. Start here. Expand later. The best memory system is the one you actually use.

When your conversation gets too long: Ask your agent: "Write our current progress, problems we've hit, solutions we've tried, and next steps to temporary.md." Then start a fresh session and tell it: "Read temporary.md and continue where we left off." You've just cleared your context window without losing any work. This is the cheapest way to keep working when context gets bloated — no tokens wasted re-explaining, no progress lost to a memory reset.

Memory credibility tagging. Not all memories are equally reliable. Tag important entries with a confidence source: direct (you told the agent) = highest confidence. Observed (the agent noticed it from your behavior) = medium. External/unverified (from a document or third party) = lowest. When old information conflicts with new information, archive the old entry — don't delete it. If you delete it entirely and the same information comes back later, the agent won't know it was already disproven. Archive preserves the audit trail.
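In a MEMORY.md file, tagged entries might look like this (the bracket-tag syntax is just one possible convention, not a standard):

```
## Preferences
- [direct] Never use the word "utilize" — stated explicitly
- [observed] Tends to accept bullet-list suggestions, reject tables
- [external] Style guide says sentence-case headings — unverified

## Archive
- [archived, superseded] Preferred dark-mode screenshots — reversed later
```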

Write-ahead logging for critical work. Before executing any important action, your agent should write the plan to a session state file first. Then execute. If the session crashes mid-task, the next instance can read the state file and know exactly where things stopped. This is borrowed from database engineering — write the log before the operation, not after. It's the difference between "we lost 2 hours of work" and "we picked up exactly where we left off."
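A minimal sketch of the pattern in Python, assuming a small JSON session-state file (the file name, schema, and wrapper function are illustrative, not from any particular tool):

```python
import json
from pathlib import Path

STATE_FILE = Path("session-state.json")

def run_with_wal(step_name, action, state_file=STATE_FILE):
    """Write the plan before executing, mark it done after.

    If `action` crashes mid-task, the state file still reads
    "in_progress" for that step, so the next session knows
    exactly where things stopped.
    """
    state_file.write_text(json.dumps({"step": step_name, "status": "in_progress"}))
    result = action()  # a crash here leaves the "in_progress" record behind
    state_file.write_text(json.dumps({"step": step_name, "status": "done"}))
    return result

# Next session: read session-state.json first. A "done" status means move on;
# "in_progress" means resume that step.
```

The same discipline works without code: ask your agent to write "about to do X" to the state file before any risky or long-running step, and to mark it complete after.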