Harness Engineering
Updated 2026-06-21 · Version 1.0

What are Agent Memory Systems?

Agent memory is how an AI agent retains and recalls information beyond a single context window — across steps, sessions and tasks. It typically separates short-term working memory (the current context) from long-term memory (durable stores the agent reads from and writes to). Memory is what lets an agent carry state through a long task, remember a user over time, and avoid repeating work. It is a core layer of harness engineering.

Evidence: Industry observation · Confidence: High · Source: Industry observation · Source: Paper

Definition

Agent memory is the set of mechanisms an AI agent uses to store, organize and retrieve information beyond its immediate context window, spanning short-term working memory and long-term persistent memory.

Key takeaways

  • Memory extends an agent beyond a single finite context window.
  • Short-term (working) vs long-term (persistent) memory serve different roles.
  • Long-term memory is often retrieved on demand, much like retrieval-augmented generation (RAG) over the agent's own history.
  • Good memory prevents repeated work and lost state on long tasks.
  • What to write, keep and forget is a design decision, not automatic.

Context

A model's context window is finite, so anything an agent must remember beyond it needs an external store. Without memory, an agent forgets earlier steps, repeats actions and cannot personalize across sessions.

Memory turns a stateless model into a system with continuity. The hard part is curation: deciding what is worth writing down, how to organize it, and how to retrieve only what is relevant now — closely tied to context engineering.
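One way to make the curation step concrete is a write policy that decides whether a candidate note is worth storing. The following is a minimal sketch under stated assumptions, not a reference implementation: the `Candidate` type, the `should_write` function, the `importance` score and the 0.5 threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A piece of information the agent is considering storing (hypothetical type)."""
    text: str
    importance: float  # assumed 0.0-1.0 score, e.g. from a heuristic or a model


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical notes compare equal."""
    return " ".join(text.lower().split())


def should_write(candidate: Candidate, stored: list[str], threshold: float = 0.5) -> bool:
    """Write only notes that are important enough and not already stored."""
    if candidate.importance < threshold:
        return False
    seen = {normalize(s) for s in stored}
    return normalize(candidate.text) not in seen


stored_notes = ["User prefers metric units"]
candidates = [
    Candidate("user prefers  metric units", 0.9),         # duplicate after normalization
    Candidate("User said hello", 0.1),                     # below the importance threshold
    Candidate("User's project deadline is Friday", 0.8),  # new and important
]
written = [c.text for c in candidates if should_write(c, stored_notes)]
```

Here only the deadline note passes both checks. A real policy would likely use a richer duplicate test than exact string matching, but the shape of the decision (score, filter, then write) stays the same.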

Architecture

The architecture has two layers. Working memory is the current context window and holds the active task. Long-term memory consists of external stores (vector, key-value, document or graph) that are written during a task and retrieved later. Some designs add episodic (events), semantic (facts) and procedural (skills) memory.
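The two layers and the optional memory kinds can be pictured as plain data structures. This is an illustrative sketch only: `MemoryKind`, `MemoryRecord`, `WorkingMemory` and `LongTermMemory` are hypothetical names, the working memory is bounded by item count rather than tokens, and an in-process dict stands in for an external store.

```python
from dataclasses import dataclass, field
from enum import Enum


class MemoryKind(Enum):
    EPISODIC = "episodic"      # events
    SEMANTIC = "semantic"      # facts
    PROCEDURAL = "procedural"  # skills


@dataclass
class MemoryRecord:
    kind: MemoryKind
    text: str


@dataclass
class WorkingMemory:
    """Stands in for the context window: bounded, holds the active task."""
    max_items: int
    items: list[str] = field(default_factory=list)

    def add(self, item: str) -> None:
        self.items.append(item)
        # Drop the oldest entries once the bound is exceeded.
        overflow = len(self.items) - self.max_items
        if overflow > 0:
            del self.items[:overflow]


@dataclass
class LongTermMemory:
    """Stands in for an external store; a dict keyed by kind keeps the sketch simple."""
    records: dict[MemoryKind, list[MemoryRecord]] = field(default_factory=dict)

    def write(self, record: MemoryRecord) -> None:
        self.records.setdefault(record.kind, []).append(record)

    def read(self, kind: MemoryKind) -> list[str]:
        return [r.text for r in self.records.get(kind, [])]


working = WorkingMemory(max_items=2)
for step in ["open ticket", "read logs", "draft fix"]:
    working.add(step)

long_term = LongTermMemory()
long_term.write(MemoryRecord(MemoryKind.SEMANTIC, "Service X runs on port 8080"))
long_term.write(MemoryRecord(MemoryKind.EPISODIC, "Deploy failed on Tuesday"))
```

After the loop, the working memory holds only the two most recent steps, while both long-term records remain available for later reads.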

The agent reads relevant memories into context at each step and writes new ones as it learns. Retrieval, summarization and forgetting policies keep memory useful rather than overwhelming.
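The read-then-write cycle described above might look like the following sketch. It rests on loud assumptions: word overlap stands in for the similarity search a vector store would provide, the model or tool call is replaced by a placeholder string, and `retrieve` and `run_step` are hypothetical helper names.

```python
def tokenize(text: str) -> set[str]:
    """Split text into a set of lowercase words."""
    return set(text.lower().split())


def retrieve(query: str, memories: list[str], k: int = 2) -> list[str]:
    """Return up to k memories sharing the most words with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(m)), m) for m in memories]
    relevant = [(score, m) for score, m in scored if score > 0]
    # Python's sort is stable, so ties keep their insertion order.
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in relevant[:k]]


def run_step(task: str, memories: list[str]) -> str:
    """One agent step: read relevant memories, act, then write what was learned."""
    recalled = retrieve(task, memories)
    context = "\n".join(recalled + [task])  # what would be sent to the model
    result = f"done: {task}"                # placeholder for a real model or tool call
    memories.append(result)                 # write path
    return context


memories = ["done: migrate database schema", "user likes short answers"]
context = run_step("verify database schema migration", memories)
```

In this run only the schema memory is read into context, because the preference note shares no words with the task. The step then appends its own result, so a later step can find it.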

Components

  • Working memory
  • Long-term store (vector/KV/graph)
  • Write policy
  • Retrieval policy
  • Summarization / consolidation
  • Forgetting / expiry

Benefits

  • Continuity across steps, sessions and tasks.
  • Personalization that persists over time.
  • Avoids repeated work and lost context.
  • Enables long-horizon, stateful agents.

Risks

  • Stale or wrong memories poison future answers.
  • Privacy and governance obligations on stored data.
  • Retrieval of irrelevant memories adds noise.
  • Unbounded growth without consolidation or expiry.
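The stale-memory and unbounded-growth risks are commonly addressed with a forgetting policy. The sketch below assumes two simple rules, a time-to-live and a cap on entry count. The `Entry` type, the `expire` and `cap` functions and all numeric values are hypothetical, and consolidation (summarizing old entries instead of dropping them) is not shown.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    text: str
    created_at: float  # seconds on an arbitrary clock; passed in so the example is deterministic


def expire(entries: list[Entry], now: float, ttl_seconds: float) -> list[Entry]:
    """Drop entries older than the time-to-live."""
    return [e for e in entries if now - e.created_at <= ttl_seconds]


def cap(entries: list[Entry], max_entries: int) -> list[Entry]:
    """Keep only the newest max_entries entries so the store cannot grow without bound."""
    newest_first = sorted(entries, key=lambda e: e.created_at, reverse=True)
    return newest_first[:max_entries]


entries = [
    Entry("note A", 0.0),
    Entry("note B", 500.0),
    Entry("note C", 900.0),
    Entry("note D", 1000.0),
]
kept = cap(expire(entries, now=1000.0, ttl_seconds=600.0), max_entries=2)
kept_texts = [e.text for e in kept]
```

With these values, note A is dropped for age and note B is dropped by the cap, leaving the two newest notes. Recency is only one possible ranking. A store could equally rank by usage or importance before trimming.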

Tools & technologies

  • Vector databases
  • Key-value / document stores
  • MemGPT-style memory managers
  • Agent frameworks (LangGraph, Agents SDKs)

Examples

  • An assistant remembering a user's preferences across sessions.
  • A long-running agent summarizing progress so it never repeats a step.
  • A support agent recalling a customer's prior tickets when relevant.

FAQs

How is agent memory different from RAG?
RAG retrieves from an external knowledge base; agent memory retrieves from the agent's own accumulated state. The retrieval machinery is similar, but the source and write path differ.
What is the difference between short- and long-term memory?
Short-term (working) memory is the live context window for the current task; long-term memory is a persistent store the agent reads from and writes to across tasks and sessions.
Why not keep everything in the context window?
Context windows are finite, and output quality tends to drop as they fill. Externalizing state to memory and retrieving only what is relevant keeps the agent focused and keeps token costs down.
What are the governance concerns?
Stored memory may contain personal or sensitive data, so it needs access control, retention limits and the ability to correct or delete entries.
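
The governance requirements in the last answer (access control, correction and deletion) can be illustrated with a small per-user store. This is a hedged sketch, not a compliance design: `UserMemoryStore` and its methods are hypothetical, the access rule is the simplest possible one (only the owner may read), and retention limits are left out.

```python
from dataclasses import dataclass, field


@dataclass
class UserMemoryStore:
    """Hypothetical store keyed by owner, so access and deletion are scoped per user."""
    _entries: dict[str, dict[int, str]] = field(default_factory=dict)
    _next_id: int = 0

    def write(self, user_id: str, text: str) -> int:
        entry_id = self._next_id
        self._next_id += 1
        self._entries.setdefault(user_id, {})[entry_id] = text
        return entry_id

    def read(self, requester_id: str, user_id: str) -> list[str]:
        # Access control: in this sketch only the owner may read their memories.
        if requester_id != user_id:
            raise PermissionError("requester may not read another user's memories")
        return list(self._entries.get(user_id, {}).values())

    def correct(self, user_id: str, entry_id: int, text: str) -> None:
        """Overwrite a single entry, e.g. after the user reports it as wrong."""
        if entry_id not in self._entries.get(user_id, {}):
            raise KeyError(entry_id)
        self._entries[user_id][entry_id] = text

    def delete_all(self, user_id: str) -> int:
        """Remove everything stored for a user; returns the number of entries removed."""
        return len(self._entries.pop(user_id, {}))


store = UserMemoryStore()
first = store.write("alice", "Prefers email contact")
store.write("alice", "Lives in Paris")
store.correct("alice", first, "Prefers phone contact")
```

Keying every entry by its owner is the design choice that matters here: it makes a full per-user deletion a single operation rather than a search across the whole store.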
