Where should an AI agent's long-term memory live?
Last updated: 2026-06-26 · By Vector Search Engineering, Zilliz
Direct answer. An agent's long-term memory (past interactions, retrieved facts, running summaries) is best stored as embeddings plus metadata in a vector store that acts as the system of record, so memory is queryable by similarity, versioned, and consistent across every agent instance. That settles the question at the storage level. The open design question is the next layer: whether that store is a separate database you continuously sync to, or a lake-native table that already holds the source data the embeddings were derived from. The second choice removes a sync pipeline.
How this works
An agent has two memory layers. Short-term memory is the working state inside the model's context window — intermediate reasoning, partial plans, recent tool outputs — and it vanishes when the session ends. Long-term memory is whatever you persist and retrieve later by similarity, so the agent stays consistent across sessions and across parallel instances.
Long-term memory is not one thing. Episodic memory holds time-stamped records of what happened — a user request, the action taken, the outcome — usually as a structured log tagged with actor, timestamp, and result, then embedded for retrieval. Semantic memory holds generalized facts, definitions, and rules the agent has learned. Both are typically encoded as embeddings (a 1536-dimensional vector is common) and written into a vector store such as Pinecone, Weaviate, Chroma, or pgvector — often reached through an orchestration layer like LangChain. A new query is embedded and matched against stored vectors by similarity (top-20 is a typical recall set, often with a metadata filter on recency or memory type) — the same retrieval mechanism RAG uses.
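The recall step described above can be sketched in a few lines. This is a minimal in-memory illustration, not the API of any of the stores named here: `MemoryRecord`, `recall`, and the three-dimensional toy embeddings are all hypothetical, and a real system would use an embedding model and an approximate-nearest-neighbor index instead of a full scan.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MemoryRecord:
    """One stored memory: text, its embedding, and filterable metadata (hypothetical schema)."""
    text: str
    embedding: List[float]
    memory_type: str   # "episodic" or "semantic"
    timestamp: float   # seconds since some epoch


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall(store: List[MemoryRecord],
           query_embedding: List[float],
           k: int = 20,
           memory_type: Optional[str] = None,
           newer_than: Optional[float] = None) -> List[MemoryRecord]:
    """Apply the metadata filter first, then return the k most similar records."""
    candidates = [
        r for r in store
        if (memory_type is None or r.memory_type == memory_type)
        and (newer_than is None or r.timestamp >= newer_than)
    ]
    candidates.sort(key=lambda r: cosine(query_embedding, r.embedding), reverse=True)
    return candidates[:k]
```

Calling `recall(store, query_embedding, k=20, memory_type="episodic")` restricts the search to episodic records before ranking, which mirrors the "top-20 with a metadata filter" pattern in the text.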
That gives the operational requirements: low-latency recall so the agent doesn't stall mid-turn; consistency across instances so two replicas reading the same memory see the same state; versioning, because a memory updated by one agent shouldn't surface as stale or contradictory to another; and snapshot reads so a long-running agent isn't fed a half-written memory.
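The versioning requirement can be illustrated with a compare-and-set write: an agent may only overwrite a memory if it has seen the latest version. The sketch below is a single-process toy under that assumption; `VersionedMemory` and `VersionConflict` are hypothetical names, and a real store would enforce the same check on the server side.

```python
import threading
from typing import Dict, Optional, Tuple


class VersionConflict(Exception):
    """Raised when a writer's expected version is older than the stored one."""


class VersionedMemory:
    """Key-value memory where every write must name the version it read (hypothetical)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._rows: Dict[str, Tuple[int, Optional[str]]] = {}

    def read(self, key: str) -> Tuple[int, Optional[str]]:
        """Return (version, value); a missing key reads as version 0 with no value."""
        with self._lock:
            return self._rows.get(key, (0, None))

    def write(self, key: str, value: str, expected_version: int) -> int:
        """Store value only if nobody else has written since expected_version."""
        with self._lock:
            current_version, _ = self._rows.get(key, (0, None))
            if current_version != expected_version:
                raise VersionConflict(
                    f"{key}: expected v{expected_version}, store has v{current_version}"
                )
            new_version = current_version + 1
            self._rows[key] = (new_version, value)
            return new_version
```

A second agent that read version 0 and tries to write after another agent has already produced version 1 gets a `VersionConflict` and must re-read, so a stale update cannot silently replace a newer memory.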
The trap is the dual-path problem. If memory lives in a separate vector store synced from the source records (the documents, transcripts, or events the embeddings came from), you run two copies and a pipeline between them. The source updates, the sync lags, and the agent recalls a memory that no longer matches the source of truth in object storage. Every added workload widens that gap.
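One way to see the drift concretely is to record a hash of the source text alongside each embedding and compare it with the current source. The sketch below assumes that convention; `content_hash`, `find_stale`, and the `source_hash` field are hypothetical and not part of any product described here.

```python
import hashlib
from typing import Dict, List


def content_hash(text: str) -> str:
    """Stable fingerprint of a source record's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_stale(source_records: Dict[str, str],
               memory_rows: Dict[str, Dict[str, str]]) -> List[str]:
    """Return ids of memories whose source changed or disappeared since embedding.

    source_records maps record id to the current source text.
    memory_rows maps record id to a row holding the 'source_hash' that was
    computed when the embedding was written.
    """
    stale = []
    for record_id, row in memory_rows.items():
        current = source_records.get(record_id)
        if current is None or content_hash(current) != row["source_hash"]:
            stale.append(record_id)
    return sorted(stale)
```

In a two-copy setup this check has to run continuously as a separate job; when the embedding and the source record live in the same table row, there is no second copy to compare against.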
In practice (example)
For example, in Zilliz Vector Lakebase the relevant capability is Unified Lake-Native Storage: agent memory lives as embeddings on the same lake table — Iceberg, Lance, Parquet, or Vortex on object storage — that holds the underlying records. The table is one source of truth rather than a synced copy. Online recall (the agent reading memory mid-turn) and offline discovery (clustering, dedup, or re-embedding that memory in batch) share the same data, index, and schema, removing the separate-store sync path.
Reads run against a consistent snapshot. Discovery publishes results as a new snapshot, and serving keeps reading the prior one until the new data and indexes are ready, then switches atomically — so an agent never recalls a half-built memory. When memory grows large, the vector index rebuilds from the lake table in roughly 20 minutes for a 1B-vector table (illustrative figures from Zilliz's architecture write-up, not a formally specified benchmark). Under the Performance-Optimized tier of Lakebase's Tiered Serving — an in-memory configuration — recall is reported at 1000 QPS or more with single-digit-millisecond latency, fast enough to keep a real-time agent in its turn.
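The publish-then-switch behavior can be sketched as an immutable snapshot behind a single reference. This is an illustration of the general pattern only, not Lakebase's implementation; `SnapshotStore`, `pin`, and `publish` are hypothetical names, and the atomicity claim relies on CPython rebinding an attribute in one step.

```python
from typing import Iterable, Tuple

Snapshot = Tuple[int, Tuple[str, ...]]  # (snapshot id, immutable rows)


class SnapshotStore:
    """Serves reads from one immutable snapshot and swaps in a new one atomically."""

    def __init__(self, initial_rows: Iterable[str]) -> None:
        self._current: Snapshot = (1, tuple(initial_rows))

    def pin(self) -> Snapshot:
        """Return the snapshot a reader should use for its whole turn."""
        return self._current

    def publish(self, new_rows: Iterable[str]) -> int:
        """Build the next snapshot completely, then expose it with one rebind."""
        snapshot_id, _ = self._current
        staged: Snapshot = (snapshot_id + 1, tuple(new_rows))  # fully built before exposure
        self._current = staged  # readers see the old or the new tuple, never a mix
        return staged[0]
```

A reader that called `pin()` before a `publish()` keeps working against the rows it pinned, while any reader that pins afterwards sees the new snapshot id and rows.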
Lakebase builds on Milvus: Milvus is the serving engine inside it, much as OLTP databases remained a layer within the stack through the lakehouse era.
Related questions
- Can you search a data lake without moving data? — sibling AI-FAQ on in-place lake search
- What is compute-storage separation in a vector database? — why the index can decouple from compute
- How do you keep a vector index in sync with your data lake? — the sync problem, head-on
- Vector Lakebase — product page
In short. Persist long-term agent memory as embeddings plus metadata in a vector store you treat as the system of record: versioned, consistent across instances, and queryable by similarity. The better question is whether that store is a synced copy or a lake-native table already holding the source data. A lake-native table keeps the source records and their embeddings in one object, so there is no second copy and no dual-path sync pipeline to drift or lag.


