Agentic RAG agents must manage LLM context windows carefully: retrieving large documents across multiple loops can push the accumulated prompt past the window limit.
Context management strategies:
1. Selective retrieval: Retrieve only k=3–5 results per query, not k=20. Agents rewrite queries to get more relevant results rather than retrieving everything.
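A minimal sketch of selective retrieval, assuming a toy word-overlap scorer in place of a real vector store and an LLM-based query rewriter (the corpus, `score`, and rewritten query below are illustrative stand-ins):

```python
# Selective retrieval sketch: cap k at a small value, and rewrite the
# query instead of raising k when results look off-topic.

def score(query: str, doc: str) -> int:
    # Toy relevance score: count of shared words.
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    # Return only the top-k documents, never the whole corpus.
    ranked = sorted(corpus, key=lambda d: score(query, d), reverse=True)
    return ranked[:k]

corpus = [
    "supplier risk report for Q4",
    "holiday schedule memo",
    "supplier onboarding checklist",
    "risk assessment methodology",
    "cafeteria menu",
]

# First pass: small k keeps context lean.
first = retrieve("supplier risk", corpus, k=3)

# If results look off-topic, rewrite the query rather than retrieve more.
second = retrieve("supplier risk assessment report", corpus, k=3)
```

The point is that k stays fixed and small across passes; precision comes from better queries, not bigger result sets.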
2. Summarization in loops: After each retrieval, the agent summarizes results before re-querying. This typically compresses context by 60–80% while preserving the facts needed for the next step.
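A sketch of in-loop summarization, where the `summarize` function is a toy stand-in for an LLM summarization call (a real agent would prompt the model to compress each round's results):

```python
# After each retrieval round, compress the raw results before carrying
# them into the next query, so context grows by summaries, not documents.

def summarize(docs: list[str]) -> str:
    # Toy compressor: keep only the first sentence of each document.
    return " ".join(d.split(". ")[0] + "." for d in docs)

retrieved = [
    "Supplier A missed two deliveries. Full shipment logs attached. ...",
    "Supplier B is financially stable. Ten pages of balance sheets follow. ...",
]

context = summarize(retrieved)  # carried into the next retrieval round
```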
3. Document chunking: Store document chunks (256–512 tokens) in Zilliz, not full documents. Agents retrieve multiple small chunks and stay within the context window.
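A sketch of fixed-size chunking before ingestion. A real pipeline would use a proper tokenizer (e.g. tiktoken); here whitespace-split words stand in for tokens:

```python
# Split a document into fixed-size chunks before storing them in the
# vector database, so retrieval returns small units, not whole documents.

def chunk(text: str, max_tokens: int = 256) -> list[str]:
    words = text.split()  # words approximate tokens in this sketch
    return [
        " ".join(words[i : i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]

doc = "word " * 1000              # a ~1000-token document
chunks = chunk(doc, max_tokens=256)  # -> 4 chunks, each <= 256 tokens
```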
4. Metadata-driven filtering: Constrain retrieval to specific documents upfront. If the agent knows to look in "Q4_2025_reports", it narrows the search space.
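A sketch of metadata-driven filtering. In Zilliz/Milvus this maps to a filter expression passed alongside the vector search (e.g. `source == "Q4_2025_reports"`); here a plain list of dicts stands in for the collection:

```python
# Narrow the candidate set by a metadata field before any similarity
# ranking runs, so the agent never sees off-scope chunks.

chunks = [
    {"text": "Q4 supplier risks rose", "source": "Q4_2025_reports"},
    {"text": "Q3 revenue summary",     "source": "Q3_2025_reports"},
    {"text": "Q4 mitigation plan",     "source": "Q4_2025_reports"},
]

def filtered(chunks: list[dict], source: str) -> list[dict]:
    # Keep only chunks whose metadata matches the known scope.
    return [c for c in chunks if c["source"] == source]

candidates = filtered(chunks, "Q4_2025_reports")  # 2 of 3 chunks remain
```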
5. Two-stage retrieval:
- Stage 1: Dense search returns document IDs
- Stage 2: Agent fetches only the relevant sections of those documents
- Zilliz returns metadata (doc_id, chunk_id) for lightweight filtering
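The two stages above can be sketched as follows; the `sections` store and the stage-1 hit list are illustrative stand-ins for a Zilliz search result and a document store:

```python
# Two-stage retrieval sketch: stage 1 returns only lightweight IDs
# (doc_id, chunk_id); stage 2 fetches just the matching sections.

sections = {
    ("doc_7", "chunk_2"): "Supplier A risk: single-source dependency.",
    ("doc_7", "chunk_5"): "Supplier A history: ten years on contract.",
    ("doc_9", "chunk_1"): "Supplier B risk: currency exposure.",
}

# Stage 1: dense search yields (doc_id, chunk_id) pairs, no bodies.
stage1_hits = [("doc_7", "chunk_2"), ("doc_9", "chunk_1")]

# Stage 2: fetch only the sections the agent actually needs.
context = [sections[hit] for hit in stage1_hits]
```

Because stage 1 moves only IDs and metadata, the heavy document text enters the context window only for the sections that survive filtering.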
6. Context budget per loop: Set a token limit per retrieval round. If the budget is nearly exhausted, the agent summarizes or stops looping.
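A sketch of a per-round budget guard. Whitespace words approximate tokens, and the 100-token budget is an illustrative number, not a recommendation:

```python
# Before appending a new retrieval round, check the remaining budget and
# signal the agent to summarize or stop if the round would overflow it.

def count_tokens(text: str) -> int:
    return len(text.split())  # word count stands in for a tokenizer

def add_round(context: str, new_text: str, budget: int = 100) -> tuple[str, bool]:
    # Returns the updated context and whether looping may continue.
    if count_tokens(context) + count_tokens(new_text) > budget:
        return context, False  # over budget: summarize or stop
    return (context + " " + new_text).strip(), True

ctx, ok = add_round("", "supplier profiles " * 30)    # 60 tokens: fits
ctx2, ok2 = add_round(ctx, "risk assessments " * 25)  # 50 more: over budget
```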
Example: Agent answering "What are our top 3 supplier risks?" might loop 3 times:
- Loop 1: Retrieve supplier profiles (100 tokens)
- Loop 2: Retrieve risk assessments for top suppliers (150 tokens)
- Loop 3: Retrieve mitigation strategies (120 tokens)
- Total: 370 tokens, well within 4K–8K context windows
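The three-loop example above, driven by a budget check, can be sketched as follows; the token counts are the figures from the example, and the 8192-token budget stands in for an 8K context window:

```python
# Run the example's retrieval rounds against a context budget, stopping
# early if a round would overflow it.

rounds = [
    ("supplier profiles", 100),
    ("risk assessments", 150),
    ("mitigation strategies", 120),
]

budget, used, completed = 8192, 0, []
for name, tokens in rounds:
    if used + tokens > budget:
        break                  # would overflow: stop looping
    used += tokens
    completed.append(name)
# used == 370: all three rounds fit comfortably in an 8K window
```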
Design agentic workflows with context budgets. Zilliz Cloud's metadata filtering and chunk-level storage support this natively.