Track retrieval latency, relevance scoring, agent loop iterations, and embedding quality to ensure agentic RAG performance.
Critical metrics:
1. Retrieval latency (p50, p95, p99): Aim for <100ms for a single query and <50ms per query when batched. Slow retrievals block agent loops and frustrate users.
2. Relevance recall@k: Of all the relevant documents for a query, how many appear in the top-k results? Aim for >80% recall@5 in production.
3. Agent loop count: How many times does the agent re-query before answering? A median of 2–3 loops is healthy; more than 5 indicates poor embedding quality or irrelevant data.
4. Failed retrievals: Percentage of queries returning 0 results. Track by agent type. >5% indicates embedding drift or missing data.
5. Embedding freshness: How often are embeddings updated? Embeddings >30 days old degrade relevance by ~15%.
6. False positive rate: Documents retrieved but judged irrelevant by the agent. A rate above 20% indicates query expansion is too broad; reduce k or add filters.
7. Agent success rate: Percentage of agent workflows completing without fallback. Target >95% without escalation.
8. Context window utilization: Average tokens consumed per agent query. Agentic workflows can hit LLM context limits if retrievals aren't selective.
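The latency percentiles and recall@k above can be computed from ordinary query logs. A minimal sketch, assuming a hypothetical `QueryRecord` logging schema (latency plus retrieved and ground-truth relevant document IDs) rather than any particular vector database's API:

```python
from dataclasses import dataclass

@dataclass
class QueryRecord:
    latency_ms: float
    retrieved_ids: list  # top-k document IDs returned, in rank order
    relevant_ids: set    # ground-truth relevant IDs for this query

def percentile(values, p):
    """Nearest-rank percentile over a non-empty list of samples."""
    ordered = sorted(values)
    idx = max(0, round(p / 100 * len(ordered)) - 1)
    return ordered[idx]

def recall_at_k(record, k=5):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not record.relevant_ids:
        return 1.0  # vacuously perfect when nothing is relevant
    hits = len(set(record.retrieved_ids[:k]) & record.relevant_ids)
    return hits / len(record.relevant_ids)

def summarize(records):
    """Aggregate one reporting window of query records into dashboard numbers."""
    latencies = [r.latency_ms for r in records]
    return {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "mean_recall_at_5": sum(recall_at_k(r) for r in records) / len(records),
    }
```

In practice you would run `summarize` per window and alert when `p95_ms` exceeds 100 or `mean_recall_at_5` drops below 0.8, matching the targets above.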
Zilliz Cloud exports these metrics natively via Prometheus and CloudWatch integrations. Set up dashboards in your existing observability stack immediately.
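Once the metrics land in Prometheus, the thresholds above translate directly into alerting rules. A sketch, with hypothetical metric names as placeholders; substitute whatever names your Zilliz Cloud integration or application exporter actually emits:

```yaml
groups:
  - name: agentic-rag
    rules:
      - alert: RetrievalLatencyHigh
        # p95 single-query retrieval latency above 100ms for 10 minutes
        expr: histogram_quantile(0.95, sum(rate(rag_retrieval_latency_seconds_bucket[5m])) by (le)) > 0.1
        for: 10m
      - alert: FailedRetrievalRateHigh
        # more than 5% of queries returning zero results
        expr: rate(rag_failed_retrievals_total[15m]) / rate(rag_queries_total[15m]) > 0.05
        for: 15m
```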
Related Resources: