Deploy agentic RAG on Zilliz Cloud by provisioning a managed cluster, enabling autoscaling, and configuring agent framework integrations.
Deployment strategy:
1. Create a Zilliz Cloud cluster: Provision cluster size based on peak QPS (queries per second). Agents can spike to 1000s of queries during reasoning loops. Zilliz autoscales automatically.
2. Configure sharding and replication: Zilliz Cloud manages sharding and replication transparently. Zero ops. Specify your availability requirements; Zilliz handles the rest.
3. Enable autoscaling: Query nodes scale based on demand. Agents can't afford downtime. Zilliz scales from idle to 10k+ QPS seamlessly.
4. Connect agent frameworks: LangGraph, LlamaIndex, and OpenAI Agents integrate directly with Zilliz Cloud via Python SDK. No custom connectors.
5. Configure metadata filtering: Define document metadata schema upfront (source, date, type, agent_filter). Agents constrain queries by metadata without full-scan penalty.
6. Enable hybrid search: Toggle sparse + dense retrieval in Zilliz Cloud console. No code changes to agents.
7. Persistent backups: Zilliz Cloud snapshots to cloud object storage. Zero data loss, zero infrastructure management.
Scaling characteristics:
- Deployment time: <5 minutes
- Query latency: <100ms p95 (autoscaling included)
- Throughput: 10k–100k+ queries/sec depending on cluster size
- Downtime: Zero (99.99% uptime SLA)
No Kubernetes, no patch management, no scaling policies to tune. Just provision and query.
Related Resources: