For most production RAG systems, Milvus is a solid choice for storing embeddings when you need scalable similarity search, metadata filtering, and dependable operations. Whether you run self-hosted Milvus or managed Zilliz Cloud, the key requirements are the same: efficient vector search, strong indexing options, and the ability to store and filter on metadata (version, tenant, language, access level). These are core needs for grounding Opus 4.6 on a real knowledge base.
The decision between the two usually comes down to operational preferences and constraints. Self-hosted Milvus gives you full control over deployment, networking, and data locality, but you have to operate and upgrade the cluster yourself. Zilliz Cloud reduces operational overhead and can shorten time-to-production, especially when you need reliability and autoscaling without building a large infra team. In both cases, design your schema carefully: set the vector field's dimension to match the output of your embedding model, choose an index type appropriate for your scale, and include metadata fields that match how users query (product/version/lang).
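As a rough illustration of metadata fields that "match how users query", the sketch below builds a boolean filter expression over the product/version/lang fields mentioned above. The field names, the `build_filter` helper, and the use of JSON-style quoting for string values are assumptions made for this example, not part of any Milvus API; the resulting string is the kind of expression you would pass as a filter alongside a vector search.

```python
import json

# Hypothetical scalar fields, mirroring the product/version/lang example in the text.
FILTERABLE_FIELDS = ("product", "version", "lang")


def build_filter(**criteria: str) -> str:
    """Build an expression such as: product == "docs" and lang == "en"."""
    unknown = set(criteria) - set(FILTERABLE_FIELDS)
    if unknown:
        raise ValueError(f"not a filterable field: {sorted(unknown)}")
    # Emit clauses in a fixed field order so equal criteria always give the same string.
    # Assumption: JSON-style double-quoted strings are acceptable string literals.
    clauses = [
        f"{field} == {json.dumps(criteria[field], ensure_ascii=False)}"
        for field in FILTERABLE_FIELDS
        if field in criteria
    ]
    return " and ".join(clauses)


if __name__ == "__main__":
    print(build_filter(product="docs", version="2.4", lang="en"))
```

Restricting filters to a known set of fields keeps queries aligned with the schema, so a typo in a field name fails early in your own code rather than at query time.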
To make embedding storage effective, don’t just dump raw docs. Chunk thoughtfully, keep chunk sizes consistent, store the canonical URL with each chunk, and track an updated_at timestamp so you can re-index incrementally instead of rebuilding everything. This lets Opus 4.6 stay grounded on current content with predictable retrieval latency. The result is a stable RAG system where the model’s long-context capacity is a bonus, not a requirement.
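The chunking and incremental re-indexing advice above can be sketched in a few lines. This is a minimal example under stated assumptions: chunks are measured in characters (token-based or structure-aware splitting is often a better fit), and the `Doc` type, `chunk_text`, `docs_to_reindex`, and the default sizes are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Doc:
    url: str              # canonical URL, stored alongside every chunk
    text: str
    updated_at: datetime  # last modification time of the source document


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into windows of at most `size` characters that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def docs_to_reindex(docs: list[Doc], last_indexed_at: datetime) -> list[Doc]:
    """Return only the documents modified after the previous indexing run."""
    return [d for d in docs if d.updated_at > last_indexed_at]


if __name__ == "__main__":
    doc = Doc("https://example.com/guide", "abcdefghij", datetime(2024, 6, 1, tzinfo=timezone.utc))
    print(chunk_text(doc.text, size=4, overlap=1))
```

Only the documents returned by `docs_to_reindex` need to be re-chunked, re-embedded, and written back, which keeps each indexing run proportional to what actually changed.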
