Query rewriting in agentic RAG means the LLM agent reformulates the user's original question into a better-structured search query before sending it to Zilliz Cloud, improving retrieval precision for underspecified or conversational inputs.
The implementation typically adds one prompt step before the Zilliz Cloud search call. The agent receives the user query, generates 2-3 rewritten variants, each emphasizing a different aspect of the information need, and issues parallel searches against your Zilliz Cloud collection. The result lists are then merged and deduplicated before they are passed to the generation step. Because Zilliz Cloud is a managed service, it absorbs the extra concurrent query load without you provisioning additional infrastructure.
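The fan-out step can be sketched as follows. This is a minimal illustration rather than a reference implementation: `rewrite` and `search` are hypothetical callables standing in for your LLM prompt step and your Zilliz Cloud search wrapper, and the hit shape (`id`, `score`, higher score is better) is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Hit = Dict[str, object]  # assumed shape: {"id": ..., "score": ..., ...}


def multi_query_search(
    question: str,
    rewrite: Callable[[str], List[str]],  # hypothetical: LLM call returning 2-3 variants
    search: Callable[[str], List[Hit]],   # hypothetical: embeds the text and searches the collection
    top_k: int = 5,
) -> List[Hit]:
    # Fall back to the original question if the rewriter returns nothing.
    variants = rewrite(question) or [question]

    # Issue one search per variant in parallel.
    with ThreadPoolExecutor(max_workers=len(variants)) as pool:
        result_lists = list(pool.map(search, variants))

    # Merge: keep the best-scoring hit per document id.
    # Assumes a similarity metric where a higher score is better.
    best: Dict[object, Hit] = {}
    for hits in result_lists:
        for hit in hits:
            current = best.get(hit["id"])
            if current is None or hit["score"] > current["score"]:
                best[hit["id"]] = hit

    return sorted(best.values(), key=lambda h: h["score"], reverse=True)[:top_k]
```

Passing the rewriter and the search function in as arguments keeps the sketch independent of any particular LLM or client library. If your collection uses a distance metric where lower is better, the comparison and the sort order would need to be flipped.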
A more advanced pattern uses hypothesis generation: the agent first produces a short "ideal answer" stub, then embeds that stub and uses the resulting vector as the query instead of the embedding of the original question. This technique, known as HyDE (Hypothetical Document Embeddings), tends to retrieve more relevant results when the original question is expressed in conversational language that differs from how your documents are written.
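A minimal sketch of the HyDE flow, under stated assumptions: `generate`, `embed`, and `vector_search` are hypothetical callables for your LLM, your embedding model, and your Zilliz Cloud vector search, and the prompt wording is only an example.

```python
from typing import Callable, List, Sequence


def hyde_search(
    question: str,
    generate: Callable[[str], str],                         # hypothetical: LLM completion call
    embed: Callable[[str], Sequence[float]],                # hypothetical: embedding model call
    vector_search: Callable[[Sequence[float], int], List],  # hypothetical: search by vector
    top_k: int = 5,
) -> List:
    # 1. Ask the LLM for a short hypothetical answer (the "ideal answer" stub).
    prompt = f"Write a short passage that would answer this question:\n{question}"
    hypothetical = generate(prompt)

    # 2. Embed the stub, not the original question.
    query_vector = embed(hypothetical)

    # 3. Search the collection with the stub's embedding.
    return vector_search(query_vector, top_k)
```

The stub does not need to be factually correct; it only needs to resemble the style and vocabulary of the stored documents so that its embedding lands near them. The embedding model used for the stub should be the same one used to index the collection.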
Zilliz Cloud's hybrid search capability combines dense and sparse retrieval in a single managed API call, which pairs well with query rewriting: the rewritten query can drive semantic and keyword matching at the same time, surfacing documents that the original phrasing would have missed.
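To make the idea of combining dense and sparse results concrete, here is a small client-side sketch of reciprocal rank fusion (RRF), one common way to merge two ranked lists. It is an illustration only: Zilliz Cloud performs the fusion on the server within the hybrid search call, and the choice of RRF, the constant `k=60`, and the function name are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def rrf_fuse(
    ranked_lists: Sequence[Sequence[Hashable]],  # each list holds document ids, best first
    k: int = 60,                                 # smoothing constant; 60 is a customary default
    top_k: int = 5,
) -> List[Hashable]:
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            # A document earns more credit the higher it ranks in each list.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)[:top_k]


dense_ids = ["a", "b", "c"]   # ids from the semantic (dense) search
sparse_ids = ["b", "d", "a"]  # ids from the keyword (sparse) search
fused = rrf_fuse([dense_ids, sparse_ids])
```

Documents that appear in both lists, such as `a` and `b` here, accumulate score from each and rise above documents found by only one retriever.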