Two-stage retrieval improves search quality by combining fast dense retrieval with cross-encoder reranking, which filters out false positives and moves genuinely relevant results to the top.
In stage one, Zilliz Cloud retrieves the top-k candidate documents using Qwen3 embeddings. This step is efficient and broad in coverage, capturing semantically similar content. In stage two, Qwen3-Reranker scores each candidate by modeling query-document relevance directly, identifying which results actually match user intent. This reduces the false positives of pure semantic similarity, where documents that resemble the query but do not answer it rank high.
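The two stages can be sketched in a few lines of Python. This is a minimal illustration, not Zilliz Cloud or Qwen3 code: the in-memory `index` and the `retrieve` function stand in for a vector search against a Zilliz Cloud collection, and `score_fn` is a hypothetical callable standing in for Qwen3-Reranker. All function names here are assumptions made for the example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Vector, index: List[Tuple[str, Vector]], k: int) -> List[str]:
    """Stage one (stand-in for vector search): top-k documents by embedding similarity."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def rerank(query: str, candidates: List[str],
           score_fn: Callable[[str, str], float], top_n: int) -> List[str]:
    """Stage two (stand-in for a cross-encoder): reorder candidates by query-document score."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]


def two_stage_search(query: str, query_vec: Vector, index: List[Tuple[str, Vector]],
                     score_fn: Callable[[str, str], float],
                     k: int = 10, top_n: int = 3) -> List[str]:
    """Retrieve k candidates by similarity, then keep the top_n after reranking."""
    return rerank(query, retrieve(query_vec, index, k), score_fn, top_n)
```

In a real deployment, `retrieve` would issue a search against your collection and `score_fn` would call the reranker model. The point of the sketch is the shape of the pipeline: a wide, cheap first pass followed by a narrow, more expensive second pass over only `k` candidates.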
Zilliz Cloud streamlines this workflow: store Qwen3 embeddings in our managed vector index for low-latency retrieval, then apply Qwen3-Reranker as a post-processing step on your application server or through an external inference endpoint. Separating retrieval from ranking lets you tune each stage independently: adjust Milvus collection parameters for speed, or Qwen3-Reranker batch sizes for throughput. For enterprise search applications, this two-stage design typically delivers 20-40% improvements in ranking quality, and because reranking runs only on the retrieved candidates, it requires no reindexing.
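Batch size on the reranking side can be treated as a plain application-level parameter. The sketch below assumes a hypothetical `score_batch` callable that takes a list of (query, document) pairs and returns one score per pair. It stands in for whatever interface your Qwen3-Reranker deployment exposes, which is not specified here.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Pair = Tuple[str, str]  # (query, document)


def batched(items: Sequence[Pair], batch_size: int) -> Iterator[Sequence[Pair]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def rerank_in_batches(query: str, candidates: List[str],
                      score_batch: Callable[[Sequence[Pair]], List[float]],
                      batch_size: int = 8) -> List[Tuple[str, float]]:
    """Score candidates batch by batch, then sort by descending score.

    score_batch is a hypothetical stand-in for a reranker inference call.
    """
    pairs = [(query, doc) for doc in candidates]
    scores: List[float] = []
    for batch in batched(pairs, batch_size):
        scores.extend(score_batch(batch))
    return sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
```

Because `batch_size` only changes how candidates are grouped for scoring, it can be tuned for throughput without touching the retrieval stage or the stored embeddings.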