You manage Qwen3 embedding inference separately from Zilliz Cloud, which gives you full control over where the compute runs and what it costs.
You can host Qwen3 embeddings on your own GPU cluster, use cloud inference endpoints (AWS SageMaker, Azure ML, Alibaba Cloud), or integrate third-party embedding APIs. Zilliz Cloud accepts vectors from any source and imposes no hardware constraints. This separation of concerns means you tune the embedding infrastructure (GPU type, batch size) independently of Zilliz Cloud's vector storage.
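A minimal sketch of this decoupled pipeline: vectors come from any Qwen3 deployment, get packaged into plain records, and are handed to Zilliz Cloud. The names `embed_batch` and `DIM` are illustrative assumptions (the stub returns random vectors so the sketch runs without a GPU); the record shape matches what pymilvus's `MilvusClient.insert` accepts.

```python
import math
import random

DIM = 1024  # assumed output dimension for the Qwen3 embedding model you deploy

def embed_batch(texts):
    """Stand-in for a call to your Qwen3 deployment (self-hosted GPU,
    SageMaker/Azure ML endpoint, or a third-party API). Random vectors
    here just keep the sketch runnable end to end."""
    return [[random.gauss(0, 1) for _ in range(DIM)] for _ in texts]

def normalize(vec):
    """L2-normalize so inner-product search behaves like cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def to_records(ids, texts):
    """Build the row format MilvusClient.insert accepts: a list of dicts
    with a primary key and a 'vector' field, plus any scalar metadata."""
    vectors = [normalize(v) for v in embed_batch(texts)]
    return [{"id": i, "vector": v, "text": t}
            for i, v, t in zip(ids, vectors, texts)]

records = to_records([1, 2], ["first document", "second document"])

# With pymilvus installed, ingestion is a single call against your
# Zilliz Cloud endpoint (URI and token come from the console):
#   from pymilvus import MilvusClient
#   client = MilvusClient(uri=ZILLIZ_URI, token=ZILLIZ_TOKEN)
#   client.insert(collection_name="docs", data=records)
```

Because Zilliz Cloud only ever sees the finished records, swapping the embedding backend (say, from a local GPU to a SageMaker endpoint) changes `embed_batch` and nothing else.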
For managed embedding inference, cloud providers offer serverless options that auto-scale with demand. Combined with Zilliz Cloud's serverless vector database, you eliminate fixed infrastructure costs entirely. Pay-per-use pricing for both components means your total cost scales with actual usage, making hybrid Qwen3 + Zilliz Cloud systems cost-effective for variable-demand applications.
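The pay-per-use claim can be made concrete with a toy linear cost model. The rates below are placeholder assumptions, not published prices; substitute your providers' actual metering (tokens embedded, read/write units consumed).

```python
# Hypothetical serverless cost model: both components bill per unit of
# usage, so total cost is linear in traffic with no fixed floor.

def monthly_cost(tokens_embedded, db_usage_units,
                 embed_rate_per_1k_tokens, db_rate_per_unit):
    """Cost of a serverless Qwen3 + Zilliz Cloud stack under pure
    pay-per-use pricing. Zero traffic costs zero."""
    return (tokens_embedded / 1000) * embed_rate_per_1k_tokens \
           + db_usage_units * db_rate_per_unit

# Placeholder rates for illustration only.
base = monthly_cost(1_000_000, 500,
                    embed_rate_per_1k_tokens=0.0001,
                    db_rate_per_unit=0.01)
```

Doubling traffic doubles the bill, and an idle month costs nothing, which is the property that makes the hybrid stack attractive for variable-demand workloads.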