SiliconFlow and Zilliz Cloud Integration
SiliconFlow and Zilliz Cloud integrate to power efficient GenAI applications: SiliconFlow's scalable Model as a Service (MaaS) platform provides access to leading open-source models, while Zilliz Cloud's high-performance vector database stores and retrieves their embeddings, enabling production-ready RAG systems and semantic search solutions.
What is SiliconFlow
SiliconFlow is committed to building a scalable, standardized, and high-performance AI infrastructure platform. SiliconCloud, its flagship offering, is a Model as a Service (MaaS) platform that provides a comprehensive environment for deploying various AI models, including large language models (LLMs) and embedding models. SiliconCloud aggregates numerous open-source models such as Qwen2.5, DeepSeek-V2.5, and BGE, enabling users to easily access and utilize these resources through OpenAI-compatible APIs with built-in model acceleration.
By integrating with Zilliz Cloud (fully managed Milvus), SiliconFlow users gain access to a fully managed vector database that efficiently stores and retrieves the embeddings generated by SiliconFlow's models, making it straightforward to build production-ready RAG systems, semantic search, and other AI applications at scale.
Benefits of the SiliconFlow + Zilliz Cloud Integration
- Seamless model-to-storage pipeline: SiliconFlow's OpenAI-compatible APIs make it easy to generate embeddings and LLM responses, while Zilliz Cloud stores and indexes these embeddings for fast retrieval — creating a smooth end-to-end workflow.
- Access to diverse open-source models: SiliconCloud aggregates leading open-source models including LLMs and embedding models, giving developers flexibility to choose the best model for their use case without managing infrastructure.
- Production-grade RAG systems: The combination delivers production-grade performance, scalability, and reliability for building RAG applications, from document Q&A to knowledge-based chatbots.
- Cost-effective AI development: SiliconFlow offers complimentary access to certain models, and paired with Zilliz Cloud's efficient vector storage, teams can build powerful AI applications while keeping costs manageable.
How the Integration Works
SiliconFlow provides a Model as a Service platform that handles model inference, including text embedding generation and LLM responses. Through its OpenAI-compatible APIs, developers can easily call embedding models like BAAI/bge-large-en-v1.5 and language models like DeepSeek-V2.5 without managing model infrastructure.
Zilliz Cloud serves as the vector database layer, storing and indexing the embeddings generated by SiliconFlow's models. It provides high-performance similarity search with low latency, enabling applications to retrieve the most relevant context from large knowledge bases efficiently.
Together, SiliconFlow and Zilliz Cloud create a complete RAG solution: documents are embedded using SiliconFlow's models and stored in Zilliz Cloud. When a query comes in, it is embedded using the same model, and Zilliz Cloud finds the most relevant documents through vector similarity search, which are then passed to SiliconFlow's LLM to generate contextually informed responses.
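That end-to-end flow can be condensed into a short control-flow sketch, with the embedding, search, and generation calls injected as plain functions. `embed`, `search`, and `generate` here are hypothetical stand-ins for the SiliconFlow and Zilliz Cloud calls used in the guide below:

```python
def answer_question(question, embed, search, generate, top_k=3):
    """Minimal RAG control flow: embed the query, retrieve context, generate.

    The three callables are stand-ins: embed(text) -> vector (SiliconFlow
    embedding model), search(vector, k) -> list of text snippets (Zilliz
    Cloud similarity search), generate(prompt) -> str (SiliconFlow LLM).
    """
    query_vector = embed(question)
    snippets = search(query_vector, top_k)
    prompt = (
        "Answer using only the context below.\n\n"
        "Context:\n" + "\n".join(snippets) + "\n\nQuestion: " + question
    )
    return generate(prompt)
```

Keeping the three calls injectable makes the retrieval logic easy to test with stubs before wiring in live services.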
Step-by-Step Guide
1. Install Dependencies
```shell
$ pip install --upgrade pymilvus openai requests tqdm
```

2. Prepare the API Key
SiliconFlow provides an OpenAI-compatible API. Log in to the SiliconFlow website, create an API key, and set it as an environment variable.
```python
import os

os.environ["SILICON_FLOW_API_KEY"] = "***********"
```

3. Prepare the Data
We use the FAQ pages from the Milvus Documentation 2.4.x as the private knowledge base in our RAG pipeline. Download the zip file and extract the documents.
```shell
$ wget https://github.com/milvus-io/milvus-docs/releases/download/v2.4.6-preview/milvus_docs_2.4.x_en.zip
$ unzip -q milvus_docs_2.4.x_en.zip -d milvus_docs
```

Load all markdown files from the folder and split the content by "# " to separate each main part.
```python
from glob import glob

text_lines = []

for file_path in glob("milvus_docs/en/faq/*.md", recursive=True):
    with open(file_path, "r") as file:
        file_text = file.read()
    text_lines += file_text.split("# ")
```

4. Prepare the Embedding Model and Define Embedding Function
Initialize a client using SiliconFlow's OpenAI-compatible API, and define a function that generates text embeddings with the `BAAI/bge-large-en-v1.5` model.

```python
from openai import OpenAI

siliconflow_client = OpenAI(
    api_key=os.environ["SILICON_FLOW_API_KEY"],
    base_url="https://api.siliconflow.cn/v1",
)


def emb_text(text):
    return (
        siliconflow_client.embeddings.create(input=text, model="BAAI/bge-large-en-v1.5")
        .data[0]
        .embedding
    )
```

Generate a test embedding and print its dimension and first few elements.
```python
test_embedding = emb_text("This is a test")
embedding_dim = len(test_embedding)
print(embedding_dim)
print(test_embedding[:10])
```

5. Create a Milvus Collection and Load Data
Create a Milvus client and collection.
```python
from pymilvus import MilvusClient

milvus_client = MilvusClient(uri="./milvus_demo.db")

collection_name = "my_rag_collection"

if milvus_client.has_collection(collection_name):
    milvus_client.drop_collection(collection_name)

milvus_client.create_collection(
    collection_name=collection_name,
    dimension=embedding_dim,
    metric_type="IP",
    consistency_level="Strong",
)
```

As for the argument of `MilvusClient`:

- Setting the `uri` as a local file, e.g. `./milvus.db`, is the most convenient method, as it automatically utilizes Milvus Lite to store all data in this file.
- If you have a large scale of data, you can set up a more performant Milvus server on Docker or Kubernetes.
- If you want to use Zilliz Cloud, the fully managed cloud service for Milvus, adjust the `uri` and `token`, which correspond to the Public Endpoint and API Key in Zilliz Cloud.

Iterate through the text lines, create embeddings, and insert the data into Milvus.
```python
from tqdm import tqdm

data = []
for i, line in enumerate(tqdm(text_lines, desc="Creating embeddings")):
    data.append({"id": i, "vector": emb_text(line), "text": line})

milvus_client.insert(collection_name=collection_name, data=data)
```

6. Build RAG — Retrieve Data and Generate Response
Define a question and search for it in the collection.
```python
question = "How is data stored in milvus?"

search_res = milvus_client.search(
    collection_name=collection_name,
    data=[emb_text(question)],
    limit=3,  # return the top 3 results
    search_params={"metric_type": "IP", "params": {}},
    output_fields=["text"],
)
```

Extract the retrieved documents from the search results, convert them into a string format, define the prompts, and use the `deepseek-ai/DeepSeek-V2.5` model provided by SiliconCloud to generate a response.

```python
# Pull the matched text snippets (and their distances) out of the search result.
retrieved_lines_with_distances = [
    (res["entity"]["text"], res["distance"]) for res in search_res[0]
]

context = "\n".join(
    [line_with_distance[0] for line_with_distance in retrieved_lines_with_distances]
)

SYSTEM_PROMPT = """
Human: You are an AI assistant. You are able to find answers to the questions from the contextual passage snippets provided.
"""
USER_PROMPT = f"""
Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{question}
</question>
"""

response = siliconflow_client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V2.5",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT},
    ],
)
print(response.choices[0].message.content)
```

Learn More
- Build RAG with Milvus and SiliconFlow — Official Milvus tutorial for building a RAG pipeline with SiliconFlow
- SiliconFlow Embedding Functions in Milvus — Milvus documentation on configuring SiliconFlow embedding functions
- Build Your Custom RAG Pipelines with Hands-on Tutorials — Zilliz collection of RAG tutorials and hands-on guides
- SiliconFlow Documentation — Official SiliconFlow quickstart documentation
- SiliconFlow Homepage — Official SiliconFlow website