Vanna and Zilliz Cloud Integration
Vanna and Zilliz Cloud integrate to power AI-driven SQL generation, combining Vanna's open-source Python RAG framework for natural language to SQL conversion with Zilliz Cloud's high-performance vector database for efficient retrieval of database schemas, documentation, and query history.
Use this integration for FreeWhat is Vanna
Vanna is an open-source Python RAG (Retrieval-Augmented Generation) framework for SQL generation and related functionality. It works in two easy steps — train a RAG "model" on your data (DDL schemas, documentation, and SQL queries), and then ask questions which will return SQL queries that can be set up to run on your database. Vanna features a security-focused design that prevents database content from being transmitted to LLMs without explicit permission, high accuracy linked to training data quality, and continuous improvement through user interaction.
By integrating with Zilliz Cloud (fully managed Milvus), Vanna gains access to a scalable vector database that efficiently stores and retrieves embedded database schemas, documentation, and previous queries, enabling more accurate and efficient SQL generation — particularly valuable for large, complex datasets requiring precise query generation.
Benefits of the Vanna + Zilliz Cloud Integration
- Accurate SQL generation through RAG: Zilliz Cloud stores embedded DDL schemas, documentation, and SQL query examples, enabling Vanna to retrieve the most relevant context for generating precise SQL queries from natural language questions.
- Security-focused design: Vanna prevents database content from being transmitted to LLMs without explicit permission, while Zilliz Cloud provides secure vector storage for the training data embeddings.
- Continuous improvement: As new queries and feedback are generated, they are embedded and stored in Zilliz Cloud, continuously improving the system's SQL generation accuracy over time.
- Flexible deployment: Vanna supports self-hosted deployment with Milvus Lite for local development or Zilliz Cloud for production-scale deployments, giving teams flexibility in their infrastructure choices.
How the Integration Works
Vanna serves as the SQL generation framework, providing the RAG pipeline that trains on database schemas (DDL), documentation, and SQL query examples. It handles natural language question processing, context assembly from retrieved training data, and LLM-based SQL query generation using models like GPT-3.5-turbo.
Zilliz Cloud serves as the vector database layer through the
Milvus_VectorStoreclass, storing and indexing embedded training data (DDL, documentation, and SQL queries) for fast similarity search. When a user asks a question, it retrieves the most relevant training examples to provide context for SQL generation.Together, Vanna and Zilliz Cloud create an intelligent SQL generation solution: database schemas, documentation, and example queries are embedded and stored in Zilliz Cloud during training. When a user asks a natural language question, Zilliz Cloud retrieves the most relevant context, and Vanna's LLM generates an accurate SQL query based on this context — with the system continuously improving as new queries are added to the training data.
Step-by-Step Guide
1. Install Required Packages
$ pip install "vanna[milvus,openai]"2. Set Up Environment and Define the VannaMilvus Class
Set the OpenAI API key and define a new class that combines Milvus vector store and OpenAI chat capabilities:
import os os.environ["OPENAI_API_KEY"] = "sk-***********" from pymilvus import MilvusClient, model from vanna.milvus import Milvus_VectorStore from vanna.openai import OpenAI_Chat class VannaMilvus(Milvus_VectorStore, OpenAI_Chat): def __init__(self, config=None): Milvus_VectorStore.__init__(self, config=config) OpenAI_Chat.__init__(self, config=config)As for the argument of
MilvusClient: Setting theurias a local file, e.g../milvus.db, is the most convenient method, as it automatically utilizes Milvus Lite to store all data in this file. If you have large scale of data, you can set up a more performant Milvus server on Docker or Kubernetes. If you want to use Zilliz Cloud, the fully managed cloud service for Milvus, adjust theuriandtoken, which correspond to the Public Endpoint and API Key in Zilliz Cloud.3. Initialize VannaMilvus and Connect to a Database
Initialize the VannaMilvus class with configuration parameters and connect to a SQLite database:
milvus_uri = "./milvus_vanna.db" milvus_client = MilvusClient(uri=milvus_uri) vn_milvus = VannaMilvus( config={ "api_key": os.getenv("OPENAI_API_KEY"), "model": "gpt-3.5-turbo", "milvus_client": milvus_client, "embedding_function": model.DefaultEmbeddingFunction(), "n_results": 2, } )Create a sample SQLite database with tables and connect Vanna to it:
import sqlite3 sqlite_path = "./my-database.sqlite" sql_connect = sqlite3.connect(sqlite_path) c = sql_connect.cursor() init_sqls = """ CREATE TABLE IF NOT EXISTS Customer ( ID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT NOT NULL, Company TEXT NOT NULL, City TEXT NOT NULL, Phone TEXT NOT NULL ); CREATE TABLE IF NOT EXISTS Company ( ID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT NOT NULL, Industry TEXT NOT NULL, Location TEXT NOT NULL, EmployeeCount INTEGER NOT NULL ); CREATE TABLE IF NOT EXISTS User ( ID INTEGER PRIMARY KEY AUTOINCREMENT, Username TEXT NOT NULL UNIQUE, Email TEXT NOT NULL UNIQUE ); INSERT INTO Customer (Name, Company, City, Phone) VALUES ('John Doe', 'ABC Corp', 'New York', '123-456-7890'); INSERT INTO Customer (Name, Company, City, Phone) VALUES ('Jane Smith', 'XYZ Inc', 'Los Angeles', '098-765-4321'); INSERT INTO Company (Name, Industry, Location, EmployeeCount) VALUES ('ABC Corp', 'cutting-edge technology', 'New York', 100); INSERT INTO User (Username, Email) VALUES ('johndoe123', 'johndoe123@example.com'); """ for sql in init_sqls.split(";"): c.execute(sql) sql_connect.commit() vn_milvus.connect_to_sqlite(sqlite_path)4. Train with Data
Train the model on the DDL data, documentation, and SQL queries:
existing_training_data = vn_milvus.get_training_data() if len(existing_training_data) > 0: for _, training_data in existing_training_data.iterrows(): vn_milvus.remove_training_data(training_data["id"]) df_ddl = vn_milvus.run_sql("SELECT type, sql FROM sqlite_master WHERE sql is not null") for ddl in df_ddl["sql"].to_list(): vn_milvus.train(ddl=ddl) vn_milvus.train( documentation="ABC Corp specializes in cutting-edge technology solutions and innovation." ) vn_milvus.train( documentation="XYZ Inc is a global leader in manufacturing and supply chain management." ) vn_milvus.train(sql="SELECT * FROM Customer WHERE Name = 'John Doe'")5. Generate and Execute SQL Queries
Ask natural language questions and get SQL queries:
sql = vn_milvus.generate_sql("what is the phone number of John Doe?") vn_milvus.run_sql(sql)Try a more complex question that requires documentation context:
sql = vn_milvus.generate_sql("which customer works for a manufacturing corporation?") vn_milvus.run_sql(sql)6. Clean Up
Disconnect and remove resources:
sql_connect.close() milvus_client.close() os.remove(sqlite_path) if os.path.exists(milvus_uri): os.remove(milvus_uri)Learn More
- Write SQL with Vanna and Milvus — Official Milvus tutorial for SQL generation with Vanna
- Vanna Official Website — Official Vanna website
- Vanna GitHub Repository — Vanna source code and community resources
- How to Build an AI Chatbot with Milvus and Towhee — Zilliz blog on building AI chatbots
- Milvus Embeddings Documentation — Milvus documentation on embedding functions