Getting Started with Pgvector: A Guide for Developers Exploring Vector Databases

The rise of generative AI and large language models has created demand for more efficient ways to store and query data. For years, traditional databases like MySQL and PostgreSQL were the default choice for developers. Lately, however, a new type of database called a vector database has gained widespread popularity in the community.
Vector databases differ considerably from traditional databases, especially in their use cases. In this guide, we’ll explore Pgvector, a PostgreSQL extension that lets you store and search vectors in a PostgreSQL database. We’ll also look at Pgvector’s limitations, how it differs from specialized vector databases, and how and where you can use vector databases in your applications.
If you are a developer familiar with traditional databases and seeking new solutions, this guide will provide the knowledge you need to get started with Pgvector. It will also help you explore other dedicated vector databases as an alternative to Pgvector.
What is Pgvector?
AI and machine learning are now widely used across various industries, including technology, IT, medicine, and automobiles. In these fields, data is represented as vectors containing numerical values that capture properties and features of unstructured data such as images or texts. Machine learning algorithms use these vectors as input to learn patterns and relationships within the data.
Pgvector is an open-source extension for PostgreSQL, a traditional open-source relational database. It supports storing and searching vectors produced by natural language processing or deep learning models on top of PostgreSQL. Put simply, Pgvector adds vector database capabilities to PostgreSQL.
One of the best things about working with Pgvector is that it feels like working with PostgreSQL itself. Vector operations, such as adding a vector column to a table, creating a new table with vector columns, and finding the nearest neighbors by L2 (Euclidean) distance, are all written in ordinary SQL.
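To make "nearest L2 neighbors" concrete, here is a minimal Python sketch of a brute-force nearest-neighbor search by Euclidean distance. It is only an illustration of the idea, not Pgvector's implementation; the function names and the sample rows are hypothetical.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(query, rows, k=1):
    """Return the k (id, vector) rows closest to query, nearest first."""
    return sorted(rows, key=lambda row: l2_distance(query, row[1]))[:k]

# Hypothetical sample data: (id, embedding) pairs.
rows = [(1, [1.0, 0.0]), (2, [0.0, 2.0]), (3, [3.0, 4.0])]
print(nearest_neighbors([0.0, 0.0], rows, k=2))
```

A brute-force scan like this compares the query against every row, which is why vector databases rely on indexes once tables grow large.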
Whether you are working on AI applications, recommendation systems, or any other project that involves high-dimensional data, understanding Pgvector and other vector databases can broaden your horizons in database management. It also allows for efficient storage of vector values without requiring extensive vector storage and database knowledge.
Setting up Pgvector
Let’s get started by setting up Pgvector and integrating it with PostgreSQL. Make sure you have PostgreSQL installed on your system. On a Mac, you can easily install it via Homebrew:
brew install postgresql
You can quickly check if you have PostgreSQL installed by running the following command:
psql --version
It should print the version of PostgreSQL installed on your system.
You’ll also need to install make. You can easily install it using Homebrew by running the following command:
brew install make
This installs make on your system.
With the prerequisites in place, let’s walk through the steps to build and install Pgvector:
- Clone the Pgvector repo.
cd /tmp && git clone --branch v0.4.4 https://github.com/pgvector/pgvector.git
- Head into this directory and run the following make commands:
cd pgvector && make && make install
Integrating Pgvector with Postgres
Open the PostgreSQL command-line interface (psql) using the following command. This step will start Postgres on your command line so you can run Postgres commands directly on your terminal:
psql
You can create a user to use Postgres using the following command:
CREATE USER <user> WITH PASSWORD '<password>';
Then, log in as that user with the credentials you just created. Alternatively, you can log in to Postgres as a superuser:
psql -U postgres
Now, let’s create a new database to work with, using the following command:
create database vectordb;
Let’s select this database:
\c vectordb
Then, we’ll enable the Pgvector extension for our vectordb database. Note that the extension is registered under the name vector:
CREATE EXTENSION vector;
You only need to perform this step once for every database you want to use with Pgvector.
Using Pgvector
Let’s create a table called vectors with two columns: id, an auto-incrementing primary key, and embedding, a three-dimensional vector column. This table stores the vector data in PostgreSQL.
CREATE TABLE vectors (
id SERIAL PRIMARY KEY,
embedding vector(3) -- The vector column, with 3 dimensions
);
We can now insert some vector data into our vectors table:
INSERT INTO vectors (embedding) VALUES
('[1.2, 0.8, -2.1]'),
('[-0.7, 2.4, 3.6]');
To view the table, we can simply run a SELECT * query on our vectors table.
SELECT * FROM vectors;
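In application code, embeddings usually arrive as lists of floats and have to be turned into a text literal before they can be placed in an INSERT statement. The sketch below assumes the bracketed text form [x1,x2,...] used by Pgvector's vector type; the helper names are hypothetical, and the finiteness check is a defensive assumption rather than documented behavior.

```python
import math

def to_vector_literal(values):
    """Format numbers as a bracketed vector literal such as [1.2,0.8,-2.1]."""
    values = [float(v) for v in values]
    if not values:
        raise ValueError("a vector needs at least one dimension")
    # Defensive assumption: refuse NaN and infinity before they reach the database.
    if not all(math.isfinite(v) for v in values):
        raise ValueError("vector elements must be finite numbers")
    return "[" + ",".join(repr(v) for v in values) + "]"

def parse_vector_literal(text):
    """Parse a bracketed vector literal back into a list of floats."""
    inner = text.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError("expected a literal of the form [x1,x2,...]")
    return [float(part) for part in inner[1:-1].split(",")]

print(to_vector_literal([1.2, 0.8, -2.1]))
```

When sending the literal to PostgreSQL, pass it as a bound query parameter rather than concatenating it into the SQL string.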
Using Pgvector for similarity searching
We’ve discussed how vector databases can be beneficial for performing similarity searches. Here’s how to write a simple similarity search query to find vectors similar to a given query vector.
SELECT * FROM vectors
WHERE 1 - (embedding <=> '[0.9, -0.3, 1.8]') > 0.8;
We use a regular SELECT * query with a WHERE clause. Pgvector’s <=> operator returns the cosine distance between two vectors, so 1 - (embedding <=> query) gives their cosine similarity. The query therefore retrieves the rows where the cosine similarity between the embedding column and the query vector [0.9, -0.3, 1.8] is greater than 0.8.
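To see what the 0.8 threshold means in practice, the following Python sketch computes the cosine similarity by hand for the two sample rows inserted earlier. It is a plain reimplementation of the formula for illustration only; the function name is hypothetical and this is not how Pgvector evaluates the query internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

query = [0.9, -0.3, 1.8]
rows = {1: [1.2, 0.8, -2.1], 2: [-0.7, 2.4, 3.6]}

for row_id, embedding in rows.items():
    similarity = cosine_similarity(embedding, query)
    print(row_id, round(similarity, 3), similarity > 0.8)
```

With this sample data the similarities come out at roughly -0.57 and 0.58, so neither row clears the 0.8 threshold and the query returns an empty result. Lowering the threshold to 0.5 would return the second row.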
Pgvector limitations
While Pgvector is a great way to store and search vectors, it has some obvious disadvantages.
Pgvector has scalability issues when it comes to dealing with high-dimensional vectors. Storing vector data may also introduce additional storage and indexing overheads. It is also important to consider the space your vector data needs and how it could affect query performance.
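As a rough way to reason about space, the Pgvector README (at the time of writing) states that each vector takes 4 * dimensions + 8 bytes of storage. The Python sketch below applies that figure; the function names and the one-million-row workload are hypothetical, and the estimate ignores row overhead, indexes, and any other columns.

```python
def vector_storage_bytes(dimensions):
    """Approximate bytes per stored vector: 4-byte floats plus 8 bytes of overhead."""
    if dimensions < 1:
        raise ValueError("dimensions must be at least 1")
    return 4 * dimensions + 8

def table_vector_bytes(num_rows, dimensions):
    """Approximate bytes used by the vector column alone."""
    return num_rows * vector_storage_bytes(dimensions)

# Hypothetical workload: one million 768-dimensional embeddings.
total = table_vector_bytes(1_000_000, 768)
print(f"{total / 1024 ** 3:.2f} GiB")
```

Because the cost grows linearly with both row count and dimensionality, halving the embedding size roughly halves the space the column needs.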
Moreover, Pgvector v0.4.4, the version installed above, supports only one type of index, called IVFFlat (HNSW indexing arrived in later releases). This limits how well searches perform as your vectors and datasets grow. It also means that there is no default storage optimization.
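For readers unfamiliar with IVFFlat, the idea is to cluster the vectors into lists around centroids and, at query time, scan only the lists whose centroids are closest to the query. The Python sketch below is a toy illustration under simplifying assumptions (plain k-means seeded with the first vectors, tiny hypothetical data); it is not Pgvector's implementation. The num_lists and probes arguments play the same role as the lists index option and the ivfflat.probes setting in Pgvector.

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_centroid(vector, centroids):
    return min(range(len(centroids)), key=lambda i: l2(vector, centroids[i]))

def train_centroids(vectors, num_lists, iterations=10):
    """Plain k-means, seeded with the first num_lists vectors for determinism."""
    centroids = [list(v) for v in vectors[:num_lists]]
    for _ in range(iterations):
        buckets = [[] for _ in centroids]
        for v in vectors:
            buckets[nearest_centroid(v, centroids)].append(v)
        for i, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a list ends up empty
                centroids[i] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def build_index(vectors, num_lists):
    """Assign each vector's position to the inverted list of its nearest centroid."""
    centroids = train_centroids(vectors, num_lists)
    lists = [[] for _ in centroids]
    for idx, v in enumerate(vectors):
        lists[nearest_centroid(v, centroids)].append(idx)
    return centroids, lists

def search(query, vectors, index, k=1, probes=1):
    """Scan only the `probes` closest lists, then rank those candidates exactly."""
    centroids, lists = index
    order = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))
    candidates = [idx for i in order[:probes] for idx in lists[i]]
    return sorted(candidates, key=lambda idx: l2(query, vectors[idx]))[:k]

vectors = [[0.0, 0.0], [10.0, 10.0], [0.5, 0.2], [9.5, 10.2], [0.1, 0.9], [10.3, 9.8]]
index = build_index(vectors, num_lists=2)
print(search([9.0, 9.0], vectors, index, k=2, probes=1))
```

Scanning fewer lists makes queries faster but can miss true neighbors that fall in an unscanned list, which is the recall-versus-speed trade-off inherent to this kind of index.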
In summary, Pgvector is a PostgreSQL extension that enables the storage and search of vector embeddings. However, it has limited abilities and performance. Fortunately, many dedicated vector databases like Milvus are available today that do a far better job due to improved indices or algorithms.
Dedicated vector databases
Now that we’ve explored Pgvector and its applications and disadvantages, let’s introduce the concept of dedicated vector databases.
Unlike Pgvector, which is a vector search plugin on top of a traditional database, dedicated vector databases like Milvus and Zilliz Cloud are purpose-built from the ground up for storing and querying millions or even billions of high-dimensional vectors with near real-time responses. They leverage advanced indexing techniques to handle similarity searches efficiently, offer superior performance for similarity-based operations, handle large-scale vector data, and provide powerful APIs for AI and machine learning applications.
Introduction to vector databases like Milvus/Zilliz
A dedicated vector database like Milvus caters to a wide range of use cases, including image and video similarity retrieval, natural language processing, recommendation systems, and more. Its versatility makes it suitable for diverse AI-related projects. Let’s understand Milvus and how to leverage it in the cloud using Zilliz Cloud.
Milvus: an open-source vector database
Milvus is an open-source vector database that provides a robust solution for managing and querying billions of high-dimensional vectors. It offers numerous exciting features such as GPU index, Arm64, range search, upsert, and CDC, which ensure optimal performance and user experience for building AI and machine learning applications. Check out the latest Milvus 2.3 release blog for more information on these features.
Zilliz Cloud: a fully managed service enabling Milvus instances in the cloud
Zilliz Cloud operates as a cloud-based service that brings Milvus instances into the realm of software as a service (SaaS). It simplifies the deployment and management of Milvus databases by offering cloud infrastructure, scalability, and operational support. Zilliz ensures that developers can harness the capabilities of Milvus without the complexity of setting up and maintaining their infrastructure. It is just like using Amazon RDS for PostgreSQL in the cloud.
Zilliz Cloud offers a free tier, giving every developer equal access to this vector database service. The free tier offers up to two collections, each accommodating up to 500,000 vectors with 768 dimensions, or more vectors at lower dimensions. This generous capacity allows for significant data handling capabilities without requiring infrastructure investments.
How to choose between Milvus and Zilliz
If you want complete control over your database, you can opt for self-hosted Milvus instances. However, you must then deploy and manage the infrastructure yourself according to your needs and use cases.
On the other hand, if you prefer using a cloud-based vector database, you can use Zilliz Cloud. Zilliz lets you focus on building your application by leveraging Milvus in the cloud without worrying about maintaining the infrastructure.
Milvus and Zilliz empower developers with efficient vector data management, but they cater to diverse deployment preferences. Whether you lean toward self-hosted flexibility or cloud-based simplicity, the Milvus-Zilliz collaboration provides options aligned with your project’s demands.
Vector databases vs. Pgvector
Now, let’s compare Pgvector with Milvus/Zilliz regarding ease of use, performance, and flexibility.
Ease of use
Pgvector seamlessly integrates with PostgreSQL, which is familiar to developers who already use the relational database. However, Milvus and Zilliz require additional setups to install their SDKs and APIs. The good news is that once you set up Milvus/Zilliz, creating a large-scale similarity search service takes less than a minute. In addition, simple and intuitive SDKs are available for various programming languages.
Apart from installation, Milvus and Zilliz also come with a bit of a learning curve. Pgvector seems easier to use because it will already feel familiar to PostgreSQL users.
Performance and scalability analysis
One major limitation of Pgvector is its limited indexing capabilities. For complex similarity searches, Milvus outperforms Pgvector due to its optimized indexing mechanisms, as demonstrated in this benchmark.
Pgvector and Milvus are powerful vector search stacks engineered to handle large-scale vector data efficiently. However, Milvus/Zilliz is more scalable and can handle datasets with billions of vector embeddings.
Feature sets and flexibility
While Pgvector brings vector capabilities to PostgreSQL, Milvus/Zilliz is a purpose-built vector database with specialized features tailored for AI applications. They’re more feature-rich and can be more helpful for custom use cases.
Benchmarking Pgvector and Milvus/Zilliz
VectorDBBench is an open-source benchmarking tool for vector databases. It compares mainstream vector databases and cloud services available in the market and provides unbiased benchmark results regarding queries per second (QPS), queries per dollar (QP$), and P99 latency.
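To clarify what two of these metrics measure, here is a small Python sketch that derives QPS and P99 latency from a list of per-query latencies. It is an illustration of the definitions only and does not reproduce VectorDBBench's methodology; the function names and the latency values are hypothetical, queries are assumed to run one after another, and the percentile uses the simple nearest-rank method.

```python
import math

def queries_per_second(num_queries, elapsed_seconds):
    """Throughput: completed queries divided by wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_queries / elapsed_seconds

def percentile_nearest_rank(samples, percentile):
    """Nearest-rank percentile: the value at rank ceil(percentile * n / 100)."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds for 200 sequential queries.
latencies_ms = [5.0] * 197 + [50.0, 80.0, 120.0]
elapsed_seconds = sum(latencies_ms) / 1000

print("QPS:", round(queries_per_second(len(latencies_ms), elapsed_seconds), 1))
print("P99 latency (ms):", percentile_nearest_rank(latencies_ms, 99))
```

P99 latency is useful alongside QPS because a system can sustain high average throughput while a small fraction of queries remain far slower than the rest.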
For example, you can leverage VectorDBBench to benchmark Pgvector and Milvus/Zilliz. According to the benchmarking results, Milvus and Zilliz outperform Pgvector regarding QPS and latency.
Note: This is a 1-100 score based on each system's performance in different cases according to a specific rule. A higher score denotes better performance.
Note: This is a >1 score based on each system's performance in different cases according to a specific rule. A lower score denotes better performance.
With VectorDBBench, you can quickly understand which database performs better in terms of various metrics. You can also determine which database best suits your specific needs.
Conclusion
Pgvector opens up new possibilities for storing and querying vector data within PostgreSQL. If you’re already familiar with PostgreSQL and want to explore vector databases, Pgvector is an excellent starting point. However, for AI applications with millions or even billions of vector similarity searches, Milvus and Zilliz offer specialized capabilities and optimal performance. Consider your project’s requirements and explore these vector databases to unlock the full potential of vector storage in your applications.
This post was written by Siddhant Varma. Siddhant is a full-stack JavaScript developer with expertise in front-end engineering. He’s worked with scaling multiple startups in India and has experience building products in the Ed-Tech and healthcare industries. Siddhant has a passion for teaching and a knack for writing. He's also taught programming to many graduates, helping them become better future developers.