Beyond PGVector: When Your Vector Database Needs a Formula 1 Upgrade
Postgres, a veteran of the relational database world, has served developers faithfully for more than 28 years. With the introduction of its pgvector extension, Postgres has taken steps to support vector embeddings, offering a convenient entry point for basic vector similarity search.
However, while pgvector provides a practical starting point, it still falls short compared to purpose-built vector databases like Milvus, especially when handling large-scale applications and complex search requirements. Relying solely on Postgres with pgvector for demanding vector search workloads is like trying to enter a Formula 1 race with a souped-up family sedan—it’s a step up, but it’s simply not built for that level of competition.
As AI applications explode in popularity, developers are encountering growing pains. What starts as a convenient solution with pgvector quickly becomes a frustrating bottleneck as data grows and search requirements become more sophisticated. Search quality declines, index updates drag on, and frustration rises as you struggle to meet your application's demands.
This blog explores why Postgres, with its vector search add-on, pgvector, works well for smaller projects and simpler use cases but reaches its limits for large-scale vector search. We’ll also discuss why purpose-built vector databases like Milvus are indispensable for tackling the unique challenges of this rapidly advancing field.
The Postgres and Pgvector Bottleneck
Think of Postgres as a sedan: it has been around for years and it works, but it was never built for extreme speed. While pgvector adds vector storage and basic similarity search capabilities to Postgres, it inherits fundamental limitations:
- Performance at Scale: pgvector supports only two indexing methods: HNSW and IVF_FLAT. While HNSW is a popular algorithm, it comes with significant trade-offs, including long indexing times and higher memory requirements. On the other hand, IVF_FLAT offers faster index building but struggles to maintain query performance as the dataset scales. The lack of support for on-disk indexes like DiskANN or GPU-based index types further limits its performance and flexibility when dealing with large-scale datasets.
- High-Dimensional Embeddings: pgvector struggles with high-dimensional vector embeddings because of architectural constraints. It relies on Postgres's fixed 8KB pages for data storage, which fundamentally restricts the number of dimensions an indexed vector can have. Since each dimension requires 4 bytes for a float and metadata also occupies space, indexing vectors beyond roughly 2,000 dimensions is not possible with the standard vector type. In contrast, purpose-built databases like Milvus are designed to handle high-dimensional embeddings easily. Workarounds such as quantization exist in pgvector, but they often require compromising on precision.
- Lack of Advanced Features: pgvector lacks the comprehensive feature set provided by purpose-built vector databases. For example, Milvus supports advanced metadata filtering search, a broader range of distance metrics beyond L2 and inner product, hybrid sparse and dense search, and even full-text search (available in Milvus 2.5).
- Scalability Challenges: Scaling pgvector to handle large datasets and high query loads is non-trivial. It often requires substantial effort to implement sharding and manage indexes across multiple nodes, introducing additional complexity and operational overhead. Purpose-built vector databases are designed with scalability in mind, offering seamless performance even as datasets and query demands grow.
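The page-size arithmetic behind the dimension limit can be sketched in a few lines of Python. This is a back-of-the-envelope illustration, not pgvector's actual storage code: the 8-byte per-vector header is an assumption, and the function names are made up for this example. It also ignores Postgres page headers and index tuple overhead, so the real ceiling is lower than what this computes (pgvector documents a 2,000-dimension cap for indexes on its standard vector type).

```python
PAGE_SIZE_BYTES = 8192  # default Postgres page size (8KB)
FLOAT_BYTES = 4         # one single-precision float per dimension
HEADER_BYTES = 8        # ASSUMPTION: small fixed per-vector header


def vector_bytes(dims: int) -> int:
    """Approximate size in bytes of one float32 vector with `dims` dimensions."""
    return HEADER_BYTES + FLOAT_BYTES * dims


def fits_in_page(dims: int) -> bool:
    """True if a single vector of this size fits within one 8KB page."""
    return vector_bytes(dims) <= PAGE_SIZE_BYTES


def max_dims_per_page() -> int:
    """Largest dimension count whose vector still fits in one page,
    ignoring page and tuple overhead (so this is an upper bound)."""
    return (PAGE_SIZE_BYTES - HEADER_BYTES) // FLOAT_BYTES


if __name__ == "__main__":
    for dims in (768, 1536, 3072):
        print(dims, vector_bytes(dims), fits_in_page(dims))
    print("upper bound on dims per page:", max_dims_per_page())
```

Under these assumptions a 1,536-dimension embedding fits comfortably in a page, while a 3,072-dimension embedding does not, which is why larger embedding models push you toward quantization or a different engine.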
Milvus: The Formula 1
Milvus is an open-source vector database engineered from the ground up to address the specific demands of vector similarity search at scale. Think of it as a Formula 1 car, meticulously designed for speed and performance in the high-stakes world of vector data.
Here's how Milvus outperforms Postgres with pgvector:
- Blazing Fast Search: Milvus supports 11 state-of-the-art index types, including FLAT, HNSW, DiskANN, and the GPU-accelerated CAGRA, to deliver high search performance, even with tens of billions of vectors.
- Effortless Scalability: Milvus has a distributed and Kubernetes-native architecture. It enables seamless horizontal scaling, allowing you to handle massive datasets and high query throughput without the complexities of manual sharding.
- Comprehensive Feature Set: Milvus offers a comprehensive suite of features, including metadata filtering, support for various distance metrics, full-text search, hybrid search, and flexible indexing options to tailor your search strategy to your specific needs.
- Optimized for the Future of Data: Milvus is designed to handle the scale and complexity of the ever-growing volume of unstructured data represented as vectors, making it the ideal solution for the next generation of AI applications.
- Continuous Innovation: Just like a Formula 1 team constantly pushes the boundaries of performance, Milvus is continually evolving with cutting-edge indexing algorithms, hardware acceleration support, and machine learning-driven optimizations.
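To make "metadata filtering" and "various distance metrics" concrete, here is a minimal pure-Python sketch of what a filtered, brute-force (FLAT-style) search does conceptually. It is not Milvus's API or implementation: the function names, the `METRICS` table, and the sample rows are all hypothetical and exist only for illustration. A real engine replaces the linear scan with an index and pushes the filter into the search itself.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the normalized vectors: larger means more similar."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


# Metric name -> (scoring function, whether a higher score is better).
METRICS: Dict[str, Tuple[Callable, bool]] = {
    "L2": (l2_distance, False),
    "IP": (inner_product, True),
    "COSINE": (cosine_similarity, True),
}

# Hypothetical sample data: each row has an id, a vector, and one metadata field.
ROWS: List[dict] = [
    {"id": 1, "vector": [1.0, 0.0], "category": "docs"},
    {"id": 2, "vector": [0.0, 1.0], "category": "docs"},
    {"id": 3, "vector": [0.9, 0.1], "category": "blog"},
    {"id": 4, "vector": [0.7, 0.7], "category": "docs"},
]


def filtered_search(rows, query, metric="L2", top_k=2, predicate=lambda row: True):
    """Scan all rows, keep those passing the metadata predicate,
    and return the ids of the top_k best matches under the chosen metric."""
    score_fn, higher_is_better = METRICS[metric]
    scored = [(row["id"], score_fn(row["vector"], query))
              for row in rows if predicate(row)]
    scored.sort(key=lambda pair: pair[1], reverse=higher_is_better)
    return [row_id for row_id, _ in scored[:top_k]]


if __name__ == "__main__":
    query = [1.0, 0.0]
    print(filtered_search(ROWS, query))
    print(filtered_search(ROWS, query, predicate=lambda r: r["category"] == "docs"))
```

Note how the filter changes the result set: without it, the "blog" row is the second-closest match; with it, that row is excluded before ranking. Doing this efficiently at billions of vectors is exactly where purpose-built indexing and filtering matter.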
Making the Right Choice: When to Use What
While Postgres with pgvector might not be a Formula 1 car, it still has its place in the garage. Let's explore when to use each solution:
Choose pgvector when:
- You're building a proof of concept or MVP with small to medium datasets.
- Your vector search needs are simple and don't require complex filtering.
- Your embedding models produce vectors with dimensions under the Postgres page size limits.
- You need ACID compliance and strong transactional guarantees.
Choose Milvus when:
- You're working with large-scale datasets (millions to billions of vectors).
- You need high-dimensional embeddings beyond pgvector's limitations.
- Query performance is critical to your application.
- You require advanced features like diverse indexing options or GPU acceleration.
- You anticipate rapid growth and need a solution that scales horizontally.
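The two checklists above can be condensed into a small helper. This is only a sketch: the one-million-vector threshold is an illustrative assumption rather than guidance from either project, the `Workload` class and `recommend` function are hypothetical names, and qualitative criteria such as ACID requirements or how critical query latency is are deliberately left out because they do not reduce to a number.

```python
from dataclasses import dataclass

# ASSUMPTION: illustrative cutoff for "large-scale"; tune it for your own workload.
LARGE_DATASET_VECTORS = 1_000_000
# pgvector's documented index limit for its standard vector type.
PGVECTOR_MAX_INDEXED_DIMS = 2_000


@dataclass
class Workload:
    num_vectors: int
    dims: int
    needs_gpu_or_disk_index: bool = False
    expects_rapid_growth: bool = False


def recommend(workload: Workload) -> str:
    """Return "milvus" if any scale-related criterion applies, else "pgvector"."""
    reasons_for_milvus = [
        workload.num_vectors >= LARGE_DATASET_VECTORS,
        workload.dims > PGVECTOR_MAX_INDEXED_DIMS,
        workload.needs_gpu_or_disk_index,
        workload.expects_rapid_growth,
    ]
    return "milvus" if any(reasons_for_milvus) else "pgvector"


if __name__ == "__main__":
    print(recommend(Workload(num_vectors=50_000, dims=768)))
    print(recommend(Workload(num_vectors=50_000_000, dims=768)))
    print(recommend(Workload(num_vectors=10_000, dims=3072)))
```

Treat the output as a starting point for discussion, not a verdict; a small prototype with strict transactional needs may still belong in Postgres even if growth is expected later.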
Moving Your Vectors to Milvus with Our Migration Service
If you are using pgvector and running into these limits, we offer an open-source migration tool called VTS (short for Vector Transport Service) to help you move your vectors and unstructured data to Milvus or its managed service on Zilliz Cloud.
Built on top of Apache SeaTunnel, VTS offers:
- Rich, extensible connectors
- Unified stream and batch processing for real-time synchronization and offline batch imports
- Distributed snapshot support for data consistency
- High performance, low latency, and scalability
- Real-time monitoring and visual management
In addition to pgvector, VTS supports migrating vector data from various sources, including Elasticsearch, Pinecone, Qdrant, and Tencent Cloud VDB, to purpose-built vector databases like Milvus. It also enables seamless vector migration between open-source Milvus and Zilliz Cloud, both ways.
To simplify the migration process, VTS automatically handles schema conversion, eliminating the need for complex setup and development efforts. In 2025, VTS will expand its capabilities to support data migration from additional sources like MongoDB and Weaviate. Future versions will also introduce the ability to generate vector embeddings on the fly, allowing unstructured data to be easily converted and ported to vector databases for accelerated approximate nearest neighbor (ANN) search. Stay tuned for these exciting updates!
Figure: How VTS works
The Road Ahead
The landscape of vector databases continues to evolve alongside the rapid advancement of AI technologies. While pgvector provides a convenient entry point, the demands of production-scale AI applications often necessitate purpose-built solutions.
The choice between pgvector and Milvus represents more than just a technical decision. It's a strategic investment in your application's future scalability. Just as a Formula 1 team selects their equipment based on performance requirements, organizations must evaluate their vector search needs against their growth trajectory.
With tools like VTS streamlining the migration process, companies can confidently transition their vector search capabilities when their requirements outgrow pgvector's capabilities. Whether architecting new applications or scaling existing ones, early consideration of vector search requirements can prevent technical debt and ensure sustainable growth.
We'd Love to Hear What You Think!
If you like this blog post, please consider:
- ⭐ Giving us a star on GitHub
- 💬 Joining our Milvus Discord community to share your experiences or get help moving from pgvector
- 🔍 Exploring our Bootcamp repository for examples of applications using Milvus