SingleStore vs pgvector: Choosing the Right Vector Database for Your AI Apps
What is a Vector Database?
Before we compare SingleStore and pgvector, let's first explore the concept of vector databases.
A vector database is specifically designed to store and query high-dimensional vectors, which are numerical representations of unstructured data. These vectors encode complex information, such as the semantic meaning of text, the visual features of images, or product attributes. By enabling efficient similarity searches, vector databases play a pivotal role in AI applications, allowing for more advanced data analysis and retrieval.
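Here is a minimal brute-force sketch in Python of what "similarity search" means. It ranks a few stored vectors by cosine similarity to a query vector. The item names and embedding values are made up for illustration. Real embeddings come from a model and have hundreds or thousands of dimensions, and a vector database replaces this linear scan with indexes.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def most_similar(query: list[float], items: dict[str, list[float]], k: int = 2) -> list[str]:
    """Brute-force similarity search: score every stored vector, keep the top k keys."""
    ranked = sorted(items, key=lambda name: cosine_similarity(query, items[name]), reverse=True)
    return ranked[:k]


# Toy 3-dimensional "embeddings" (hypothetical values for illustration only).
embeddings = {
    "running shoes": [0.9, 0.1, 0.0],
    "trail sneakers": [0.8, 0.2, 0.1],
    "coffee maker": [0.0, 0.1, 0.9],
}

print(most_similar([0.85, 0.15, 0.05], embeddings))
```

The two footwear items point in nearly the same direction as the query, so they outrank the coffee maker even though no keyword matching is involved.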
Common use cases for vector databases include e-commerce product recommendations, content discovery platforms, anomaly detection in cybersecurity, medical image analysis, and natural language processing (NLP) tasks. They also play a crucial role in Retrieval Augmented Generation (RAG), a technique that enhances the performance of large language models (LLMs) by providing external knowledge to reduce issues like AI hallucinations.
There are many types of vector databases available in the market, including:
- Purpose-built vector databases such as Milvus and Zilliz Cloud (fully managed Milvus).
- Vector search libraries such as Faiss and Annoy.
- Lightweight vector databases such as Chroma and Milvus Lite.
- Traditional databases with vector search add-ons capable of performing small-scale vector searches.
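SingleStore is a distributed, relational SQL database management system, while pgvector is an extension that adds vector search to PostgreSQL, a traditional relational database. Both therefore belong to the last category above: traditional databases with vector search added on. This post compares their vector search capabilities.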
SingleStore: Overview and Core Technology
SingleStore has made vector search possible by building it into the database itself, so you don't need a separate vector database in your tech stack. Vectors can be stored in regular database tables and searched with standard SQL queries. For example, you can search for similar product images while filtering by price range, or explore document embeddings while limiting results to specific departments. For vector indexing, SingleStore supports the FLAT, IVF_FLAT, IVF_PQ, IVF_PQFS, HNSW_FLAT, and HNSW_PQ index types, and for similarity matching it supports dot product and Euclidean distance. This makes it useful for applications such as recommendation systems, image recognition, and AI chatbots, where fast similarity matching is essential.
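The hybrid pattern described above can be sketched in plain Python: first filter on a relational attribute, then rank the remaining rows by vector similarity. This sketch models only the semantics, using a toy in-memory catalog, and the `Product` class and `filtered_search` function are hypothetical names. In SingleStore you would express the same idea in SQL, roughly as a `WHERE price BETWEEN ... ORDER BY DOT_PRODUCT(embedding, @query) DESC LIMIT k` query. The exact syntax and execution plan depend on your version.

```python
import math
from dataclasses import dataclass


@dataclass
class Product:  # hypothetical row type: a relational attribute plus an embedding
    name: str
    price: float
    embedding: list[float]


def dot_product(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def filtered_search(products, query, min_price, max_price, k=2, metric="dot"):
    """Keep rows that pass the price filter, then rank them by vector similarity.

    Higher dot product means more similar; lower Euclidean distance means more similar.
    """
    candidates = [p for p in products if min_price <= p.price <= max_price]
    if metric == "dot":
        candidates.sort(key=lambda p: dot_product(query, p.embedding), reverse=True)
    elif metric == "euclidean":
        candidates.sort(key=lambda p: math.dist(query, p.embedding))
    else:
        raise ValueError(f"unsupported metric: {metric}")
    return [p.name for p in candidates[:k]]


catalog = [
    Product("red sneaker", 80.0, [0.9, 0.1]),
    Product("blue sneaker", 45.0, [0.8, 0.3]),
    Product("canvas shoe", 30.0, [0.6, 0.4]),
    Product("desk lamp", 25.0, [0.1, 0.9]),
]

# "red sneaker" is the closest match overall, but the price filter excludes it.
print(filtered_search(catalog, [1.0, 0.0], min_price=20.0, max_price=50.0))
```

Filtering before ranking guarantees that every returned row satisfies the price constraint. With an approximate index, a database may instead apply the filter after the approximate search, which can return fewer than k rows.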
At its core, SingleStore is built for performance and scale. The database distributes data across multiple nodes so it can handle large-scale vector data operations, and as your data grows, you can add more nodes to keep up. The query processor can combine vector search with SQL operations in a single query, so you don't need to issue multiple separate queries. Unlike vector-only databases, SingleStore provides these capabilities as part of a full database, so you can build AI features without managing multiple systems or dealing with complex data transfers.
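A distributed database can answer a top-k similarity query with a common pattern called scatter-gather. Each node finds its own best k rows, and a coordinator merges those partial lists. The sketch below shows that general pattern, using a hypothetical round-robin placement of rows onto nodes. It does not describe SingleStore's internal query plans. The merge is exact because any row in the global top k must also be in the top k of the node that holds it.

```python
import heapq


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def shard_rows(rows: dict[int, list[float]], num_nodes: int) -> list[dict[int, list[float]]]:
    """Hypothetical placement: row id i lives on node i % num_nodes."""
    shards = [{} for _ in range(num_nodes)]
    for row_id, vec in rows.items():
        shards[row_id % num_nodes][row_id] = vec
    return shards


def local_top_k(shard, query, k):
    """Runs on each node: score only local rows, return the node's best k (score, id) pairs."""
    return heapq.nlargest(k, ((dot(query, vec), row_id) for row_id, vec in shard.items()))


def distributed_top_k(shards, query, k):
    """Coordinator: merge the per-node candidates into the global top k."""
    partials = [pair for shard in shards for pair in local_top_k(shard, query, k)]
    return [row_id for _, row_id in heapq.nlargest(k, partials)]


rows = {i: [float(i), 1.0] for i in range(10)}
query = [1.0, 0.0]
print(distributed_top_k(shard_rows(rows, num_nodes=3), query, k=3))  # [9, 8, 7]
```

Each node returns at most k rows, so the coordinator handles at most nodes × k candidates no matter how large each node's data is. This is part of why adding nodes can keep query work per node bounded.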
For vector search, SingleStore offers two options. The first is exact k-nearest neighbors (kNN) search, which finds the exact set of k nearest neighbors for a query vector. For very large datasets or high concurrency, SingleStore also supports approximate nearest neighbor (ANN) search using vector indexing. ANN search can find k near neighbors much faster than exact kNN search, sometimes by orders of magnitude. The trade-off is between speed and accuracy: ANN is faster but may not return the exact set of k nearest neighbors. For applications with billions of vectors that need interactive response times and do not require absolute precision, ANN search is the better choice.
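Exact kNN is simple to write, and the code shows its cost problem directly: every query computes a distance to every stored vector. The Python sketch below uses illustrative data. It also defines recall@k, the usual way to measure how far an ANN result is from the exact set of k nearest neighbors.

```python
import heapq
import math


def exact_knn(vectors: list[list[float]], query: list[float], k: int) -> list[int]:
    """Exact kNN: one distance computation per stored vector (O(N * d) per query)."""
    return heapq.nsmallest(k, range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))


def recall_at_k(approx_ids: list[int], exact_ids: list[int]) -> float:
    """Fraction of the true k nearest neighbors that an approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [2.0, 2.0]]
truth = exact_knn(vectors, [0.9, 0.9], k=2)
print(truth)                          # [1, 0]
print(recall_at_k([1, 3], truth))     # 0.5: one of the two true neighbors was found
```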
The technical implementation of vector indexes in SingleStore has specific requirements. Vector indexes can only be created on columnstore tables, and each one must be created on a single column that stores the vector data. The indexed column must use the VECTOR(dimensions[, F32]) type, and F32 is currently the only element type supported for vector indexes. Within these constraints, SingleStore works well for applications such as semantic search using vectors from large language models, retrieval-augmented generation (RAG) for focused text generation, and image matching based on vector embeddings. By combining these capabilities with traditional database features, SingleStore lets developers build complex AI applications in SQL while maintaining performance and scale.
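The requirements above (a columnstore table, a single vector column, F32 elements) can be captured in a small helper that generates DDL. This is a sketch. The `vector_table_ddl` function is hypothetical. The generated statement is modeled on SingleStore's documented `VECTOR(n, F32)` column type and `VECTOR INDEX ... INDEX_OPTIONS '{"index_type": ...}'` clause as we understand them. Check the exact syntax, and whether your tables default to columnstore, against the documentation for your SingleStore version.

```python
import json

# Index types listed in this article for SingleStore vector indexes.
SUPPORTED_INDEX_TYPES = {"FLAT", "IVF_FLAT", "IVF_PQ", "IVF_PQFS", "HNSW_FLAT", "HNSW_PQ"}


def vector_table_ddl(table: str, column: str, dimensions: int,
                     index_type: str = "HNSW_FLAT", element_type: str = "F32") -> str:
    """Hypothetical helper: build a CREATE TABLE statement with one indexed vector column.

    Assumes the table is created as a columnstore table (vector indexes require it);
    verify your cluster's default table type and the syntax for your SingleStore version.
    """
    if element_type != "F32":
        raise ValueError("vector indexes support only F32 elements")
    if index_type not in SUPPORTED_INDEX_TYPES:
        raise ValueError(f"unsupported index type: {index_type}")
    if dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")
    options = json.dumps({"index_type": index_type})
    # The vector index covers exactly one column: the one holding the embeddings.
    return (
        f"CREATE TABLE {table} (\n"
        f"  id BIGINT NOT NULL,\n"
        f"  {column} VECTOR({dimensions}, {element_type}) NOT NULL,\n"
        f"  VECTOR INDEX {column}_idx ({column}) INDEX_OPTIONS '{options}'\n"
        f");"
    )


print(vector_table_ddl("documents", "embedding", 768))
```

Validating these rules in application code catches violations before the statement reaches the database. The database remains the final authority on what it accepts.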
pgvector: Overview and Core Technology
pgvector is a PostgreSQL extension that allows you to do vector operations directly in your PostgreSQL database. This means you can store and query vector embeddings without a separate vector database.
pgvector provides a full set of vector operations: native vector similarity search, exact and approximate nearest neighbor search, and integration with PostgreSQL's indexing. It supports vector arithmetic (addition and subtraction) and multiple distance metrics (Euclidean, cosine, and inner product).
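pgvector exposes these metrics as SQL operators: `<->` for Euclidean (L2) distance, `<#>` for negative inner product, and `<=>` for cosine distance. Queries typically sort by them, as in `ORDER BY embedding <-> '[3,1,2]' LIMIT 5`. The Python sketch below reproduces what each operator computes, plus element-wise addition and subtraction, so you can sanity-check results. It is a reference model, not pgvector's implementation. Note that `<#>` returns the negated inner product because PostgreSQL index scans on operators only support ascending order, so ascending order then puts the most similar vectors first.

```python
import math


def l2_distance(a, b):
    """Mirrors pgvector's <-> operator (Euclidean distance)."""
    return math.dist(a, b)


def negative_inner_product(a, b):
    """Mirrors pgvector's <#> operator, which returns the inner product negated."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """Mirrors pgvector's <=> operator: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def add(a, b):
    """Element-wise addition, like '[1,2,3]'::vector + '[4,5,6]' in SQL."""
    return [x + y for x, y in zip(a, b)]


def subtract(a, b):
    """Element-wise subtraction."""
    return [x - y for x, y in zip(a, b)]


a, b = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
print(add(a, b))                      # [5.0, 7.0, 9.0]
print(negative_inner_product(a, b))   # -32.0
print(round(l2_distance(a, b), 4))    # 5.1962
```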
Search Mechanisms & Index Types
By default, pgvector uses exact nearest neighbor search, which gives perfect recall but can be slow with large datasets. For better performance, pgvector offers approximate nearest neighbor search through indexing, which trades some accuracy for much better speed.
HNSW (Hierarchical Navigable Small World): Introduced in pgvector 0.5.0, HNSW builds a multi-layer graph structure that search can traverse quickly. It offers better query performance than IVFFlat in terms of the speed-recall trade-off, but it builds more slowly and uses more memory. This index suits applications that need fast and accurate search.
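The core idea behind HNSW search can be sketched as a best-first (beam) search over a proximity graph. The sketch below is a deliberately simplified single-layer version. It builds a graph by linking each vector to its m nearest neighbors, then searches that graph while keeping at most ef candidates. Real HNSW adds a hierarchy of layers and a smarter insertion-based construction. The names m and ef only echo pgvector's `m` index option and `hnsw.ef_search` setting, and the functions are hypothetical.

```python
import heapq
import math


def build_graph(vectors, m):
    """Link each vector to its m nearest neighbors (brute force), with edges in both directions."""
    graph = {i: set() for i in range(len(vectors))}
    for i, v in enumerate(vectors):
        others = sorted((j for j in graph if j != i), key=lambda j: math.dist(v, vectors[j]))
        for j in others[:m]:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def graph_search(vectors, graph, query, entry, k, ef):
    """Best-first search that keeps at most ef results (the role ef_search plays in HNSW)."""
    def dist(i):
        return math.dist(query, vectors[i])

    visited = {entry}
    candidates = [(dist(entry), entry)]   # min-heap: nearest unexplored node first
    results = [(-dist(entry), entry)]     # max-heap via negation: farthest kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(results) >= ef and d > -results[0][0]:
            break  # nothing left to explore can improve the current results
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(nb)
            if len(results) < ef or d_nb < -results[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(results, (-d_nb, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the farthest result
    return [i for _, i in sorted((-neg_d, i) for neg_d, i in results)][:k]


points = [[float(i), 0.0] for i in range(20)]   # toy data: points on a line
graph = build_graph(points, m=2)
print(graph_search(points, graph, [7.2, 0.0], entry=0, k=3, ef=3))  # [7, 8, 6]
```

A larger ef explores more of the graph before stopping. That raises recall at the cost of more distance computations, which is the same trade-off you tune when raising ef_search.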
IVFFlat (Inverted File Flat): IVFFlat divides vectors into clusters (called lists) and searches in two steps. First, it finds the clusters closest to the query. Then it performs an exact search within those selected clusters. It builds faster and uses less memory than HNSW, but its speed-recall trade-off is generally weaker.
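The two-step IVFFlat search can be sketched in Python. First, cluster the vectors into lists with k-means. Then, at query time, scan only the lists whose centroids are closest to the query. In pgvector the corresponding settings are the `lists` index option and the `ivfflat.probes` query-time parameter. The clustering below is plain k-means on toy data and only approximates what pgvector does internally. When probes equals the number of lists, the search becomes exact.

```python
import math
import random


def nearest_centroid(v, centroids):
    return min(range(len(centroids)), key=lambda c: math.dist(v, centroids[c]))


def kmeans(vectors, n_lists, iterations=10, seed=0):
    """Plain k-means, seeded from randomly sampled data points."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_lists)]
    for _ in range(iterations):
        buckets = [[] for _ in range(n_lists)]
        for v in vectors:
            buckets[nearest_centroid(v, centroids)].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the previous centroid if a list ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_ivf(vectors, n_lists):
    """Assign every vector id to the list (cluster) of its nearest centroid."""
    centroids = kmeans(vectors, n_lists)
    lists = [[] for _ in range(n_lists)]
    for idx, v in enumerate(vectors):
        lists[nearest_centroid(v, centroids)].append(idx)
    return centroids, lists


def ivf_search(vectors, centroids, lists, query, k, probes):
    # Step 1: find the `probes` lists whose centroids are closest to the query.
    closest = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))[:probes]
    # Step 2: exact search, but only over vectors stored in those lists.
    candidates = [i for c in closest for i in lists[c]]
    return sorted(candidates, key=lambda i: math.dist(query, vectors[i]))[:k]


rng = random.Random(42)
data = [[rng.random(), rng.random()] for _ in range(500)]
centroids, lists = build_ivf(data, n_lists=10)
print(ivf_search(data, centroids, lists, [0.5, 0.5], k=5, probes=1))   # fast, may miss neighbors
print(ivf_search(data, centroids, lists, [0.5, 0.5], k=5, probes=10))  # scans all lists: exact
```

True neighbors that sit just across a cluster boundary are the ones a low probes value misses. This is why the probes setting trades speed against recall.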
Technical Limitations
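A key technical limitation of pgvector is its dimension limit for indexing. With PostgreSQL's default page size of 8 KiB, pgvector's HNSW and IVFFlat indexes support full-precision vectors (32-bit, 4 bytes per element) of up to 2,000 dimensions, since 2,000 × 4 bytes = 8,000 bytes, or 7.8125 KiB per vector. With half-precision vectors (halfvec, 16-bit, 2 bytes per element), a form of scalar quantization, the maximum rises to 4,000 dimensions, which still uses 7.8125 KiB per vector. The vector column type itself can store up to 16,000 dimensions, so these limits apply to indexing rather than storage.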
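The arithmetic behind these limits is easy to reproduce. The sketch below computes the raw element payload per vector and compares it with PostgreSQL's default 8 KiB page. It ignores per-tuple header overhead, so it explains the order of magnitude rather than deriving the exact published limits. The 3,072-dimension row is a hypothetical example of a larger embedding.

```python
KIB = 1024
PAGE_BYTES = 8 * KIB  # PostgreSQL's default block (page) size


def payload_bytes(dimensions: int, bytes_per_element: int) -> int:
    """Raw element storage for one vector; ignores tuple and header overhead."""
    return dimensions * bytes_per_element


cases = [
    ("vector, 32-bit", 2000, 4),
    ("halfvec, 16-bit", 4000, 2),
    ("vector, 32-bit (hypothetical 3,072-dim model)", 3072, 4),
]
for label, dims, width in cases:
    size = payload_bytes(dims, width)
    print(f"{label}: {dims} x {width} B = {size} B = {size / KIB} KiB, under 8 KiB page: {size < PAGE_BYTES}")
```

Both supported configurations land on the same 8,000-byte payload. The hypothetical 3,072-dimension vector only fits when stored as halfvec (6,144 bytes).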
Impact on Modern Language Models
This limits RAG (Retrieval-Augmented Generation) applications. Most top-performing embedding models on Hugging Face's MTEB leaderboard produce embeddings that exceed these dimension limits. Even with halfvec, only three of these models fit within the 4,000-dimension limit: gte-qwen2-7B-instruct, gte-qwen2-7B-instruct-fp16, and bge-multilingual-gemma2.
Implementation Tips
When using pgvector, experiment with both HNSW and IVFFlat indexes to find the best fit for your use case. Your decision will depend on several factors: dataset size, query speed requirements, acceptable accuracy trade-offs, and memory constraints. Fine-tune index parameters (for example, m and ef_construction for HNSW, or lists for IVFFlat) and benchmark different configurations to find the sweet spot.
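A small harness helps keep these comparisons honest. For each configuration, it measures recall@k against exact search and the mean query latency. The sketch below is generic Python. `half_scan_search` is a hypothetical stand-in for an approximate index that scans only half of the data. In practice, you would replace the search functions with ones that query PostgreSQL under different index types and parameters. Timings from this toy in-memory setup say nothing about real database performance.

```python
import math
import random
import statistics
import time


def exact_search(vectors, query, k):
    """Ground truth: brute-force k nearest neighbors by Euclidean distance."""
    return sorted(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))[:k]


def half_scan_search(vectors, query, k):
    """Hypothetical stand-in for an approximate index: only scans the first half of the data."""
    half = len(vectors) // 2
    return sorted(range(half), key=lambda i: math.dist(query, vectors[i]))[:k]


def benchmark(search_fn, vectors, queries, k):
    """Return (mean recall@k against exact search, mean latency in milliseconds)."""
    recalls, latencies_ms = [], []
    for q in queries:
        truth = set(exact_search(vectors, q, k))
        start = time.perf_counter()
        found = search_fn(vectors, q, k)
        latencies_ms.append((time.perf_counter() - start) * 1000)
        recalls.append(len(truth & set(found)) / k)
    return statistics.mean(recalls), statistics.mean(latencies_ms)


rng = random.Random(0)
data = [[rng.random() for _ in range(8)] for _ in range(2000)]
queries = [[rng.random() for _ in range(8)] for _ in range(20)]
for name, fn in [("exact", exact_search), ("half scan", half_scan_search)]:
    recall, latency = benchmark(fn, data, queries, k=10)
    print(f"{name:>9}: recall@10 = {recall:.2f}, mean latency = {latency:.2f} ms")
```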
Performance
Keep in mind that, unlike traditional database indexes, adding an approximate index in pgvector can change query results. Account for this during development and testing so that the accuracy-performance trade-off fits your application's needs, and monitor and adjust your configuration as your data and usage patterns change.
Key Differences
Search Methodology
SingleStore: SingleStore offers both exact and approximate nearest neighbor (ANN) search. Its vector indexing supports FLAT, IVF_FLAT, IVF_PQ, IVF_PQFS, HNSW_FLAT, and HNSW_PQ, enabling high-performance similarity search with dot product or Euclidean distance. ANN suits large datasets that need low latency and can tolerate some loss of accuracy.
pgvector: pgvector provides native vector operations in PostgreSQL, including exact and ANN search. For ANN it offers HNSW, which is faster but more memory-intensive, and IVFFlat, which uses less memory at some cost in speed and recall. While pgvector is very flexible, its default exact search will struggle with large datasets unless you add one of these indexes.
Data Handling
SingleStore: SingleStore stores vector data in columnstore tables alongside other columns, so you can query structured and unstructured data together. Its SQL-driven approach combines vector search with standard database queries, which makes it well suited to hybrid use cases such as searching product embeddings filtered by price or category.
pgvector: As an extension of PostgreSQL, pgvector is tightly coupled with relational data handling. It lets you store vector embeddings alongside traditional relational data, which makes schema design easy for applications that need both types of data. However, its index dimension limits (2,000 dimensions for full precision, 4,000 for halfvec) may constrain some modern LLM applications.
Scalability and Performance
SingleStore: SingleStore scales horizontally by distributing data across nodes, so you can maintain performance as data grows by adding nodes. Its distributed architecture and query processor can execute vector and SQL operations in parallel, reducing query overhead. ANN indexing further speeds up queries on large datasets.
pgvector: pgvector's scalability depends on PostgreSQL itself. It handles moderate datasets well but may struggle with large datasets or high-concurrency workloads. Index tuning and clustering can help, but horizontal scaling may require additional workarounds such as partitioning.
Flexibility and Customization
SingleStore: SingleStore keeps things simple: you run vector search with standard SQL. While this makes implementation easy, vector indexes come with specific constraints, such as requiring columnstore tables and F32 vectors, which may limit flexibility for custom setups.
pgvector: pgvector is more flexible: it supports vector arithmetic and multiple distance metrics (Euclidean, cosine, and inner product). It suits developers who want to experiment with custom indexing, fine-tune parameters, or integrate with PostgreSQL's ecosystem.
Integration and Ecosystem
SingleStore: As a standalone, all-in-one database, SingleStore reduces the need for separate systems. This approach minimizes integration complexity, but SingleStore may lack the breadth of PostgreSQL's tool ecosystem.
pgvector: pgvector benefits from PostgreSQL’s ecosystem, including compatibility with popular frameworks, tools and extensions. It’s a strong choice if your stack is already built on top of PostgreSQL.
Ease of Use
SingleStore: The SQL-first design makes setup and querying easy, which suits teams that want to deploy quickly with a minimal learning curve. However, working within its vector indexing constraints may require some adjustments.
pgvector: Developers familiar with PostgreSQL will find pgvector easy to adopt. Index experimentation and tuning add some complexity, but they also create opportunities for optimization specific to your use case.
Cost
SingleStore: As a high-performance, enterprise-grade database, SingleStore may have higher operational costs, especially for managed services or large-scale deployments. For organizations with diverse data needs, consolidating systems may offset those costs.
pgvector: pgvector's open-source nature makes it cost-effective for smaller projects. However, managing PostgreSQL infrastructure at scale can introduce hidden costs such as additional hardware and maintenance.
Security
SingleStore: SingleStore offers enterprise-grade security features such as data encryption, role-based access control, and audit logs. These features suit use cases with strict compliance requirements.
pgvector: pgvector inherits PostgreSQL's security features, such as role-based access control, row-level security, and SSL/TLS-encrypted connections.
When to use SingleStore
SingleStore is suited to large, distributed data systems that need high performance and scale. Because it combines vector search with SQL queries, it brings together structured and unstructured data in applications such as AI-powered recommendation systems, product search with filters, and semantic search for enterprise workloads. SingleStore's distributed architecture, ANN indexing options, and all-in-one design make it a strong fit for scenarios where you need to handle billions of vectors with interactive response times.
When to use pgvector
pgvector is suited to environments already running PostgreSQL, or where simplicity and cost matter. It fits smaller-scale vector search applications and projects that need to combine full-text search, traditional relational queries, and vector operations in the same database. It offers flexible distance metrics and indexing options and integrates well with PostgreSQL's rich ecosystem, which benefits developers experimenting with embedding models or adding vector search to existing PostgreSQL infrastructure.
Conclusion
Both SingleStore and pgvector have their own strengths in vector search. SingleStore excels with large-scale distributed datasets, SQL integration, and high performance, while pgvector stands out for its flexibility, its ease of use within PostgreSQL, and its cost-effectiveness. The choice depends on your use case: do you need enterprise-grade scalability, or a lightweight solution within an existing PostgreSQL environment? By evaluating your data types, performance needs, and ecosystem requirements, you can choose the tool that fits your project.
This post gives an overview of SingleStore and pgvector, but any real evaluation should be based on your own use case. One tool that can help is VectorDBBench, an open-source benchmarking tool for comparing vector databases. In the end, thorough benchmarking with your own datasets and query patterns will be key to deciding between these two powerful but different approaches to vector search.
Using Open-source VectorDBBench to Evaluate and Compare Vector Databases on Your Own
VectorDBBench is an open-source benchmarking tool for users who need high-performance data storage and retrieval systems, especially vector databases. This tool allows users to test and compare different vector database systems like Milvus and Zilliz Cloud (the managed Milvus) using their own datasets and find the one that fits their use cases. With VectorDBBench, users can make decisions based on actual vector database performance rather than marketing claims or hearsay.
VectorDBBench is written in Python and licensed under the MIT open-source license, meaning anyone can freely use, modify, and distribute it. The tool is actively maintained by a community of developers committed to improving its features and performance.
Download VectorDBBench from its GitHub repository to reproduce our benchmark results or obtain performance results on your own datasets.
Take a quick look at the performance of mainstream vector databases on the VectorDBBench Leaderboard.
Read the following blogs to learn more about vector database evaluation.
Further Resources about VectorDB, GenAI, and ML