MongoDB vs MyScale: Selecting the Right Database for GenAI Applications
As AI-driven applications evolve, vector search has become a core requirement for many of them. This blog post compares two databases with vector search capabilities: MongoDB and MyScale. Both support vector search, an essential feature for applications such as recommendation engines, image retrieval, and semantic search. Our goal is to give developers and engineers a clear comparison that helps them decide which database best fits their requirements.
What is a Vector Database?
Before we compare MongoDB vs MyScale, let's first explore the concept of vector databases.
A vector database is specifically designed to store and query high-dimensional vectors, which are numerical representations of unstructured data. These vectors encode complex information, such as the semantic meaning of text, the visual features of images, or product attributes. By enabling efficient similarity searches, vector databases play a pivotal role in AI applications, allowing for more advanced data analysis and retrieval.
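To make the idea of a similarity search concrete, here is a minimal sketch that ranks a few items by cosine similarity to a query vector. The three-dimensional vectors and item names are made up for illustration. Real embeddings come from a model and have hundreds or thousands of dimensions, and a vector database replaces this brute-force loop with an index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, items, k=2):
    """Score every item against the query and return the k most similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" (hypothetical values, not produced by a real model).
catalog = {
    "running shoes": [0.9, 0.1, 0.0],
    "trail sneakers": [0.8, 0.2, 0.1],
    "coffee maker": [0.0, 0.1, 0.9],
}

results = top_k([1.0, 0.0, 0.0], catalog, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

The two footwear items rank above the coffee maker because their vectors point in nearly the same direction as the query.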
Common use cases for vector databases include e-commerce product recommendations, content discovery platforms, anomaly detection in cybersecurity, medical image analysis, and natural language processing (NLP) tasks. They also play a crucial role in Retrieval Augmented Generation (RAG), a technique that enhances the performance of large language models (LLMs) by providing external knowledge to reduce issues like AI hallucinations.
There are many types of vector databases on the market, including:
- Purpose-built vector databases such as Milvus and Zilliz Cloud (fully managed Milvus).
- Vector search libraries such as Faiss and Annoy.
- Lightweight vector databases such as Chroma and Milvus Lite.
- Traditional databases with vector search add-ons capable of performing small-scale vector searches.
MongoDB is a NoSQL database with vector search as an add-on, and MyScale is a database built on ClickHouse that combines vector search with SQL analytics. This post compares their vector search capabilities.
MongoDB: The Basics
MongoDB Atlas Vector Search is a feature for running vector similarity searches on data stored in MongoDB Atlas. You can index and query high-dimensional vector embeddings alongside your document data, so retrieval for AI and machine learning applications happens in the same database that holds the documents.
At its core, Atlas Vector Search uses the Hierarchical Navigable Small World (HNSW) algorithm to index and search vector data. HNSW builds a multi-level graph over the vector space, which enables Approximate Nearest Neighbor (ANN) searches that balance speed and accuracy for large-scale vector search. Atlas Vector Search also supports Exact Nearest Neighbor (ENN) searches, which prioritize accuracy over performance and are intended for queries over up to 10,000 documents.
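The ANN and ENN modes described above map onto options of the `$vectorSearch` aggregation stage. The sketch below only builds the stage as a Python dictionary, so it runs without a database. The index name `vector_index` and the field name `embedding` are assumptions for illustration, and the option names should be checked against the current Atlas documentation before use.

```python
def vector_search_stage(query_vector, limit=5, exact=False, num_candidates=100):
    """Build a $vectorSearch stage for either ANN (default) or ENN search."""
    stage = {
        "index": "vector_index",      # hypothetical index name
        "path": "embedding",          # hypothetical field holding the vectors
        "queryVector": query_vector,
        "limit": limit,
    }
    if exact:
        # ENN: compare against every vector, so no candidate count is given.
        stage["exact"] = True
    else:
        # ANN: the HNSW graph is explored for this many candidates first.
        stage["numCandidates"] = num_candidates
    return {"$vectorSearch": stage}


query = [0.12, -0.03, 0.44]  # toy vector; real embeddings are much longer
ann_stage = vector_search_stage(query, limit=5, num_candidates=100)
enn_stage = vector_search_stage(query, limit=5, exact=True)
```

With a driver such as PyMongo, a stage like this would be passed as the first element of the pipeline given to `collection.aggregate(...)`.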
One of the main advantages of Atlas Vector Search is its integration with MongoDB’s flexible document model. You can store vector embeddings alongside other document data, which makes searches more contextual and precise. Atlas Vector Search can query any kind of data that can be embedded, with vectors of up to 4096 dimensions. It also lets you combine vector similarity searches with traditional document filtering. For example, a semantic search for products could be filtered by category, price range, or availability.
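The product example above could look roughly like the following. This is a sketch under stated assumptions: the index name, field names (`embedding`, `category`, `price`, `in_stock`), and the 1536-dimension setting are all hypothetical, and the filter fields are assumed to be declared in the vector index definition so they can be used for pre-filtering.

```python
# Hypothetical index definition: one vector field plus the fields used as filters.
index_definition = {
    "fields": [
        {"type": "vector", "path": "embedding", "numDimensions": 1536, "similarity": "cosine"},
        {"type": "filter", "path": "category"},
        {"type": "filter", "path": "price"},
        {"type": "filter", "path": "in_stock"},
    ]
}


def filtered_product_search(query_vector, category, max_price, limit=5):
    """Build a pipeline: semantic search restricted by category, price, and stock."""
    return [
        {
            "$vectorSearch": {
                "index": "product_vector_index",  # hypothetical index name
                "path": "embedding",
                "queryVector": query_vector,
                "numCandidates": 20 * limit,      # illustrative oversampling factor
                "limit": limit,
                "filter": {
                    "$and": [
                        {"category": {"$eq": category}},
                        {"price": {"$lte": max_price}},
                        {"in_stock": {"$eq": True}},
                    ]
                },
            }
        },
        # Return a few fields plus the similarity score of each match.
        {"$project": {"name": 1, "price": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


pipeline = filtered_product_search([0.1] * 1536, category="shoes", max_price=100)
```

The filter narrows the candidate set by metadata, and the vector comparison ranks what remains.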
Atlas Vector Search also supports hybrid search, which combines vector search with full-text search for more granular results. Atlas Search on its own, by contrast, focuses on keyword-based search. The platform integrates with popular AI services and tools, so you can use it with embedding models from providers like OpenAI, VoyageAI, and many others listed on Hugging Face. It also supports open-source frameworks like LangChain and LlamaIndex for building applications that use Large Language Models (LLMs).
To ensure scalability and performance, MongoDB Atlas provides Search Nodes, which provide dedicated infrastructure for Atlas Search and Vector Search workloads. This gives search workloads their own optimized compute resources and lets you scale them independently of the rest of the database.
With these capabilities built into the MongoDB ecosystem, Atlas Vector Search lets developers building AI-powered applications, recommendation systems, or advanced search features add vector search without running a separate vector database, while keeping MongoDB’s scalability and other features.
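One common way to merge a vector result list with a full-text result list is reciprocal rank fusion (RRF). The text above does not say how Atlas fuses results, so treat the sketch below as an illustration of the general technique, not as MongoDB's implementation. The document IDs and the constant `k=60` are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked ID lists; each hit contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


vector_hits = ["a", "b", "c"]  # hypothetical IDs ranked by vector similarity
text_hits = ["b", "d", "a"]    # hypothetical IDs ranked by keyword relevance

fused = reciprocal_rank_fusion([vector_hits, text_hits])
print(fused)
```

Documents that appear near the top of both lists rise to the top of the fused list, while documents found by only one method still appear further down.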
What is MyScale? The Basics
MyScale is a cloud-based database built on top of the open-source ClickHouse database and designed for AI and machine learning workloads. It handles structured and vector data and supports real-time analytics. MyScale focuses on time series, vector search, and full-text search, which makes it a fit for real-time processing and AI-driven insights. It inherits its performance and scalability characteristics from the ClickHouse architecture.
One of the key features of MyScale is native SQL support, which simplifies AI-driven queries by integrating vector search, full-text search, and traditional SQL queries in one system. This reduces the need for multiple tools. MyScale uses an OLAP database architecture to run analytical processing over both structured and vectorized data on one platform. Because developers interact with MyScale through SQL, it is accessible to anyone familiar with relational databases.
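As a rough illustration of vector search expressed in SQL, the helper below assembles a query that orders rows by vector distance and applies an ordinary `WHERE` filter. The table and column names are hypothetical, and the `distance(column, [..])` form is modeled on MyScale's documented query style, so verify it against the current MyScale reference before relying on it. The string building is for readability only. Production code should use the client library's parameter binding.

```python
def build_vector_query(table, vector_column, query_vector, where=None, limit=5):
    """Assemble a SQL string: nearest rows by vector distance, optionally filtered."""
    vector_literal = "[" + ", ".join(f"{x:g}" for x in query_vector) + "]"
    sql = (
        f"SELECT id, title, distance({vector_column}, {vector_literal}) AS dist "
        f"FROM {table}"
    )
    if where:
        sql += f" WHERE {where}"  # plain SQL predicate on structured columns
    sql += f" ORDER BY dist ASC LIMIT {int(limit)}"
    return sql


sql = build_vector_query(
    table="default.articles",        # hypothetical table
    vector_column="embedding",       # hypothetical vector column
    query_vector=[0.1, 0.2, 0.3],    # toy vector
    where="published_at >= '2024-01-01'",
    limit=5,
)
print(sql)
```

The point of the sketch is that the filter and the similarity ranking live in one statement, which is what the single-system approach described above amounts to in practice.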
MyScale has multiple vector index types and similarity metrics to support different use cases. It supports common distance metrics like Euclidean distance (L2), inner product (IP), and cosine similarity. The database offers several indexing algorithms: MSTG (Multi-Scale Tree Graph), ScaNN, IVFFLAT, IVFPQ, IVFSQ, and HNSW, each with its own set of parameters to tune. MyScale’s proprietary MSTG vector engine uses NVMe SSDs to increase data density, and MyScale claims this lets it outperform specialized vector databases in both performance and cost.
By combining the functionality of an SQL database, a vector database, and a full-text search engine in one system, MyScale reduces infrastructure and maintenance costs. This unification allows joint data queries and analytics over a single data foundation for AI applications. MyScale also offers MyScale Telemetry, which provides observability for LLM systems so you can monitor and debug them efficiently. MyScale positions itself as able to handle new data modalities and growing database sizes while maintaining computing performance and integration between different data types.
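The choice of metric matters because the three metrics can rank the same vectors differently. The sketch below implements them in plain Python on made-up two-dimensional vectors. It illustrates the math only and is not MyScale code.

```python
import math


def l2_distance(a, b):
    """Euclidean distance: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Inner product: larger means more similar, and vector length matters."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine similarity: larger means more similar, and vector length is ignored."""
    return inner_product(a, b) / (math.hypot(*a) * math.hypot(*b))


query = [1.0, 0.0]
short_aligned = [0.5, 0.0]  # same direction as the query, small magnitude
long_offaxis = [3.0, 3.0]   # 45 degrees off the query, large magnitude

print(l2_distance(query, short_aligned), l2_distance(query, long_offaxis))
print(inner_product(query, short_aligned), inner_product(query, long_offaxis))
print(cosine_similarity(query, short_aligned), cosine_similarity(query, long_offaxis))
```

L2 and cosine both prefer `short_aligned`, while inner product prefers `long_offaxis` because of its larger magnitude. This is why the metric should match how the embedding model was trained.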
Key Differences
MongoDB Atlas Vector Search and MyScale are both capable vector search options. Let’s compare them so you can make an informed decision for your project.
Search Methodology
MongoDB Atlas Vector Search uses the Hierarchical Navigable Small World (HNSW) algorithm to index and search vector data. HNSW builds a multi-level graph of the vector space and enables Approximate Nearest Neighbor (ANN) searches. Atlas Vector Search also supports Exact Nearest Neighbor (ENN) searches for up to 10,000 documents.
MyScale has multiple vector index types and similarity metrics. It supports common distance metrics like Euclidean distance (L2), inner product (IP), and cosine similarity. MyScale has multiple indexing algorithms: MSTG (Multi-Scale Tree Graph), ScaNN, IVFFLAT, IVFPQ, IVFSQ, and HNSW. This gives you more control over search performance and accuracy.
Data Handling
MongoDB Atlas Vector Search works with MongoDB’s flexible document model. You can store vector embeddings alongside other document data, which makes searches more contextual and precise. It supports embeddings of up to 4096 dimensions and lets you combine vector similarity search with document filtering.
MyScale is built to handle both structured and vector data. It is designed for real-time analytics and machine learning workloads, which suits it to time series, vector search, and full-text search. Its distinguishing feature in this comparison is that it processes analytical workloads over both structured and vectorized data on one platform with an OLAP database architecture.
Scalability and Performance
MongoDB Atlas has Search Nodes, which provide dedicated infrastructure for Atlas Search and Vector Search workloads. This means search workloads get their own optimized compute resources and can scale independently of the rest of the database.
MyScale is built on ClickHouse, whose architecture is known for high performance and scalability. Its proprietary MSTG vector engine uses NVMe SSDs to increase data density and potentially outperforms specialized vector databases in both performance and cost.
Flexibility and Customization
MongoDB Atlas Vector Search allows hybrid search, which combines vector search with full-text search for more granular results. It also lets you combine vector similarity search with document filtering, which adds flexibility in query construction.
MyScale has native SQL support, so developers can combine vector search, full-text search, and traditional SQL queries in one system. This SQL-based approach is more familiar and accessible to developers with relational database experience.
Integration and Ecosystem
MongoDB Atlas Vector Search integrates with popular AI services and tools and supports embedding models from providers like OpenAI and VoyageAI. It also works with open-source frameworks like LangChain and LlamaIndex for building applications that use Large Language Models (LLMs).
MyScale is queried through SQL, so it can work with many SQL-compatible tools and frameworks. Vector store integrations for MyScale also exist in LangChain and LlamaIndex.
Ease of Use
MongoDB Atlas Vector Search builds on the existing MongoDB ecosystem, so the learning curve is gentler if you are already familiar with MongoDB.
MyScale’s SQL for querying vector data might be more accessible to developers with SQL experience but the variety of indexing algorithms and parameters will require more learning and tuning to get the best performance.
Cost
MongoDB has an established ecosystem, extensive documentation, and a large base of developers who already know it. If you already run MongoDB, adding vector search avoids the cost of deploying and operating a separate system.
MyScale’s unified approach can reduce infrastructure and maintenance costs by providing multiple functions in one system. MyScale claims its MSTG vector engine outperforms specialized vector databases and is more cost-effective.
When to Choose Each
MongoDB Atlas Vector Search is the better choice when you already use MongoDB and want to add vector search without introducing another system. It suits applications that need to integrate vector search with document data, like recommendation systems or advanced product search. It is a strong fit when you need to combine vector similarity search with document filtering to answer more contextual queries. Choose MongoDB Atlas Vector Search when you want to build on the MongoDB ecosystem, including its scalability and its integration with popular AI tools.
MyScale is the better choice when you need a single platform that combines an SQL database, vector search, and full-text search. It suits applications that process both structured and vector data in real time, like AI-driven analytics or complex time series analysis with vector components. MyScale’s native SQL support makes it a good option for teams with strong SQL skills who want to run vector search in a familiar query language. Choose MyScale when you need flexibility in vector indexing algorithms, want to optimize for cost and performance at scale, or need to handle multiple data types and search modalities in one platform.
Conclusion
MongoDB Atlas Vector Search integrates with your existing MongoDB deployment and can combine vector search with document filtering. MyScale is a single platform for SQL, vector, and full-text search with flexible indexing and native SQL support for vector queries. Your choice between them will depend on your existing infrastructure, your use case, and your team. Consider factors like integration with document data, SQL-based querying, the types of data you are working with, and your scalability needs. Both offer capable vector search, but their strengths fit different scenarios and development environments.
This post gives an overview of MongoDB and MyScale, but a real evaluation has to be based on your own use case. One tool that can help is VectorDBBench, an open-source benchmarking tool for comparing vector databases. In the end, thorough benchmarking with your own datasets and query patterns will be key to deciding between these two different approaches to vector search.
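When benchmarking on your own data, one quantity worth tracking alongside latency is recall: the fraction of the true nearest neighbors that an approximate search actually returns. Here is a minimal sketch of recall@k, assuming you already have an exact (brute-force) result list to compare against. It is not taken from VectorDBBench, and the ID lists are invented for illustration.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Share of the exact top-k neighbors that also appear in the approximate top-k."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    hits = sum(1 for doc_id in approx_ids[:k] if doc_id in truth)
    return hits / len(truth)


exact_top5 = [1, 2, 3, 4, 5]   # hypothetical ground truth from an exact search
approx_top5 = [1, 3, 2, 9, 7]  # hypothetical output of an ANN index

recall = recall_at_k(approx_top5, exact_top5, k=5)
print(f"recall@5 = {recall:.2f}")
```

Averaging this value over a representative set of queries, for each index type and parameter setting, shows how much accuracy is traded for speed.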
Using Open-source VectorDBBench to Evaluate and Compare Vector Databases on Your Own
VectorDBBench is an open-source benchmarking tool for users who need high-performance data storage and retrieval systems, especially vector databases. This tool allows users to test and compare different vector database systems like Milvus and Zilliz Cloud (the managed Milvus) using their own datasets and find the one that fits their use cases. With VectorDBBench, users can make decisions based on actual vector database performance rather than marketing claims or hearsay.
VectorDBBench is written in Python and licensed under the MIT open-source license, meaning anyone can freely use, modify, and distribute it. The tool is actively maintained by a community of developers committed to improving its features and performance.
Download VectorDBBench from its GitHub repository to reproduce our benchmark results or obtain performance results on your own datasets.
Take a quick look at the performance of mainstream vector databases on the VectorDBBench Leaderboard.