TiDB vs MyScale: Choosing the Right Vector Database for Your AI Apps
What is a Vector Database?
Before we compare TiDB and MyScale, let's first explore the concept of vector databases.
A vector database is specifically designed to store and query high-dimensional vectors, which are numerical representations of unstructured data. These vectors encode complex information, such as the semantic meaning of text, the visual features of images, or product attributes. By enabling efficient similarity searches, vector databases play a pivotal role in AI applications, allowing for more advanced data analysis and retrieval.
Common use cases for vector databases include e-commerce product recommendations, content discovery platforms, anomaly detection in cybersecurity, medical image analysis, and natural language processing (NLP) tasks. They also play a crucial role in Retrieval Augmented Generation (RAG), a technique that enhances the performance of large language models (LLMs) by providing external knowledge to reduce issues like AI hallucinations.
There are many types of vector databases available in the market, including:
- Purpose-built vector databases such as Milvus and Zilliz Cloud (fully managed Milvus).
- Vector search libraries such as Faiss and Annoy.
- Lightweight vector databases such as Chroma and Milvus Lite.
- Traditional databases with vector search add-ons capable of performing small-scale vector searches.
TiDB is a traditional database, and MyScale is a database built on ClickHouse that combines vector search with SQL analytics. Both offer vector search as an add-on. This post compares their vector search capabilities.
TiDB: Overview and Core Technology
TiDB, developed by PingCAP, is an open-source, distributed SQL database that offers hybrid transactional and analytical processing (HTAP) capabilities. It is MySQL-compatible, making it easy to adopt for teams already familiar with the MySQL ecosystem. TiDB's distributed SQL architecture provides horizontal scalability like NoSQL databases while retaining the relational model of SQL databases, making it highly flexible for handling both transactional and analytical workloads.
One of TiDB's core strengths is its HTAP architecture, which allows it to process transactional (OLTP) and analytical (OLAP) workloads in a single database, reducing the need for separate systems. Additionally, TiDB's MySQL compatibility makes it easy to integrate into existing environments that rely on MySQL without significant changes to the application code. The database also features auto-sharding, automatically distributing data across nodes to improve read and write performance while maintaining strong consistency.
TiDB supports vector search through integration with external libraries and plugins, enabling efficient management and querying of vectorized data. This feature, combined with TiDB's HTAP architecture, makes it a versatile option for businesses needing vector search capabilities alongside transactional and analytical workloads. The distributed architecture of TiDB allows it to handle large-scale vector queries once the necessary configurations are in place.
While including vector search functionalities in TiDB requires additional configuration, the system's SQL compatibility allows developers to combine vector search with traditional relational queries. This flexibility makes TiDB suitable for complex applications that require both vector search and relational database capabilities, offering a comprehensive solution for diverse data management needs.
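Conceptually, combining a relational predicate with vector ranking can be sketched as follows. This is plain Python over an in-memory list, not TiDB's API or SQL syntax; the `products` rows, their fields, and the `filtered_search` helper are all made up for illustration.

```python
import math

# Hypothetical rows: ordinary relational columns plus an embedding column.
products = [
    {"id": 1, "category": "shoes", "price": 80.0, "embedding": [0.9, 0.1, 0.0]},
    {"id": 2, "category": "shoes", "price": 150.0, "embedding": [0.8, 0.2, 0.1]},
    {"id": 3, "category": "hats", "price": 20.0, "embedding": [0.9, 0.1, 0.05]},
    {"id": 4, "category": "shoes", "price": 60.0, "embedding": [0.1, 0.9, 0.3]},
]


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filtered_search(rows, query, category, max_price, k):
    """Filter rows relationally, then rank the survivors by vector distance."""
    # Relational part: plays the role of a WHERE clause.
    candidates = [
        r for r in rows
        if r["category"] == category and r["price"] <= max_price
    ]
    # Vector part: plays the role of ORDER BY distance ... LIMIT k.
    candidates.sort(key=lambda r: l2_distance(r["embedding"], query))
    return [r["id"] for r in candidates[:k]]


result = filtered_search(products, [1.0, 0.0, 0.0], "shoes", 100.0, 2)
print(result)
```

In a database that supports both kinds of query, the filter and the ranking would be expressed in a single SQL statement and executed by the engine rather than in application code.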
What is MyScale? Overview and Core Technology
MyScale is a cloud-based database built on top of the open-source ClickHouse database and designed for AI and machine learning workloads. It handles both structured and vector data and supports real-time analytics and machine learning. MyScale focuses on time series, vector search, and full-text search, which makes it a good fit for real-time processing and AI-driven insights. Building on the ClickHouse architecture gives MyScale high performance and scalability for AI workloads.
One of MyScale's key features is native SQL support, which simplifies AI-driven queries by integrating vector search, full-text search, and traditional SQL queries in one system. This reduces the need for multiple tools. MyScale uses an OLAP database architecture to run analytical processing over both structured and vectorized data on one platform. Developers interact with MyScale through SQL, so it is accessible to anyone familiar with relational databases.
MyScale offers multiple vector index types and similarity metrics to support different use cases. It supports common metrics such as Euclidean distance (L2), inner product (IP), and cosine similarity. The available indexing algorithms are MSTG (Multi-Scale Tree Graph), ScaNN, IVFFLAT, IVFPQ, IVFSQ, and HNSW, each with its own set of tunable parameters. MyScale's proprietary MSTG vector engine uses NVMe SSDs to increase data density, and MyScale claims that it outperforms specialized vector databases in both performance and cost.
By combining the functionality of a SQL database, a vector database, and a full-text search engine in one system, MyScale reduces infrastructure and maintenance costs. This unification enables joint data queries and analytics on a single data foundation for AI applications. MyScale also provides MyScale Telemetry, which adds observability for LLM systems so that you can monitor and debug them efficiently. As data grows more complex, MyScale aims to accommodate newer data modalities and larger databases while preserving computing performance and integration between different data types.
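The three metrics named above can rank the same candidates differently, which is why the choice matters. The sketch below is a generic pure-Python illustration, not MyScale code; the vectors and the names `short_aligned` and `long_offaxis` are invented for the example.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Inner product (IP): larger means more similar; sensitive to vector length."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine similarity: larger means more similar; ignores vector length."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


query = [1.0, 0.0]
docs = {
    "short_aligned": [0.5, 0.0],  # same direction as the query, small magnitude
    "long_offaxis": [3.0, 3.0],   # 45 degrees off the query, large magnitude
}

best_by_l2 = min(docs, key=lambda name: l2_distance(query, docs[name]))
best_by_ip = max(docs, key=lambda name: inner_product(query, docs[name]))
best_by_cos = max(docs, key=lambda name: cosine_similarity(query, docs[name]))

print(best_by_l2, best_by_ip, best_by_cos)
```

Here L2 and cosine both prefer the vector that points the same way as the query, while inner product prefers the longer vector. When embeddings are normalized to unit length, inner product and cosine similarity produce the same ranking.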
Key Differences
Vector Search Implementation
TiDB integrates vector search through external libraries, while MyScale has native vector search capabilities with multiple index types (MSTG, ScaNN, IVFFLAT, IVFPQ, IVFSQ, HNSW). MyScale's proprietary MSTG vector engine uses NVMe SSDs for improved data density and performance.
Architecture & Data Handling
TiDB uses a hybrid transactional/analytical processing (HTAP) architecture, handling both OLTP and OLAP workloads. It's MySQL-compatible and includes auto-sharding for data distribution.
MyScale builds on ClickHouse's OLAP architecture, focusing on time series, vector search, and full-text search. It processes structured and vector data in a unified system.
Performance & Scalability
TiDB scales horizontally while maintaining strong consistency, distributing data across nodes through auto-sharding.
MyScale leverages ClickHouse's high-performance architecture and claims superior performance and cost-effectiveness compared to specialized vector databases through its MSTG vector engine.
Integration
TiDB offers MySQL compatibility, making it suitable for existing MySQL environments with minimal code changes.
MyScale combines SQL database, vector database, and full-text search capabilities in one platform. It includes MyScale Telemetry for LLM system monitoring.
Cost and Maintenance
TiDB may require additional configuration and resources for vector search implementation.
MyScale's unified platform potentially reduces infrastructure and maintenance costs by eliminating the need for separate systems.
Choose TiDB
TiDB fits when you need MySQL compatibility and must handle both transactional and analytical workloads in one system. It is especially suitable when you need strong consistency in distributed setups, want auto-sharding, and plan to integrate vector search into an existing MySQL ecosystem without major code changes.
Choose MyScale
MyScale fits when you have AI-driven applications that need tight integration between vector search, time series analysis, and full-text search. It is also a good choice when you need native vector search with multiple indexing options, MyScale Telemetry to monitor your LLM system, and a unified platform to reduce infrastructure complexity.
Conclusion
TiDB excels at providing MySQL-compatible distributed SQL with HTAP capabilities, making it ideal for organizations prioritizing data consistency and familiar MySQL workflows. MyScale stands out with its native vector search implementation and unified approach to handling structured and vector data. Your choice should align with your specific requirements: choose TiDB for enterprise-grade distributed SQL with vector search capabilities, or MyScale for dedicated vector search and analytics with comprehensive LLM system monitoring.
This post gives an overview of TiDB and MyScale, but to choose between them you need to evaluate both against your own use case. One tool that can help is VectorDBBench, an open-source benchmarking tool for comparing vector databases. In the end, thorough benchmarking with your own datasets and query patterns is key to deciding between these two powerful but different approaches to vector search in distributed database systems.
Using Open-source VectorDBBench to Evaluate and Compare Vector Databases on Your Own
VectorDBBench is an open-source benchmarking tool for users who need high-performance data storage and retrieval systems, especially vector databases. This tool allows users to test and compare different vector database systems like Milvus and Zilliz Cloud (the managed Milvus) using their own datasets and find the one that fits their use cases. With VectorDBBench, users can make decisions based on actual vector database performance rather than marketing claims or hearsay.
VectorDBBench is written in Python and licensed under the MIT open-source license, meaning anyone can freely use, modify, and distribute it. The tool is actively maintained by a community of developers committed to improving its features and performance.
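To give a rough idea of what such a benchmark measures, the sketch below computes recall@k, a standard quality metric for approximate search: the fraction of the true nearest neighbors that a system returns. It does not use VectorDBBench or its API. The random data, the helper names, and the "search only half the data" stand-in for an approximate index are assumptions made for illustration.

```python
import math
import random


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_top_k(vectors, query, k):
    """Brute-force search: indices of the k vectors closest to the query."""
    order = sorted(range(len(vectors)), key=lambda i: l2_distance(vectors[i], query))
    return order[:k]


def recall_at_k(returned_ids, true_ids):
    """Fraction of the true top-k neighbors that appear in the returned ids."""
    return len(set(returned_ids) & set(true_ids)) / len(true_ids)


random.seed(0)  # fixed seed so repeated runs use the same synthetic data
vectors = [[random.random() for _ in range(8)] for _ in range(200)]
query = [random.random() for _ in range(8)]

# Ground truth from an exhaustive scan over all vectors.
ground_truth = exact_top_k(vectors, query, k=10)

# Stand-in for an approximate index: it only looks at the first half of the
# data, so it can miss true neighbors that live in the second half.
approximate = exact_top_k(vectors[:100], query, k=10)

recall = recall_at_k(approximate, ground_truth)
print(f"recall@10 = {recall:.2f}")
```

A real evaluation would run many queries against your own dataset and report recall alongside latency and throughput, since index parameters typically trade one against the other.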
Download VectorDBBench from its GitHub repository to reproduce our benchmark results or obtain performance results on your own datasets.
Take a quick look at the performance of mainstream vector databases on the VectorDBBench Leaderboard.