What’s New In Milvus 2.3 Beta - 10X faster with GPUs
We are proud to announce the Beta release of Milvus 2.3 on behalf of the Milvus community. This Beta release contains new features and improvements that we are sure will boost the performance of your AI-powered applications. We appreciate your help testing some of these capabilities to quickly get us to the general release! This blog post will highlight some of the more prominent features. For a complete list of changes, check the release notes.
- 📦 PyPI: https://pypi.org/project/milvus/
- 📚 Docs: https://milvus.io/docs
- 🛠️ Release Notes: https://github.com/milvus-io/milvus/releases
- 🐳 Docker Image: docker pull milvusdb/milvus
- 🚀 Release: https://github.com/milvus-io/milvus/releases/tag/v2.3.0-beta
One of the headline features of Milvus 2.3 Beta is GPU acceleration through RAFT integration, which lets Milvus take advantage of the power of modern graphics processing units. The GPU-accelerated Milvus delivers 10X faster performance than the CPU-only version. This can significantly improve the speed and responsiveness of your AI and machine learning-powered applications.
Another key feature of Milvus 2.3 Beta is range search, which lets users retrieve all data within a specified distance of a query. This is particularly useful for applications that need every sufficiently similar result rather than a fixed number of top matches. Milvus 2.3 Beta also supports memory-mapped (mmap) file I/O and incremental backups, both of which make data storage and management more efficient.
Overall, the improvements in this release are essential for any developer building applications with similarity search capabilities.
Nvidia GPU support
GPU support brings heterogeneous computing to Milvus, which can significantly accelerate specialized workloads. Users can expect faster and more efficient vector searches. We compared RAFT-IVF-Flat (GPU) with IVF-Flat (CPU) and HNSW (CPU) on four datasets at 95% recall. On average, the GPU index achieved 32x higher throughput than IVF-Flat and 8x higher throughput than HNSW. Evaluation results are shown in Table 1. (These benchmarks ran against Knowhere on a host with an 8-core CPU, 32 GB of RAM, and an Nvidia A100 GPU.)
Table 1. The QPS of IVF-Flat, HNSW, RAFT-IVF-Flat on four datasets at 95% recall
Index | SIFT | GIST | GLOVE | DEEP |
---|---|---|---|---|
IVF-Flat (CPU) | 3,097 | 142 | 791 | 723 |
HNSW (CPU) | 14,537 | 791 | 1,516 | 5,761 |
RAFT-IVF-Flat (GPU) | 121,568 | 5,737 | 20,163 | 16,557 |
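The averages quoted above can be reproduced from Table 1. The short sketch below assumes the average is the arithmetic mean of the per-dataset QPS ratios; the helper name `mean_speedup` is ours, purely for illustration.

```python
# QPS at 95% recall, copied from Table 1.
qps = {
    "IVF-Flat (CPU)": {"SIFT": 3097, "GIST": 142, "GLOVE": 791, "DEEP": 723},
    "HNSW (CPU)": {"SIFT": 14537, "GIST": 791, "GLOVE": 1516, "DEEP": 5761},
    "RAFT-IVF-Flat (GPU)": {"SIFT": 121568, "GIST": 5737, "GLOVE": 20163, "DEEP": 16557},
}


def mean_speedup(baseline, candidate="RAFT-IVF-Flat (GPU)"):
    """Arithmetic mean of the per-dataset QPS ratio candidate / baseline."""
    ratios = [qps[candidate][name] / qps[baseline][name] for name in qps[baseline]]
    return sum(ratios) / len(ratios)


print(round(mean_speedup("IVF-Flat (CPU)")))  # 32
print(round(mean_speedup("HNSW (CPU)")))      # 8
```

Note that the per-dataset ratios vary quite a bit (for HNSW, from roughly 3x on DEEP to roughly 13x on GLOVE), so the gain you see will depend on your data.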
Special thanks go to @wphicks and @cjnolet from Nvidia for their valuable contributions to the RAFT code.
Range Search
Range search is a different search method from a k-NN query. A k-NN query returns a fixed number of nearest neighbors, whereas a range search, given a query q and a distance threshold R, returns all entities within distance R of q. Range search is commonly used to find all relevant results within a specified range. For instance, it can help with (but is not limited to) data deduplication or detecting copyright infringement without missing similar candidates.
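To make the difference concrete, here is a minimal brute-force sketch in plain Python. It is not how Milvus implements either search (Milvus uses indexes); the function names, the toy data, and the choice of Euclidean distance are assumptions made for illustration.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(query, entities, k):
    """Return the ids of the k nearest entities, however far away they are."""
    ranked = sorted(entities, key=lambda eid: l2(query, entities[eid]))
    return ranked[:k]


def range_search(query, entities, radius):
    """Return the ids of all entities within `radius` of the query, nearest first."""
    hits = sorted((l2(query, vec), eid) for eid, vec in entities.items())
    return [eid for dist, eid in hits if dist <= radius]


entities = {
    "a": (0.0, 0.0),
    "b": (0.1, 0.0),
    "c": (0.0, 0.2),
    "d": (3.0, 4.0),
}
q = (0.0, 0.0)

print(knn(q, entities, k=2))           # ['a', 'b']
print(range_search(q, entities, 0.5))  # ['a', 'b', 'c']
```

With k fixed at 2, the k-NN query drops "c" even though it is nearly as close as "b". The range search returns every entity inside the threshold, so the result size depends on the data rather than on k.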
Upsert
Upsert is an operation that updates an entity if it already exists in a collection and inserts it as a new entity if it does not. Milvus offers a lot of flexibility in how you add data to your collections. For now, there are three options:
- Bulk insert for high throughput in offline cases.
- Insert for low latency in online streaming cases.
- Upsert for cases where you are unsure whether you need to update existing entities or insert new ones.
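The upsert semantics described above can be sketched with a plain dictionary keyed by primary key. This is a toy model of the behavior, not the Milvus client API; the `upsert` helper and its return value are hypothetical.

```python
def upsert(collection, entities):
    """Insert or overwrite each entity by primary key.

    Returns a (inserted, updated) tuple so the caller can see what happened.
    """
    inserted = updated = 0
    for pk, fields in entities.items():
        if pk in collection:
            updated += 1
        else:
            inserted += 1
        collection[pk] = fields
    return inserted, updated


# The collection starts with one entity whose primary key is 1.
collection = {1: {"vector": [0.1, 0.2]}}

# Key 1 already exists and is overwritten; key 2 is new and is inserted.
result = upsert(collection, {1: {"vector": [0.9, 0.8]}, 2: {"vector": [0.3, 0.4]}})
print(result)  # (1, 1)
```

The caller never has to check whether a key exists first, which is the convenience upsert provides over a separate insert or update.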
Change Data Capture (CDC)
Change Data Capture (CDC) is the process of identifying and capturing changes to data in a vector database in real time and delivering those changes to downstream systems. Milvus now offers zero-downtime backup and synchronization based on this mechanism. Developers can also use CDC to provide a continuous stream of changes to their downstream workloads, such as data analytics or customized auditing.
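As a rough mental model of CDC, the sketch below records every write on a source as a numbered change event and lets a downstream replica catch up by replaying events after its last checkpoint. It is a conceptual illustration only; the classes, event format, and `replay` function are hypothetical and do not reflect how Milvus implements CDC.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeEvent:
    seq: int            # monotonically increasing position in the log
    op: str             # "upsert" or "delete"
    pk: int
    value: object = None


@dataclass
class Source:
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def _record(self, op, pk, value=None):
        self.log.append(ChangeEvent(len(self.log) + 1, op, pk, value))

    def upsert(self, pk, value):
        self.data[pk] = value
        self._record("upsert", pk, value)

    def delete(self, pk):
        self.data.pop(pk, None)
        self._record("delete", pk)


def replay(log, replica, after_seq=0):
    """Apply events newer than `after_seq` to the replica; return the new checkpoint."""
    last = after_seq
    for event in log:
        if event.seq <= after_seq:
            continue
        if event.op == "upsert":
            replica[event.pk] = event.value
        else:
            replica.pop(event.pk, None)
        last = event.seq
    return last


source = Source()
source.upsert(1, [0.1, 0.2])
source.upsert(2, [0.3, 0.4])

replica = {}
checkpoint = replay(source.log, replica)  # initial sync

source.delete(1)                          # the source keeps serving writes
checkpoint = replay(source.log, replica, after_seq=checkpoint)

print(replica == source.data, checkpoint)  # True 3
```

Because the replica only needs the events after its checkpoint, the source never has to pause for a full copy, which is the idea behind zero-downtime synchronization.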
Memory-mapped (mmap) file I/O
When a data set is too large to fit in memory and query performance is not critical, Milvus can use mmap to treat parts of a file as if they were in memory. This reduces memory usage, and performance remains good when all of the data is in the system page cache.
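The general mechanism can be tried with Python's standard `mmap` module. The sketch below writes fixed-size vectors to a file, maps the file, and reads one vector by offset without loading the whole file. The file layout, dimension, and helper name are assumptions for illustration and say nothing about Milvus's on-disk format.

```python
import mmap
import os
import struct
import tempfile

DIM = 4
ROW = struct.Struct(f"<{DIM}f")  # one vector = DIM little-endian float32 values

# Write 1,000 vectors to a temporary file; vector i holds the value i in every slot.
path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
with open(path, "wb") as f:
    for i in range(1000):
        f.write(ROW.pack(*[float(i)] * DIM))


def read_vector(mapped, index):
    """Decode the vector at `index` directly from the mapped file."""
    return ROW.unpack_from(mapped, index * ROW.size)


# Map the file read-only. The OS pages data in on demand and keeps hot pages
# in the page cache, so only the parts we touch need to occupy memory.
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
    vec = read_vector(mapped, 42)
    n_rows = len(mapped) // ROW.size

print(vec, n_rows)  # (42.0, 42.0, 42.0, 42.0) 1000
```

The first access to a page may hit disk, which is why this approach suits workloads where query latency is not critical or where the working set ends up cached.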
Summary
In addition to all of the features listed above, Milvus 2.3 Beta includes several bug fixes and improvements. To learn more:
- See the release notes for version 2.3 Beta for the complete list of changes
- Download Milvus and get started
- Check out the Milvus benchmarks in this paper