Getting Started with GPU-Powered Milvus: Unlocking 10x Higher Performance
Are you ready to enhance your vector search with GPU acceleration? Milvus 2.3, the latest release, officially supports NVIDIA A100 GPUs, delivering a 10x increase in throughput and significantly lower latency. This blog post delves into the motivations behind this strategic move and shows you how to get started with the Milvus GPU version.
Why does Milvus introduce GPU support?
Vector databases play a crucial role in large-scale data retrieval and similarity searching. However, traditional CPU-based indexing strategies struggle to keep up with the growing demand for high performance and low latency, particularly with the rise of Large Language Models (LLMs) like GPT-3. Recognizing the potential synergy between Milvus and NVIDIA GPUs, the Milvus team decided to introduce GPU support in Milvus 2.3.
Thanks to the support from the NVIDIA team (Special thanks go to @wphicks and @cjnolet from NVIDIA for their valuable contributions to the RAFT code), GPU support in Milvus has become a reality, making it possible to quickly and efficiently search through massive datasets and expand the AI landscape.
Getting started with Milvus GPU version
Let's dive into the steps to kickstart your journey with the Milvus GPU version.
Installing CUDA driver
First and foremost, ensure that your host machine recognizes your NVIDIA GPU. You can verify this by running the following command:

lspci
You're good to go if you see an "NVIDIA" entry in the device output. Below is the output from my machine, which shows an NVIDIA T4 graphics card:

00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma]
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111
00:04.0 Non-Volatile memory controller: Amazon.com, Inc. Device 8061
00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
00:1e.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller
Next, install the necessary CUDA drivers. You can find the appropriate driver for your system on the NVIDIA website.
For example, if you use the Ubuntu Linux 20.04 operating system (OS), you can download and install the driver by executing the following commands:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-drivers
You can skip the CUDA installation step if your host machine already has the appropriate CUDA drivers installed.
The minimum driver version required depends on your GPU type:
NVIDIA Tesla series professional GPUs: >=450.80.02
Gaming GPUs: >=520.61.05
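To check the requirement above, here is a small version-comparison sketch. The installed value is an assumed example; on a real machine you would query it with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`:

```shell
# Compare an installed driver version against the required minimum.
installed="470.57.02"   # example value; substitute your own
minimum="450.80.02"     # Tesla-series minimum

# sort -V orders version strings numerically; if the minimum sorts
# first, the installed driver is new enough.
if [ "$(printf '%s\n%s\n' "$minimum" "$installed" | sort -V | head -n1)" = "$minimum" ]; then
  echo "driver $installed meets the $minimum minimum"
else
  echo "driver $installed is too old"
fi
```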
After installing the driver, you must restart the system for it to take effect. Once the restart is complete, verify that the driver is working by entering the following command:

nvidia-smi
Note: The Milvus GPU image supports NVIDIA graphics cards with Compute Capability 6.1, 7.0, 7.5, and 8.0.
To learn your GPU Compute Capability, visit NVIDIA GPU Compute Capability.
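As a quick sketch of that check: on recent drivers you can query the compute capability directly with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`; the `7.5` below is an assumed example (a Tesla T4):

```shell
# Check a card's compute capability against the list the
# Milvus GPU image supports (6.1, 7.0, 7.5, 8.0).
cap="7.5"   # example value; substitute your card's capability

case "$cap" in
  6.1|7.0|7.5|8.0) echo "compute capability $cap is supported" ;;
  *)               echo "compute capability $cap is not supported" ;;
esac
```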
For instructions on installing NVIDIA Container Toolkit, refer to NVIDIA documentation.
Milvus GPU configuration
The Milvus GPU version supports only a single Milvus process and a single GPU by default. To utilize multiple GPUs, run multiple Milvus processes or containers and set the CUDA_VISIBLE_DEVICES environment variable for each.
- In a container, you can set this environment variable when starting the container:

sudo docker run --rm -e NVIDIA_VISIBLE_DEVICES=3 milvusdb/milvus:v2.3.0-gpu-beta
- In Docker Compose, you can set this using the device_ids field. Refer to the GPU access with Docker Compose documentation for more information.
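For illustration, here is a minimal docker-compose.yml fragment using the device_ids field; the service name and GPU ID are assumptions for a single-GPU host:

```yaml
services:
  standalone:
    image: milvusdb/milvus:v2.3.0-gpu-beta
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]      # expose only GPU 0 to this container
              capabilities: [gpu]
```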
Even if you configure multiple graphics cards for a single Milvus process or container, Milvus can only utilize one.
You can fine-tune performance by adjusting environment variables such as KNOWHERE_STREAMS_PER_GPU (the number of concurrent CUDA streams per GPU) and KNOWHERE_GPU_MEM_POOL_SIZE (the GPU memory pool size).
We strongly recommend adjusting KNOWHERE_GPU_MEM_POOL_SIZE if you deploy two Milvus processes on a single graphics card. Otherwise, Milvus may crash due to competition for GPU memory.
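As a sketch of the multi-GPU setup above: CUDA_VISIBLE_DEVICES is set per command, so each process sees only its own card. The echo stands in for the actual Milvus launch command:

```shell
# Each command gets its own CUDA_VISIBLE_DEVICES value, so each
# Milvus process would only see (and allocate memory on) one GPU.
# Replace the echo with your actual Milvus launch command.
CUDA_VISIBLE_DEVICES=0 sh -c 'echo "process 1 sees GPU(s): $CUDA_VISIBLE_DEVICES"'
CUDA_VISIBLE_DEVICES=1 sh -c 'echo "process 2 sees GPU(s): $CUDA_VISIBLE_DEVICES"'
```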
Building Milvus GPU version locally
Before you build Milvus locally, ensure you have installed the necessary dependencies.
- CUDA Toolkit
sudo apt install --no-install-recommends cuda-toolkit
- Python 3, pip, libopenblas-dev, libtbb-dev, and pkg-config:
sudo apt install python3-pip libopenblas-dev libtbb-dev pkg-config
- Conan, a C/C++ package manager:
pip3 install conan==1.59.0 --user
export PATH=$PATH:~/.local/bin
- CMake (>=3.23): refer to the Kitware APT Repository for more details.
- Golang: refer to the Go documentation for more details.
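Before building, you can quickly confirm the dependencies above are on your PATH with a loop like this (the tool list is an assumption; nvcc comes from the CUDA toolkit):

```shell
# Quick sanity check that the build prerequisites are installed.
for tool in nvcc python3 pip3 conan cmake go; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "missing: $tool"
  fi
done
```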
After you have installed all the necessary tools, build the Milvus GPU version using the following command:

make milvus-gpu
Start Milvus in standalone mode by running the following command:
cd bin
sudo ./milvus run standalone
If you prefer containerization, you can use the provided docker-compose.yml file for deployment:

docker-compose up -d
The introduction of GPU support in Milvus 2.3 opens up exciting possibilities for accelerating vector database performance. With NVIDIA A100 GPUs at your disposal, you can achieve remarkable gains in both throughput and latency, making it a compelling choice for data-intensive applications and AI workloads.