Zilliz logo

Milvus performance on AVX-512 vs. AVX2

This article tests and analyzes the performance of Milvus on both the AVX-512 and AVX2 instruction sets using different indexes

November 10, 2020 by Zilliz

Link copied


An instruction is an order given to a computer processor by a computer program. At the lowest level, each instruction is a sequence of 0s and 1s describing a physical operation the computer is to perform. Generally speaking, in the computer assembler language, each language statement corresponds to a processor instruction. CPU relies on instructions for calculation and to control the computer system, and instruction execution capability is an important indicator of CPU’s performance. The instruction set is also closely related to CPU efficiency.

Proposed by Intel in March 2008 and first supported by Intel’s Sandy Bridge processors in 2011, Advanced Vector Extensions (AVX) are an instruction set for microprocessors of the x86 architecture. AVX introduces new features, new instructions and a new coding scheme. AVX2 expands most integer operations to 256 bits and introduces fused multiply-accumulate (FMA) operations. AVX-512 expands AVX to 512-bit operations using a new EVEX prefix encoding.

Milvus is an open-source vector similarity search engine that supports embedding of unstructured data using various AI models and provides vector search features for a wide range of applications including image processing, machine vision, natural language processing, speech recognition, and recommendation engines. As of v0.7.0, Milvus has added support for the AVX-512 instruction set. This means, theoretically, it can work on all CPUs containing the AVX-512 instructions.

Performance Comparison


  • CPU: Intel(R) Platinum 8163 CPU @ 2.50GHz24 cores 48 threads
  • Number of CPU: 2
  • Graphics card, GeForce RTX 2080Ti 11GB 4 cards
  • Mem: 768GB
  • Disk: 2TB SSD

Milvus parameters:

  • cahce.cahe_size: 25, The size of CPU memory used for caching data for faster query.
  • nlist: 4096
  • nprobe: 128

Note: nlist is the indexing parameter to create from the client; nprobe the searching parameter. Both IVF_FLAT and IVF_SQ8 use a clustering algorithm to partition a large number of vectors into buckets, nlist being the total number of buckets to partition during clustering. The first step in a query is to find the number of buckets that are closest to the target vector, and the second step is to find the top-k vectors in these buckets by comparing the distance of the vectors. nprobe refers to the number of buckets in the first step.


We use the SIFT10M dataset, which contains one million 128-dimensional vectors and is often used for analyzing the performance of the corresponding nearest-neighbor search methods. We will compare the top-1 search time for nq = [1, 10, 100, 500, 1000] between the two instruction sets.

Vector index

Vector index is a time-efficient and space-efficient data structure built on the vector field of a collection using a certain mathematical model. Through vector index, we can efficiently query vectors that are most similar to the target vector.

Since accurate retrieval is usually very time-consuming, most of the vector index types of Milvus use ANNS (Approximate Nearest Neighbors Search). We will test three indexes with AVX512 and AVX2 this time: IVF_FLAT, IVF_SQ8 and HNSW.


IVF (Inverted File) is an index type based on quantization. IVF_FLAT is the most basic IVF index, and the encoded data stored in each unit is consistent with the original data.



IVF_SQ8 does scalar quantization for each vector placed in the unit based on IVF. Scalar quantization converts each dimension of the original vector from a 4-byte floating-point number to a 1-byte unsigned integer, so the IVF_SQ8 index file occupies much less space than the IVF_FLAT index file.



HNSW (Hierarchical Small World Graph) is a graph-based indexing algorithm. The search starts from the uppermost layer, finds the node closest to the target in this layer, and then goes down to the next layer for another round of search. After multiple iterations, it can quickly approach the target position.



By analyzing the performance of the above three indexes on two instruction sets, we can see that vector retrieval is slightly faster on the AVX-512 instruction set than on AVX2. This is because the AVX-512 supports 512-bit computation compared to 256-bit computation on the AVX2, and at this level the AVX-512 should be twice as fast as the AVX2. However, Milvus has other time-consuming tasks in addition to its calculations, so the overall retrieval time of the AVX-512 is not twice as long as that of the AVX2.

Index comparison



It can be observed that HNSW retrieval is significantly faster than the other two indexes, while IVF_SQ8 retrieval is slightly faster than IVF_FLAT on both instruction sets. This is probably because IVF_SQ8 requires only 1/4 of the memory than IVF_FLAT: IVF_SQ8 loads 1 byte for each vector dimension, while IVF_FLAT loads 4 bytes per vector dimension. The time required for the calculation is most likely constrained by the memory bandwidth. As a result, IVF_SQ8 not only takes less space, but also requires less time to retrieve vectors.


This article tests and analyzes the performance of Milvus on both the AVX-512 and AVX2 instruction sets using different indexes, and Milvus shows excellent performance with all types of indexes and performs better on the AVX-512 instruction set.

Milvus is compatible with a variety of deep learning platforms and is used in miscellaneous AI domains. In the latest v0.11.0, Milvus adds scalar field filtering and the ability to specify how distances are calculated during queries. Milvus will continue to add more features to facilitate unstructured data retrieval. For more information about Milvus, see our official website. You can also follow us on Twitter or join us on Slack.