Recently, there has been a trend towards managing high-dimensional vector data in data science and AI applications. This trend is fueled by the proliferation of unstructured data and machine learning (ML), where ML models typically transform unstructured data into feature vectors for data analytics. Existing systems and algorithms for managing vector data offer limited functionality and often suffer serious performance issues when handling large-scale, dynamic vector data.
This paper presents Milvus, a purpose-built data management system for large-scale vector data, and describes the following:
- The design and implementation of Milvus.
- 10 real-world use cases supported by Milvus, including image/video search, chemical structure analysis, COVID-19 dataset search, personalized recommendation, and chatbots.
- A comparison of Milvus with five other data management systems: two open-source systems (Vearch and Microsoft SPTAG) and three commercial systems. Experiments show that Milvus is up to two orders of magnitude faster than its competitors while providing more functionality.