Introducing Milvus 2.6: Scalable AI at Lower Costs with James Luan, VP of Engineering at Zilliz
Webinar
About this webinar
Join James Luan, VP of Engineering at Zilliz, for an exclusive deep dive into the latest release of the open-source vector database Milvus 2.6. Built for AI workloads at scale, Milvus 2.6 introduces major architectural improvements that dramatically reduce storage, compute, and operational costs—without compromising on performance.
You'll learn
- New Features for All Users: How tiered storage, vector quantization, and a diskless WAL (with Woodpecker) can cut infrastructure costs by up to 10X—ideal for scaling AI workloads affordably.
- For Existing Users: How Milvus 2.6 simplifies your operations with new features like CDC + BulkInsert for easier data replication and native package support that streamlines installation and upgrades (no more manual dependencies or setup headaches!).
- How Milvus 2.6 boosts developer productivity with built-in tools for ingestion, advanced search, analytics, and reranking—making it easier than ever to iterate and deploy your AI applications.
- What's next on the Milvus roadmap: Insights into future enhancements and how they’ll further streamline your workflows.
Whether you're just starting with vector search, building GenAI applications, or managing multimodal data, Milvus 2.6 offers significant advantages that can scale with your needs.
Webinar transcript

Chris: My name is Chris Churilo, and joining me today is the world-famous James. If you've been playing around with the Milvus project, you know that James is the VP of Engineering at Zilliz and a key maintainer of Milvus. James is incredibly passionate about Milvus 2.6, so if this is the first time you've heard him speak, you're in for a treat. Let me turn on the slideshow and give everybody a quick introduction to Milvus 2.6.

We are, of course, an open-source vector database, and by now probably everyone on this webinar has heard of us. We're really excited about the number of contributors and the community engagement, so once again, I can't thank you all enough; it's been a pleasure working with everybody.

One thing we like to stress to our community is that we are not trying to be everything to everybody. We know we should stay in our lane, and we feel strongly about where we shine: a vector database built from the ground up, not bolted onto some other tool. It's fully open source under the Apache 2.0 license, which matters to us when we evaluate other projects and, judging by your feedback, matters to you as well: you can look under the hood, and every contribution is visible to everybody. And third, as I mentioned, we stay in our lane: we focus on a fully distributed system that performs at scale. We're not trying to be the tool you spin up in seconds just to prototype on, although you can do that with Milvus if you have a little more patience. What we really aim for is making sure your multi-tenant, high-performance, sophisticated application works well with vector search. So when you look at the feature set James is about to describe for 2.6, you'll see we're continuing to stay in our lane.

There are a couple of other themes I want to share, because I think you'll appreciate them. As engineers, we don't get a blank check from finance; nobody tells us to spend as much as we want on infrastructure. We get it: we always have to provide a solution that's as cost-effective as possible. One way to do that is by lowering infrastructure costs. Another is to build tools so you don't have to, which boosts your productivity. Where we have dependencies that are difficult or a hassle, we want to remove as many of them as possible, which again boosts productivity, lowers infrastructure costs, and lowers overall complexity. And finally, you already use things like object storage and you already understand tiered storage, so whatever you have in place, we want to take full advantage of it. Those are the themes we have in 2.6.
And it's only going to get better after 2.6, because our goal is to help you make sense of all of your unstructured data. To do that, we have a big internal goal: we need to help you reduce your costs even more; we need to make Milvus a lot cheaper to run in your infrastructure. We know we still have a long way to go, so we appreciate your patience as we build out these capabilities. It might seem counterintuitive coming from a product vendor, but like I said, we don't get a blank check and we know you don't either, so we're all in this together. All right, I'm going to hand the mic over to James, and we'll go really deep into what these capabilities are.

James: Thanks, Chris. I think the introduction was actually pretty good: the goal for Milvus is simply to reduce the cost for everyone, so we can see more use cases. I'm really excited about all the new features in 2.6. We spent around half a year on it, and this is actually the first time I've talked about 2.6 publicly, because I've been busy making all those features happen. But finally, it's here.

The first part I want to talk about is how we can leverage the cloud to make vector search even cheaper. Our first, signature feature is tiered storage, which lets you separate hot and cold data. When people build AI applications, one of the challenges is multi-tenancy: you build an app for millions or tens of millions of users, and only a small portion of them are active at any time. The same thing happens in enterprise applications: data from the most recent three months is hot, while older data still needs to be searchable but is extremely cold. You definitely don't want to load all of that into memory or onto disk.

That's why we built this smart cache. The architecture diagram is a bit small, but just think of it as another caching layer on top of object storage. As you may know, starting with 2.0 Milvus has used object storage as its persistence layer, but the original approach was to load all the data into main memory or onto disk and serve it from there at higher speed. Over the last couple of years we saw a lot of people managing the loading and releasing of collections themselves: when a user logs in they load that user's data, and when the user leaves the application they release it. That's a lot of work, and the performance isn't ideal. So we thought: why not do it ourselves and build a caching layer on top of object storage so people don't need to worry about it? With tiered storage, hot-data performance is still similar to the HNSW or DiskANN setup you're using today, depending on whether the cache lives on disk or in memory.
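To make the manual pattern James describes concrete, here is a minimal sketch using the pymilvus MilvusClient; the per-tenant collection naming and the login/logout hooks are hypothetical. Tiered storage in 2.6 is meant to make this kind of bookkeeping unnecessary.

```python
from pymilvus import MilvusClient

# Hypothetical setup: one collection per tenant, Milvus running locally.
client = MilvusClient(uri="http://localhost:19530")

def on_user_login(tenant_id: str) -> None:
    # Pre-2.6 pattern: pull this tenant's data into memory/disk before serving.
    client.load_collection(collection_name=f"tenant_{tenant_id}")

def on_user_logout(tenant_id: str) -> None:
    # ...and release it again so inactive tenants don't hold RAM.
    client.release_collection(collection_name=f"tenant_{tenant_id}")
```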
Once data has been evicted to object storage, though, you should expect higher latency, maybe one to two seconds. On the other hand, for most of that cold data latency isn't critical, and with the new tiered storage implementation you'll see your storage cost drop by another five to ten times.

The second feature I want to mention is RaBitQ. It's a quantization algorithm which, after a lot of evaluation, we think is the best of the quantization algorithms we've seen. Some of you may have heard of binary quantization: it converts a 32-bit float into a single bit, compressing vectors 32 times, but the cost is that you lose a lot of recall. For people building RAG applications, accuracy is very important, especially top-3 and top-5 accuracy. That's why we introduced this new quantization; we'll go into a bit more detail on the next slides, but just remember that open-source users get a new option, and it works with both the IVF index and the HNSW index.

The third thing we introduced is int8 vector support, along with an HNSW index for it. Some model vendors already quantize on their side; for example, Cohere's models can generate float32, int8, and even binary vectors. Milvus already supported binary and float32, and now we finally support int8. With the int8 implementation, recall is only one or two percent lower than float32, but you get the chance to cut your memory use by four times.

The last item is Milvus Storage V2. We leaned fully on Parquet and Arrow at the very beginning of the Milvus 2.0 design, but after a while we found it's very hard to store vector data in Parquet on object storage, and Parquet has a lot of limitations on how fast you can do point queries against object storage. That's why we redesigned our storage layer; we'll cover that on the next slides too.

Okay, so the first thing I want to go deeper on is RaBitQ. First of all, it's a binary quantization method. The key idea is this: think about three-dimensional data ranging between minus one and one; a random vector can land anywhere in that range. But if you normalize your data in a very high-dimensional space, the values end up concentrated in a much smaller range, very close to zero, because as the dimensionality grows, each individual dimension gains a kind of statistical certainty. It's a little hard to grasp at first, but if you're interested, go read the paper; it's one of the best papers from SIGMOD 2024, and we'll share a link to it later.
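As a rough intuition for the trade-off being described, here is a toy 1-bit quantization in NumPy. It only illustrates the general idea of keeping one bit per dimension of a normalized vector; it is not the actual RaBitQ algorithm, which additionally applies a random rotation and keeps per-vector correction terms to bound the estimation error.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 128
x = rng.normal(size=dim).astype(np.float32)
x /= np.linalg.norm(x)              # normalize: high-dim values cluster near 0

bits = x > 0                        # keep 1 bit per dimension instead of 32
reconstructed = np.where(bits, 1.0, -1.0) / np.sqrt(dim)   # unit-norm estimate

print("memory: %d bytes -> %d bytes" % (x.nbytes, len(np.packbits(bits))))
print("cosine(x, reconstructed) =", float(x @ reconstructed))
```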
By leveraging these statistical properties, you can actually reduce the average error between your quantized vectors and your original vectors. With RaBitQ you get high search accuracy, and the other great part is that it's fully hardware friendly: you can optimize it with SIMD instructions, and it can be combined with any kind of index, especially FastScan. There's another paper on how RaBitQ works together with FastScan for better performance, and we'll definitely share both papers.

From our results, with just one bit and without any refinement or reranking, recall is about three or four percentage points higher than traditional product quantization. Compared with scalar quantization, recall can be even higher, around ten percent, while QPS stays roughly similar. Leveraging the new quantization algorithm, we implemented IVF with RaBitQ and also an HNSW-with-RaBitQ index; at the same recall it roughly doubles the QPS and also saves some memory compared with the original IVF index.

There was a quick question in the chat about what we mean by high accuracy. That's a good question. We usually use two metrics. One is recall: if you ask for the top 100 results, how many of the true nearest neighbors actually get retrieved? With brute-force search, recall is 100 percent by definition; you get all 100 results. With any approximate nearest-neighbor algorithm you may miss some. Ideally we want 98 or 99 percent recall; with an HNSW or DiskANN index you can definitely reach 98 or 99, with an IVF index you might get 95, and once you add quantization, recall keeps dropping. With naive binary quantization you may see recall under 75 or 80 percent, which means that if you search for the top 100, you only get around 75 or 80 of them.

The other metric is NDCG. The major difference from recall is that NDCG doesn't only evaluate how many results you got back; it also evaluates their ranking. The earlier a result appears, the more weight it carries; the first position has the most weight. If the true closest result shows up in your list but at position 20, that's still good because you found it, but it's not ideal: you want it in the first position as well. So NDCG considers both recall and ranking.
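A minimal sketch of the two metrics just described, assuming binary relevance (a returned ID either is or is not one of the true top-k neighbors):

```python
import math

def recall_at_k(retrieved: list[int], ground_truth: list[int], k: int = 100) -> float:
    # Fraction of the true top-k neighbors that the ANN search actually returned.
    return len(set(retrieved[:k]) & set(ground_truth[:k])) / k

def ndcg_at_k(retrieved: list[int], ground_truth: list[int], k: int = 100) -> float:
    # Like recall, but earlier positions carry more weight (log discount).
    relevant = set(ground_truth[:k])
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc in enumerate(retrieved[:k]) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0
```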
The second exciting feature is Storage V2. It works hand in hand with tiered storage, because originally all we did was load data from object storage and cache it, so we didn't really have to care about the performance of the storage format itself. But in the new release with tiered storage, and in the next release when we have the vector lake solution, the format of the data sitting on object storage becomes very important. Many of you may have heard of the Lance format, which is designed for vector lakes. We implement essentially the same ideas, but using plain Parquet files, so it stays compatible with your existing stack: you don't need to introduce a new format into your big data or AI inference pipelines.

The goal for Storage V2 is really two things. One is to introduce more data types, especially long text and blob storage, which lets you store images and audio in the vector database. That data is going to be huge, and you can't store it directly in Parquet because it breaks your row groups. Think about one field with four bytes per row next to another field with four kilobytes per row. Parquet stores all fields of the same rows in the same row group, so if you tune the row group to be really large, retrieving just the small field becomes very expensive because you have to read the whole row group. If you tune the row group to be small, the large field may only have ten or twenty rows per row group, and filtering and sequential read speed drop. It's a dilemma. The way we handle it is to split large and small fields into different row groups, or into different files.

We also moved the vectors outside of Parquet. In the Storage V1 design we stored the vectors in a Parquet array, and the serialization turned out to be very costly; Parquet just isn't designed for vectors. We also fully leverage Parquet page statistics to accelerate point queries. It's stable and battle tested, and in our tests, Storage V2 compared with V1 gives about a 50x acceleration on point queries, which is very impressive for the tiered storage use case.
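A small PyArrow sketch of the row-group dilemma described above, with hypothetical field names; Storage V2 sidesteps it by putting small and large fields into separate row groups or files:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Toy table: a tiny 4-byte field next to a ~4 KB text field.
n = 10_000
table = pa.table({
    "doc_id": pa.array(range(n), type=pa.int32()),   # ~4 bytes per row
    "chunk_text": pa.array(["x" * 4096] * n),         # ~4 KB per row
})

# Large row groups: good for scanning doc_id, but a point lookup must read a
# whole multi-megabyte row group just to fetch one chunk_text value.
pq.write_table(table, "large_groups.parquet", row_group_size=8192)

# Small row groups: cheap point lookups, but scans over doc_id now touch
# hundreds of tiny row groups and slow down.
pq.write_table(table, "small_groups.parquet", row_group_size=16)
```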
We don't care only about cost, of course. Over the last couple of years we've been working with AI developers to build the tools they can depend on. Take hybrid search: at the very beginning we thought you should just use two systems, Elasticsearch plus Milvus, manage both, and do hybrid search with your own fusion logic. People complained a lot; it should be simpler, and there's no need to maintain two different clusters. That's why in the last Milvus release we added hybrid search and full-text search. Something similar is happening now.

One of the most exciting features in this area is letting Milvus handle the model calls for you. A lot of people keep asking why the vector database can't work together with their embedding models directly, so they don't need to call two APIs. Usually what happens is that people call OpenAI or, say, Gemini embeddings, get the embeddings back, concatenate their scalar fields with the embeddings, and then ingest everything into Milvus. And it's not only about embeddings: there's a lot of pre-processing, and sometimes post-processing as well, for example reranking or highlighting your data. We decided we needed to integrate this, but also offer enough flexibility that users can design their own pipelines to process their data.

So in 2.6 we introduced pre-processing and post-processing pipelines. With a pre-processing pipeline, the simplest usage is to call embedding vendors directly through Milvus, or we can even work with inference engines such as vLLM or Hugging Face's text-embeddings inference server, so you don't need to host that yourself. On the open-source side, the Milvus 2.6 Kubernetes operator can already bring up a vLLM inference engine, so we don't just help you call embedding models, we also help you deploy the inference engines. And of course you can still use OpenAI or Cohere, so you don't need to manage any of that yourself.

Post-processing does the same thing on the other end. The initial step is reranking: you can call Cohere or Google APIs to rerank your results. In the next couple of releases we'll support more complicated types, for example highlighting. A lot of people, when they do search, want to show their customers why a certain result was picked. The difference between our highlighting and a traditional search engine's is that we do semantic highlighting: it doesn't only show the keywords, it also shows the semantic relations. With dense vector search you can get a result that contains none of the keywords but is still semantically related. This isn't shipped yet, but it should come out in the next one or two months, and semantic highlighting will make it much easier for RAG developers to explain why a result was retrieved.
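For contrast, here is a minimal sketch of the pre-2.6 pattern James describes, where the application calls the embedding API itself and then inserts into Milvus. The collection and field names are hypothetical; with the 2.6 pre-processing pipeline, Milvus can make the embedding call for you instead.

```python
from openai import OpenAI
from pymilvus import MilvusClient

# Pre-2.6 pattern: embed outside the database, then ingest.
oai = OpenAI()                                   # assumes OPENAI_API_KEY is set
milvus = MilvusClient(uri="http://localhost:19530")

docs = ["Milvus 2.6 adds tiered storage.", "RaBitQ is a 1-bit quantizer."]
resp = oai.embeddings.create(model="text-embedding-3-small", input=docs)

rows = [
    {"id": i, "text": doc, "embedding": item.embedding}
    for i, (doc, item) in enumerate(zip(docs, resp.data))
]
# "rag_chunks" is a hypothetical collection with matching id/text/embedding fields.
milvus.insert(collection_name="rag_chunks", data=rows)
```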
We also implemented a new data model, the struct list, or list of embeddings, which is something really cool that I'll explain on the next slides.

We improved a lot of search functionality too. Phrase match helps you match a whole phrase: 2.5 already supports keyword match, so you can match a couple of keywords, but phrase match matches an actual phrase, for example "vector database" instead of matching "vector" and "database" independently, and it cares about their positions, so "vector database" matches but "database vector" doesn't.

We also added multi-language tokenization. One of the tokenizers we introduced is Lindera, which is pretty good for Korean and Japanese. The best part is that we also wrap a multi-language tokenizer, because as the world becomes more global, applications, especially AI applications, are global from day one: you serve people from many parts of the world, not only the US or Europe. The multi-language tokenizer supports more than 60 languages, and you can specify which language a user is using. We then split the BM25 statistics per language, so users writing in different languages don't influence each other's statistics.

We also created a bunch of interesting tools to make development move faster. One cool feature is Add Field. Right now the schema in Milvus is pretty static: say you have three fields, an ID field, a vector field, and a text field, and someday you decide you need another field. Previously there was no way to backfill and add it; you had to copy all the data and run a full ETL, which is very expensive with large datasets. Now you can add another field without breaking traffic.

We also have a sampling feature. Sometimes you just want a taste of what's actually in your collection without scanning everything, so we now offer query sampling: you can sample some of the data in your dataset. Using query sampling you can also run searches and evaluate your recall, which is another new feature in this release. A lot of people don't have ground truth or a real-world dataset to evaluate how good their search accuracy is, but with query sampling plus recall estimation you can easily understand how well your queries actually perform in production.

The time-aware decay function is a reranking function aimed at agent builders. With agent memory we often want to prioritize fresh data over old data, much like humans forget old things. In 2.6 we introduce decay functions; there are several kinds, but the general idea is that the newer the data, the higher its relevance or priority in your search results.
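A toy version of the time-decay idea, assuming each hit carries a similarity score and a creation timestamp; the field names and the exponential half-life form are illustrative, not the exact functions Milvus ships.

```python
import time

def decayed_score(similarity: float, created_at: float,
                  half_life_days: float = 7.0) -> float:
    # Halve the contribution of a memory every half_life_days.
    age_days = (time.time() - created_at) / 86_400
    return similarity * 0.5 ** (age_days / half_life_days)

hits = [
    {"id": 1, "similarity": 0.92, "created_at": time.time() - 30 * 86_400},
    {"id": 2, "similarity": 0.88, "created_at": time.time() - 1 * 86_400},
]
hits.sort(key=lambda h: decayed_score(h["similarity"], h["created_at"]), reverse=True)
print([h["id"] for h in hits])   # the fresher memory now ranks first
```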
Last in this group is an improvement to our time-to-live feature. We introduced TTL a long time ago, two or three years back, but it had major limitations: we kept hearing from users that expired data wasn't actively cleaned up and could still show up in search results. In 2.6 we improved TTL so that expired data is removed from your search results immediately, and we added a deadline compaction strategy: any data that has been expired for more than one day gets compacted and cleaned, so you have a guarantee that it's actually removed.

One of the cool 2.6 features for RAG users is the struct list. Originally we had a table model where each primary key is unique. When you build a RAG application, say chat-with-your-PDF or chat over web pages, each row or entity inside Milvus is one chunk, not one document. The problem is that sometimes you want the top 10 most similar documents, not the top 10 most similar chunks. That's why we introduced group-by in 2.4: you can search and group by document ID, and it shows you the top 10 most relevant documents instead of chunks. But we realized this still isn't good enough, for a couple of reasons. First, sometimes you want all the chunks of each document collocated so you can do extra computation on top of them, whether reranking or something else. One of the best examples is multi-vector or ColBERT embedding models: with ColBERT, each document generates multiple embeddings, and at search time you need a maximum-similarity computation, so all the vectors of one document have to live together.

That's why we introduced the new data model. You still have a primary key, and the primary key is the document ID, but inside each row you can have a list of embeddings. If you split a document into ten chunks, you put the ten chunk embeddings into one tensor, one list of embeddings, instead of ten rows, and on top of that you can implement all kinds of reranking. We also see a use case where people want a time series over their vector embeddings: a video use case, where you cut videos into frames, and each frame has contextual information and time-series relations. We introduced a special metric so the embedding distance calculation can take those temporal relations into account.

The best part is that it's not only an embedding list, it's a struct list. You can design a schema with title embeddings plus some tags on the title, while your content is a struct split into, say, fifty chunks, where each chunk also carries its own metadata fields, and you can filter on those metadata fields as well.
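To show why collocating all of a document's vectors in one row matters, here is a toy ColBERT-style MaxSim scorer in NumPy; the shapes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
query_vecs = rng.normal(size=(8, 64))    # 8 query token embeddings
doc_vecs = rng.normal(size=(50, 64))     # 50 chunk/token embeddings of ONE document

def maxsim(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    # For each query vector, take its best match within the document, then sum.
    sims = query_vecs @ doc_vecs.T
    return float(sims.max(axis=1).sum())

print(maxsim(query_vecs, doc_vecs))      # needs every vector of the document at hand
```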
So the data model itself becomes more and more sophisticated, and we're simply trying to cater to everything RAG developers need. That's why we believe this kind of vector database needs to exist: adding something similar to pgvector or Elasticsearch would be very, very hard. The way we do it is to start from the index itself and then modify the database layer around it, which is pretty much what happened with Lucene over the last ten years: many of the innovations happened on the Lucene side, where you need to know a lot of detail about your search engine, and then on top of that you build Elasticsearch to leverage those features in a more scalable way.

There's a question: how does this compare to OpenSearch and pgvector? I'd compare them in three ways. First, if you have one million vectors and you're doing naive RAG, please use pgvector; it's the easiest way to do vector search. It's not good at performance or scale, and we are, but it really depends on your use case. The second difference is that we've done a lot of optimization around how we work with AI applications, for example hybrid search and the new data models, so it's easier for developers to fit their applications into the data model. It's a bit like MongoDB and mobile applications: at the very beginning, nobody wanted to store that data in a relational database, they just wanted to drop their JSON into a simple document store. Something similar happens here: when you have vectors, you may have many kinds of them, not only simple dense embeddings but also sparse embeddings, multi-vectors, and binary vectors, and you won't easily find another database that can store all of that.

A couple more comments on OpenSearch: today we also released VectorDBBench 1.0, which compares Milvus with many of the other players in the market. It's open source, so feel free to run the evaluation yourself. The interesting part is that it has kind of become a standard, because the other vendors who want to support vector search contribute their own connectors to VectorDBBench, so it's also a lot of fun for us to provision different systems and see how they do. The third part I'd mention is performance: as you may know, Milvus is known for scalability and performance, and we have a couple of very large customers, probably the largest on the planet.

Probably the most exciting feature I see in 2.6 is JSON shredding and the JSON index. In 2.2 we introduced the dynamic field, which makes your data model more flexible, but it also made filtering super slow; that's what we saw over the last year. After a lot of evaluation we adopted a JSON shredding policy, which is quite similar to what ClickHouse does.
We'll explain in a moment what JSON shredding is, but the results look very impressive: we see up to 100x performance improvement on dynamic-field filtering. That means that once 2.6 is out, you probably don't need to take your schema quite as seriously as before, because even if you make mistakes you can still use dynamic schemas, and filtering performance is now close to what you'd get with a fixed schema. That can be a big difference.

The n-gram index is used for regular expressions. We know Milvus wasn't built for that, but in a lot of real-world cases we still see people using text match or trying to match some of their keywords, so we built this index. The design is that we split the data into small tokens: with 2-grams we split your text into groups of two characters and build an index on each pair. When a search happens, say you search for "hello", we split it into 2-grams as well and check whether "he", "el", "ll", and "lo" are all in your text; if all of those 2-grams are present, the text is a match candidate (see the short sketch after this part). You can use 2-grams, 3-grams, or even 4-grams; the larger the gram, the fewer resources you need, but there are limitations, because with 4-grams you can't search for a token with only three characters. This helps accelerate text match and regular expressions.

MinHash is a locality-sensitive hashing index built for deduplication over huge amounts of data. We're working with one of the world's largest model-training companies; they have data at the 10-billion scale crawled from web pages, and the first thing they want to do after crawling is deduplication. So we built this MinHash index to help people do large-scale deduplication. The production environment we run today holds about 10 billion vectors, and there's still a lot of optimization to do, so if you have data at that scale we'd be happy to help; come talk to us, even if it's not exactly this use case.

The async PyMilvus client is something a lot of people have asked for. It's now implemented and also integrated with LlamaIndex and LangChain, so if you're interested in using async, please give us feedback. And the last thing is VectorDBBench, which I already mentioned: just go to GitHub and search for it; it should be the first result. Give it a try; it has a nice UI so you can see all the results, and it's easy to deploy with a single Docker container.
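Here is the toy 2-gram candidate check referenced above; it only illustrates the idea, not the actual index implementation.

```python
def ngrams(text: str, n: int = 2) -> set[str]:
    return {text[i:i + n] for i in range(len(text) - n + 1)}

# Build a tiny inverted index from 2-gram to the documents containing it.
docs = {1: "say hello world", 2: "wholly unrelated"}
inverted: dict[str, set[int]] = {}
for doc_id, text in docs.items():
    for gram in ngrams(text):
        inverted.setdefault(gram, set()).add(doc_id)

# A document is a candidate only if it contains every 2-gram of the query.
query = "hello"
candidates = set(docs)
for gram in ngrams(query):           # "he", "el", "ll", "lo"
    candidates &= inverted.get(gram, set())
print(candidates)                    # {1}; candidates still need exact verification
```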
Now, a very general idea of what JSON shredding is. On the left side of the slide is a JSON field with a couple of different paths: one is A, another is B. The special thing about B is that not all of its values have the same data type; some are strings and others are integers. We also have D, E, and F, which only show up in some of your entities, definitely not in every row. We split the data into three kinds of cases. If A shows up in all, or most, of your data, we call it a typed path and store it directly in one column. For B, which has both strings and integers, we split it into two columns, one string column and one integer column, and rows that don't have that type are filled with null. A similar thing happens with D: you store the strings, and entities without this field get null. The rest of the data, the sparse part, is stored in a BSON-like binary format, and we build statistics on top of it that work roughly like an inverted index, so searching it is much faster even though those keys only appear in part of your entities. With all of that together, we're seeing speedups between 10x and 300x, and dynamic filtering typically finishes in under 10 milliseconds.
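A toy Python illustration of the shredding idea just described: typed columns for common paths, split columns for mixed types, nullable columns for sparse keys, and a catch-all binary blob for the rest. It is in the spirit of the slide, not the actual Milvus internals.

```python
import json

rows = [
    {"a": 1, "b": "red",  "d": "x"},
    {"a": 2, "b": 7,      "e": [1, 2]},
    {"a": 3, "b": "blue", "f": {"nested": True}},
]

columns = {"a": [], "b_str": [], "b_int": [], "d": [], "rest": []}
for row in rows:
    columns["a"].append(row["a"])                                 # present everywhere -> typed column
    b = row.get("b")
    columns["b_str"].append(b if isinstance(b, str) else None)    # mixed types -> two columns
    columns["b_int"].append(b if isinstance(b, int) else None)
    columns["d"].append(row.get("d"))                             # sparse key -> nullable column
    rest = {k: v for k, v in row.items() if k not in ("a", "b", "d")}
    columns["rest"].append(json.dumps(rest) if rest else None)    # leftovers -> binary blob

print(columns)
```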
The last part I want to cover is how we reduce the maintenance cost. We all know Milvus is powerful and scales well, but on the other side many people complain that it's hard to maintain and hard to get started with. We know that, and that's why we built a lot of helpers. We have Milvus Lite, an embedded version, so people can just install it with pip and run it on a laptop. We also offer many kinds of deployment; as an open-source product we're pretty generous about offering all of them: a Kubernetes operator, Helm charts, Docker installs. In 2.6 we also support a plain package install, which makes life even easier: if you just want a single-machine deployment, you don't need to install Docker anymore.

Besides that, for distributed users, one of the key complaints has been that Milvus is heavy because we introduced Kafka or Pulsar as the write-ahead log. The reason we did that is that four or five years ago we decided to separate compute and storage, but at the same time we wanted this database to be good for data freshness: it's definitely not a batch-only database, it has to handle all the CRUD operations in real time. That's why we picked Kafka and Pulsar as our write-ahead log. After all these years, the storage layer has improved a lot: S3 has become cheaper and much faster, and there are now single-availability-zone S3 offerings with roughly ten times lower latency than the standard version. So we decided it was the right time to introduce a new layer to replace Kafka and Pulsar.

We also did a bunch of work to reduce the number of components: we merged the index node and data node into one node, and we merged the root, query, and data coordinators into one mixed coordinator, so we don't need that many microservices anymore and there is less to operate.

One of the most exciting features on the architecture side is the streaming node. If you've had a chance to use Milvus, you know we have a tunable consistency model: you can pick between strong consistency, bounded staleness, or eventual consistency. The reason we offer that is that the strong-consistency path was slow in every version before 2.6: when a write happens, it goes straight into Kafka, and it can only be searched once a query node has consumed it. It takes time for Kafka to deliver the data to consumers, and it also takes time for us to tell the query node whether it has the latest data. That's why we made a major architecture change and added a new node called the streaming node; think of it as a replacement for the data node. All fresh inserts go into the streaming node, which is responsible for writing the WAL; the WAL can be either Kafka or Woodpecker, which I'll talk about in a moment. Once the data has been persisted in the WAL, the streaming node applies it to its memory, so the growing segments move from the query node to the streaming node. With that, strongly consistent search becomes much, much faster.

We'll also introduce some interesting features on top of this, for example primary-key deduplication and idempotent writes. Another common complaint is that when you write data into Milvus and it reports an error, you don't know whether to retry: if you don't retry you might lose data, and if you do retry you might end up with duplicates. It's a dilemma. With the streaming node we're going to have an idempotency cache (this is also part of our feature plan for 3.0), so if you retry, only one entry remains as long as the idempotency key is the same. The streaming node really solves a lot of long-standing challenges.
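A sketch of the tunable consistency model from the client side. Whether consistency_level can be set per request or only at collection creation depends on the client version, so treat the exact keyword placement as an assumption; the collection name and dimension are hypothetical.

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# Default to "Bounded" staleness for low-latency searches on a hypothetical
# agent-memory collection; "Strong" and "Eventually" are the other options.
client.create_collection(
    collection_name="agent_memory",
    dimension=768,
    consistency_level="Bounded",
)

# For the rare query that must see every prior write, request strong
# consistency (per-request override shown here as an assumption).
hits = client.search(
    collection_name="agent_memory",
    data=[[0.1] * 768],
    limit=5,
    consistency_level="Strong",
)
```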
The other exciting piece is Woodpecker. It's a diskless, purely cloud-native write-ahead logging service on S3, and of course it's open source in the Zilliz tech repositories. We wrote an engineering blog about why we needed the new Woodpecker solution, but for many of our users the takeaway is simply that you can remove your Kafka dependency if you're not very sensitive to write latency. Because we're writing straight to object storage, latency does go up: expect roughly 200 to 300 milliseconds of write latency with Woodpecker. For a lot of AI applications I don't think that's a big deal, especially if you batch your writes. The good parts are that throughput is actually much higher compared with Pulsar, and you remove one dependency, which saves a lot of cost, especially in smaller clusters.

Okay, I think that's pretty much it for 2.6. I'm excited that 3.0 is already in the plan, probably in the next two or three months, so stay tuned; the vector lake is coming, which is a big thing, and that's why we reserved a special 3.0 version for it. But before we wrap up, let's quickly go through one use case, Read AI. We're very excited to see them using Milvus as the backbone of their semantic search to solve the problem of data silos: you have meeting notes, chats, emails, and information in your CRM, and it's very hard to pull all of that together and understand your business. Our friends at Read AI are building an agent over all of that data, and vector search is one of the important components in their solution. They started with FAISS, but it lacked multi-tenancy support; they tried a couple of other solutions, including Pinecone, but those couldn't handle all the filtering, cost was a big problem, and latency was much slower. That's why they came to us. After moving to Milvus they saw about a 5x speedup, and in the multi-tenancy case in particular they can support more tenants. The migration was very smooth; I don't think they even spent much time migrating the data. So if you have the same trouble with filtering or multi-tenancy, you should definitely consider this; we also helped them reduce cost by using the DiskANN index. I think that's pretty much what we wanted to share. Any questions?

There's a question: you indicated OpenSearch and pgvector might not be good on performance and scale; is that only for certain use cases? I'd say it's general. First, pgvector is still a single-machine solution, and what it does is build one huge index over all your data. With one or two million vectors that's fine, but with ten million, building the index itself becomes very expensive, and there's no way to scale beyond that: ten million might work, but past that it really won't. As far as I know, pgvector also keeps most of the data in memory, so huge datasets are a big challenge. OpenSearch can scale somewhat better, and its hybrid search feature is pretty strong; if you do RAG and you lean heavily on full-text search, OpenSearch is still a very good option. The challenges are, first, that OpenSearch is written in Java, so it can't fully utilize SIMD instructions to get the most out of your hardware.
Also, memory management in OpenSearch is not great: most users can only use around 40 to 50 percent of their memory in production, whereas with Milvus you can use at least 70 to 80 percent. Index building is another challenge for both pgvector and OpenSearch: they build indexes and serve queries on the same node, so when you ingest a lot of data you'll see query performance drop quickly, because index building takes a lot of CPU and slows down your queries.

Next question: deduplication has been around a long time at the storage level; how does this add more value? Our deduplication is not simple block-level deduplication, where you compute a hash value and check whether two texts are exactly the same. What we do is similarity-based deduplication. A lot of material on the web is almost identical, say 99 percent the same, with only the author's name changed or a slogan or logo added; it's very similar but not exactly the same. The easiest way to deduplicate is to calculate a hash and compare, but that's like keyword search: it doesn't understand semantics. With our MinHash index you can find the most similar documents, because when you train models you don't want to keep feeding them the same material; you want your data to be really high quality. That's why they use a vector database for this.

If you have any other questions, we'll stay on the line for a few minutes. In the meantime, feel free to read our release blog, written by Chris; it's quite detailed, and take a look at the release notes as well. We also built an AI bot on both the open-source Milvus site and our fully managed service, so if you have further questions about 2.6 or about Milvus in general, go to the website and talk to the bot; if that's not working, there's a contact-us button so you can reach us directly.

Chris: Cool. If there are no more questions, we'll end the webinar here; we'll clean up the video and fix the audio, sorry about that, we have a couple of new mics. If you want to chat with James, just sign up for one of the office hours and he'll be happy to go into a lot more detail with you one-on-one, or one of the other engineers can as well. Thanks, everybody, for joining us today; we can't wait to hear what you build with Milvus. Bye!
Meet the Speaker
Join the session for live Q&A with the speaker
James Luan
VP of Engineering at Zilliz
James Luan is the VP of Engineering at Zilliz. He holds a master's degree in computer engineering from Cornell University and has extensive experience as a database engineer at Oracle, Hedvig, and Alibaba Cloud. At Alibaba Cloud, James played a crucial role in developing its open-source HBase database offering and Lindorm, a self-developed NoSQL database. He is also a respected member of the Technical Advisory Committee of the LF AI & Data Foundation, contributing his expertise to shaping the future of AI and data technologies.