Webinar
Introducing Milvus 2.6: Scalable AI at Lower Costs with James Luan, VP of Engineering at Zilliz
Chris Churilo: My name is Chris Churilo, and joining me today is the world-famous James. If you've been playing around with the Milvus project, then you know James is the VP of Engineering at Zilliz and the key maintainer of Milvus. James is incredibly passionate about Milvus 2.6, so if this is the first time you've heard him speak, you are in for a treat. Let's go ahead and get started: let me turn on the slideshow and give everybody an introduction to Milvus 2.6.

Of course, we are an open-source vector database, and by now everyone on this webinar has heard of us. We're really excited about the number of contributors and the community engagement, so once again, I can't thank you all enough — it's really been a pleasure working with everybody.

One of the things we like to stress to our community is that we are not trying to be everything to everybody. We know we should stay in our lane, and we feel very strongly that where we really shine is what we focus on. First, we've built a vector database from the ground up — it's not bolted onto some other tool. Second, it's fully open source under the Apache 2.0 license; we know that matters because it matters to us when we look at other projects, and you find it important too: you can look under the hood, and we get really incredible contributions that are visible to everybody. And third, as I mentioned, we want to stay in our lane, so what we focus on is a fully distributed system that can really perform at scale. We're not trying to be the tool you can get up and running in seconds to prototype on — although you can with Milvus, if you have a little more patience. Really, we are trying to make sure that your multi-tenant, very performant, very sophisticated application works well with vector search. That's what we're focused on, so when you look at the feature set James is going to describe for 2.6, you'll see we are continuing to stay in our lane.

There are a couple of other themes I want to share with y'all, because I think you'll appreciate them. As engineers, we don't get a blank check from finance; they don't say, "Go ahead, Chris, spend as much as you want on infrastructure." We get it: we always have to provide a solution that's as cost-effective as possible. One way we can do that is by lowering infrastructure costs. Another is building tools so that you don't have to, which boosts your productivity. Where we have dependencies that are difficult or a hassle, we want to remove them as much as possible — again boosting productivity, but also lowering infrastructure cost and complexity overall. And finally, you already use things like object storage and you already understand tiered storage, so whatever you have in place, we want to take full advantage of it. Those are the kinds of themes we have in 2.6.
And it's only going to get better after 2.6, because our goal is to help you make sense of all of your unstructured data. To do that, we have a really big internal goal: we need to help you reduce your costs even more — we need to make Milvus a lot cheaper to use in your infrastructure. We know we still have a long way to go, so we appreciate your patience as we build out these capabilities, and we hope you appreciate this goal. It might seem counterintuitive coming from a product vendor, but like I said, we know we don't get a blank check, and we know you don't either, so we're all in this together. All right, I'm going to hand the mic over to James, and we'll start going really deep into what these capabilities are.

James Luan: All right, nice. Thanks, Chris — I think the introduction was actually pretty good. The goal for Milvus is to reduce the cost for everyone, so we can see more use cases. I'm pretty excited about all the new features in 2.6. We spent around half a year on it, and this is the first time I've talked about 2.6 publicly, because I was pretty busy making all those features happen — but finally it's here.

The first part I want to talk about is how we can leverage the cloud for you and make vector search even cheaper. The first feature — our signature feature — is tiered storage, so you can separate your hot and cold data. When people build AI applications, one of the challenges is multi-tenancy: you build apps for millions, or tens of millions, of users, and only a small portion of them are actually active. A similar thing happens with enterprise applications: the most recent three months of data is hot, while for most of the rest you still want to be able to search it, but it's super cold, and you definitely don't want to load all of it into memory or onto disk. That's why we built this smart cache. The architecture diagram is a little small, but just take it as another layer of caching on top of object storage. As you may know, starting from 2.0, Milvus has used object storage as its persistence layer, but the original approach was to load all the data into main memory or onto disk and serve it from there at a faster speed. Over the last couple of years we observed a lot of people loading and releasing collections themselves: when a user logs in, they load that user's data, and when the user leaves the application, they release it. It's a lot of work, and the performance is not ideal. So we thought: why not do it ourselves and build a caching layer on top of object storage, so people don't need to worry about it? With tiered storage, hot-data performance is still similar to the HNSW or DiskANN index you're using right now, depending on whether the cache lives on disk or in memory.
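To make the "manage it yourself" pattern James describes concrete, here is a minimal sketch of what many applications did before tiered storage, using the standard pymilvus MilvusClient. The URI and the per-tenant collection name are placeholders, not anything from the talk.

```python
from pymilvus import MilvusClient

# Placeholder connection details.
client = MilvusClient(uri="http://localhost:19530")

def search_for_tenant(tenant_id: str, query_vector: list[float]):
    collection = f"tenant_{tenant_id}_docs"  # hypothetical per-tenant collection

    # Pre-2.6 pattern: explicitly pull the tenant's data into memory/disk...
    client.load_collection(collection)
    try:
        return client.search(
            collection_name=collection,
            data=[query_vector],
            limit=10,
        )
    finally:
        # ...and release it again when the tenant goes idle, to free resources.
        # Tiered storage in 2.6 is meant to remove this manual load/release dance.
        client.release_collection(collection)
```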
But once the data has been evicted to object storage, you should expect higher latency, maybe one to two seconds. On the other hand, for a lot of that cold data latency is not very important, and with the new tiered storage implementation you'll see your storage cost drop by something like five to ten times.

The second feature I want to mention is RaBitQ. It's a quantization algorithm which, after a lot of evaluation, we think is the state of the art among the quantization algorithms we've seen. Some of you may have heard about binary quantization: it converts a 32-bit float into a single bit, so it compresses vectors 32 times, but the cost is that you lose a lot of recall. For people who build RAG applications, you know accuracy is very important, especially top-3 and top-5 accuracy. That's why we introduced the new quantization. We'll talk about it in a bit more detail in the next slides, but just remember that open-source users get a new option, and it works together with both the IVF index and the HNSW index.

The third thing we introduced is int8 vector support, along with an HNSW index for it. Some model vendors do quantization themselves; for example, Cohere's models can generate float32 vectors, int8 vectors, and even binary vectors. Milvus already supported binary and float32, and now we finally support int8. With the new int8 implementation, recall is one or two percent lower than float32, but you get the chance to cut your memory use by four times.

The last thing is Milvus Storage V2. We fully leveraged Parquet and Arrow at the very beginning of the design of Milvus 2.0, but after a while we found it's very hard to store vector data in the Parquet format, and Parquet also has a lot of limitations on how fast you can do point queries on object storage. That's why we redesigned our storage layer; we'll cover that in the next slides too.

OK, so the first thing I want to go deeper on is RaBitQ. First of all, it's a binary quantization method. The key idea behind RaBitQ is this: think about three-dimensional data where each coordinate ranges between -1 and 1; a random vector can land at any point between -1 and 1. But if you normalize your data in a very high-dimensional space, you'll see that each coordinate falls in a much smaller range, very close to zero, because when the dimension gets high, each individual dimension gains a kind of certainty. It may be a little hard to grasp, but if you're interested, go read the paper — it's one of the best papers from SIGMOD 2024, and we'll share the link to it later.
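A quick numerical sketch of the concentration effect James is describing: for a random unit vector, each coordinate spreads roughly like 1/sqrt(d), so in high dimensions every coordinate hugs zero. This only illustrates the intuition — it is not code from RaBitQ or Milvus — and it assumes nothing beyond numpy.

```python
import numpy as np

rng = np.random.default_rng(0)

for d in (3, 128, 1024):
    # Draw random vectors and normalize them onto the unit sphere in R^d.
    x = rng.standard_normal((10_000, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)

    # In low dimensions coordinates wander across [-1, 1]; in high dimensions
    # they concentrate near zero (std ~ 1/sqrt(d)), which is the structure
    # RaBitQ-style 1-bit quantization exploits.
    print(f"d={d:5d}  coordinate std={x.std():.3f}  1/sqrt(d)={1/np.sqrt(d):.3f}")
```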
By leveraging these statistical properties, you can reduce the average error between your quantized vectors and your original vectors. So with RaBitQ you get high search accuracy, and the other great part is that it's fully hardware friendly: you can optimize it with SIMD instructions, and it can be combined with any kind of index, especially FastScan — there's another paper on how RaBitQ works together with FastScan for better performance, and we'll definitely share both papers. From our results, with just one bit, without any refinement or reranking, it's about three or four percentage points higher in accuracy than traditional product quantization. Compared with scalar quantization, recall can be something like 10% higher, while QPS is similar. Leveraging the new quantization algorithm, we implemented an IVF-based RaBitQ index as well as HNSW with RaBitQ; at the same recall it roughly doubles the QPS, and it also saves memory compared to the original IVF index.

There's a question in the chat about "high accuracy" — OK, so how do we define high accuracy? That's a good question. We usually use two metrics to evaluate accuracy. One is recall: if someone asks for the top 100 results, how many of them are actually retrieved after an approximate nearest neighbor search? With brute-force search your recall is 100% — you get all 100 results. With any approximate nearest neighbor algorithm you might miss some of them. Ideally we want something like 98–99% recall. With an HNSW or DiskANN index you can definitely get 98–99%; with an IVF index you might get 95%; and once you quantize, recall keeps dropping — for example, naive binary quantization might give you recall of 75–80%, meaning that if you search for the top 100 you only get about 75–80 of them. The other metric is NDCG. The major difference between NDCG and recall is that NDCG doesn't just evaluate how many results you got, it also evaluates their ranking: the first result carries the most weight. If the true closest item does show up in your results, but at position 20, that's still good — you got the result — but it's not ideal; you want it in the first position. So NDCG considers recall, but also the ranking.
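Here is a tiny, self-contained sketch of the recall@k definition James just gave: compare the IDs returned by an ANN search against brute-force ground truth. The ID lists below are made up for illustration.

```python
def recall_at_k(retrieved_ids, ground_truth_ids, k=100):
    """Fraction of the true top-k neighbors that the ANN search actually returned."""
    retrieved = set(retrieved_ids[:k])
    truth = set(ground_truth_ids[:k])
    return len(retrieved & truth) / len(truth)

# Toy example: ground truth from brute force vs. an ANN result that missed id 7.
ground_truth = [3, 9, 7, 1, 5]
ann_result   = [3, 9, 1, 5, 8]
print(recall_at_k(ann_result, ground_truth, k=5))  # 0.8 -> "80% recall@5"
```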
OK, the second exciting feature is Storage V2. It works together with tiered storage, because originally all we did was load data from object storage and cache it, so we didn't care that much about the performance of the storage format on object storage. But in the new release with tiered storage — and in the next release, when we have the vector lake solution — the storage format on object storage becomes very, very important. Many of you might have heard about the Lance format, which is designed for a vector lake; we implemented essentially the same idea, but using pure Parquet files, so it stays compatible with your existing stack — you don't need to introduce a new format into your big data pipeline or your AI inference pipeline.

The goal for Storage V2 is really two things. One is that we want to introduce more data types, especially long text and blob storage, which lets you store images and audio in the vector database. That data is going to be huge, so you cannot store it naively in Parquet, because it breaks your row groups. Think about it: you have one field with four bytes per row and another field with four kilobytes per row. Parquet stores all the fields of a row in the same row group, so if you tune the row group to be really large, retrieving just one of the small fields becomes very expensive, because you have to read the whole row group. On the other hand, if you tune the row group to be small, then the large field may only get 10 or 20 rows per row group, and your filtering speed or sequential read speed drops. It's a dilemma. The way we handle it is to split large and small fields into different row groups, or different files. We also store the vectors outside of Parquet, because in the Storage V1 design we stored vectors as a Parquet array and found the serialization was very costly, so we moved the vectors out — Parquet simply wasn't designed for vectors. We also fully leverage the Parquet page index to accelerate point queries. It's pretty stable and battle-tested, and from our tests, Storage V2 compared to V1 gives roughly a 50x acceleration on point queries, which matters a lot for the tiered storage use case. Next slide.
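To make the row-group trade-off concrete, here is a small pyarrow sketch — not Milvus code — showing how the row_group_size knob controls how many rows land in each Parquet row group. The mixed small/large columns mirror James's four-byte versus four-kilobyte example.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# A tiny table with one small column and one large (~4 KB per row) column.
table = pa.table({
    "doc_id": list(range(1000)),                  # a few bytes per row
    "text":   ["x" * 4096 for _ in range(1000)],  # ~4 KB per row
})

# Large row groups: cheap sequential scans, but a point lookup of one small
# value still has to touch a big chunk of data.
pq.write_table(table, "large_groups.parquet", row_group_size=1000)

# Small row groups: point lookups touch less data, but the big column ends up
# scattered across many tiny groups, hurting filter/sequential throughput.
pq.write_table(table, "small_groups.parquet", row_group_size=20)

for path in ("large_groups.parquet", "small_groups.parquet"):
    meta = pq.ParquetFile(path).metadata
    print(path, "row groups:", meta.num_row_groups)
```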
We care about more than just cost, of course. Over the last couple of years we've been working with AI developers, trying to build the tools they can easily depend on — for example, the hybrid search features. At the very beginning we thought you should just use two systems, Elasticsearch and Milvus, manage both, and do hybrid search with your own logic, and people complained a lot: it should be simplified, there's no need to maintain two different clusters. That's why in the last Milvus release we added hybrid search and full-text search. A similar thing is happening here. One of the most exciting features we want to talk about is letting Milvus handle embeddings for you: a lot of people keep asking us why the vector database can't work directly with their embedding models, so they don't have to call two APIs. Usually what happens is that people call OpenAI or another embedding provider, get the embeddings, concatenate their scalar fields with the embeddings, and then ingest that into Milvus.

But it's not only about embeddings: there's a lot of pre-processing, and sometimes you also need post-processing — for example reranking, or highlighting your data. We thought we needed to integrate this, but we also needed to offer flexibility so users can design their own pipeline to process their data. So in 2.6 we introduced new functionality that we call pre-processing and post-processing pipelines. With the pre-processing pipeline, the simplest usage is that you can call all the embedding vendors directly through Milvus, or we can even work with inference engines such as vLLM or the Hugging Face text inference engines, so you don't have to wire that up yourself. On open-source Milvus 2.6, our Kubernetes operator already supports bringing up a vLLM inference engine, so we don't just help you call those embedding models — we also help you deploy the inference engines. And of course you can still use, say, OpenAI or Cohere, so you don't have to manage any of it yourself.

Post-processing works the same way. As an initial step we implemented reranking, so you can call Cohere or Google APIs to rerank results. In the next couple of releases we'll also support more complicated types, for example highlighting. A lot of people, when they do search, want to show their customers why certain results were picked. That's why we're working on highlighting. One difference between our highlighting and a traditional search engine's is that we do semantic highlighting: it not only shows the keywords, it also shows the semantic relations. Sometimes, with dense vector search, you get a result that contains none of the keywords but is still semantically related. Our latest highlighting isn't there yet — it should come out in the next one or two months — but with semantic highlighting it's easier for RAG developers to explain why a result was retrieved.

We also implemented a new data model — the struct list, or embedding list. It's something really cool, and I'll explain it in the next slides. We improved a lot of search functionality too. Phrase match helps you match a phrase: in 2.5 we already supported keyword match, so you could match a couple of keywords, but phrase match lets you match a whole phrase, for example "vector database" instead of matching "vector" and "database" separately, and it also cares about their positions — so you match "vector database" but not "database vector".
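As a sketch of the difference James describes, here is what the two kinds of filters might look like from pymilvus. The TEXT_MATCH expression follows the keyword-match feature from 2.5; the PHRASE_MATCH name is my assumption for the 2.6 phrase-match syntax, so confirm it against the release docs. The URI and collection name are placeholders, and the text field would need match enabled in its schema.

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")  # placeholder URI

# Keyword match (since 2.5): hits rows containing both words, anywhere, in any order.
keyword_hits = client.query(
    collection_name="articles",                      # hypothetical collection
    filter='TEXT_MATCH(content, "vector database")',
    output_fields=["doc_id"],
)

# Phrase match (the 2.6 feature described above): the words must appear together
# and in order, so "database vector" would not match. PHRASE_MATCH here is an
# assumed expression name -- check the 2.6 documentation.
phrase_hits = client.query(
    collection_name="articles",
    filter='PHRASE_MATCH(content, "vector database")',
    output_fields=["doc_id"],
)
```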
We also have multilingual tokenization. One of the tokenizers we introduced is Lindera, which is pretty good for Korean and Japanese, and the best part is that we also wrapped a multi-language tokenizer. As the world becomes more global, when people build applications — especially AI applications — you are a global application from day one: you serve people from many parts of the world, not only the US, not only Europe. With the multi-language tokenizer we support more than 60 different languages, and since you can specify which language a user is using, we split the BM25 statistics for different languages into separate stats, so users writing in different languages don't skew each other's statistics.

We also created a bunch of interesting tools to make development move faster. One cool feature we added is add field. Until now the schema in Milvus has been essentially fixed: say you have three fields — an ID field, a vector field, and maybe a text field — and someday you decide you need another field for some text or features. There was no way to backfill the data and add that field; you had to copy all the data and redo the ETL, which is very expensive, especially with large amounts of data. Now you can add another field without breaking traffic.

We added a sampling feature: sometimes you just want a taste of what's actually in your collection without scanning all of it. We now offer query sampling, so you can sample some of the data in your dataset; and using that sample you can also run searches and evaluate your recall — recall estimation is another new feature in this release. A lot of people don't have ground truth — they don't have a real-world, labeled dataset to measure how good their search accuracy is — but using the sampled data plus the recall estimation, you can easily understand how well your queries actually do in a production environment.

The time-aware decay function is a reranking algorithm aimed at agent builders, because for agent memory you sometimes want to prioritize fresh data over old data — similar to humans, since we forget a lot of old things. In the newest 2.6 we introduced this decay function; there are several kinds of decay, but the general idea is that the newer the data, the higher the relevance or priority it gets in your search results.
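The general idea of time-aware decay, in toy form: multiply the similarity score by a factor that shrinks with the document's age. This is only the concept James describes, not the actual 2.6 decay-ranker API, and the half-life value is made up.

```python
import math
import time

HALF_LIFE_DAYS = 7.0  # made-up half-life: a week-old memory counts half as much

def time_decayed_score(similarity: float, created_at: float, now: float | None = None) -> float:
    """Exponentially down-weight older hits so fresh agent memories rank first."""
    now = time.time() if now is None else now
    age_days = (now - created_at) / 86_400
    decay = 0.5 ** (age_days / HALF_LIFE_DAYS)
    return similarity * decay

# A slightly less similar but brand-new memory can outrank an old near-duplicate.
now = time.time()
print(time_decayed_score(0.90, now - 30 * 86_400, now))  # month-old hit
print(time_decayed_score(0.80, now - 1 * 86_400, now))   # yesterday's hit
```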
The last one is an improvement to our time-to-live (TTL) feature. We introduced TTL a long time ago — two or three years ago — but it had some major limitations. We kept hearing from users that even when data had already expired, it wasn't actively cleaned up and could still show up in search results. In the newest 2.6 we improved the TTL feature: expired data is removed from your search results immediately, and we also added a deadline compaction strategy — any data that has been expired for more than one day gets compacted and cleaned, so you have a guarantee that expired data actually gets removed.

OK, one of the cool 2.6 features for RAG users is the struct list. Originally we had a plain table model where each primary key is unique, and when you build a RAG application — especially a chat-with-your-PDF or web-page app — what you end up doing is making each row, each entity inside Milvus, one chunk, not one document. The problem is that sometimes when we search, we want the top-10 most similar documents, not the top-10 most similar chunks. That's why we introduced group-by in 2.4: you can search and group by, say, a document ID field, and it shows you the top-10 most relevant documents instead of chunks. But we found this still wasn't good enough, for a couple of reasons. First, sometimes when we search, we want all the chunks inside each document to be collocated, so we can do extra reranking or other computation on top of them. One of the best examples is multi-vector or ColBERT embedding models: with ColBERT, each document generates multiple embeddings, and at search time you need to compute a maximum-similarity (MaxSim) score, so you want all the vectors belonging to one document collocated together.

That's why we introduced a new data model. You still have a primary key — the document ID — and inside each row you can have a list of embeddings. If you do chunking and split a document into 10 chunks, you put the 10 chunk embeddings into one tensor, one list of embeddings, instead of 10 rows, and on top of that you can implement all kinds of reranking. We also see a use case where people want time series over their vector embeddings — a video use case: you cut videos into frames, and each frame carries context information and a time-series relationship, so we introduced a special metric so the embedding distance calculation can take the temporal relations into account. And the best part is that it's not just an embedding list, it's a struct list: you can design a schema where you have title embeddings and some tags on the title, and for your content you have a struct — the content is split into, say, 50 chunks, and each chunk can carry other metadata fields, and you can filter on those metadata fields as well.
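As a concrete picture of the ColBERT-style scoring James mentions, here is a toy MaxSim computation in numpy over one document stored as a list of chunk embeddings — the one-row-per-document layout described above. It illustrates the math only; it is not the 2.6 struct-list API, and all names and shapes are made up.

```python
import numpy as np

rng = np.random.default_rng(1)

# One "row": a document ID plus a list of chunk/token embeddings (n_chunks x dim).
document = {
    "doc_id": "doc_42",                      # hypothetical primary key
    "chunk_embeddings": rng.standard_normal((10, 64)),
}
query_vectors = rng.standard_normal((4, 64))  # a multi-vector (ColBERT-style) query

def maxsim(query: np.ndarray, doc: np.ndarray) -> float:
    # For each query vector take its best-matching chunk, then sum those maxima.
    sims = query @ doc.T                      # (n_query, n_chunks) similarity matrix
    return float(sims.max(axis=1).sum())

print(maxsim(query_vectors, document["chunk_embeddings"]))
```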
So the data model itself is becoming more and more complicated, and we're trying to cater to all the needs of RAG developers — that's why we believe this vector database needs to exist. If you wanted to add the same things to pgvector or Elasticsearch, it would be very, very hard. The way we do it is to start from the index itself and then modify the database layer on top. That's pretty much what they did, except it took the last ten years: many of the innovations happened on the Lucene side, where you need to know a lot of details about the search engine, and then on the top layer Elasticsearch was built to leverage those features in a more scalable way.

There's a question: how does this compare to OpenSearch and pgvector? I'll compare them in three ways. First, if you have a million records and you're just doing naive RAG, please use pgvector — it's the easiest way to do vector search. It's not good at performance or at scale — we are good at that — but it really depends on your use case. The second difference, I'd say, is that we did a lot of optimization on how we work with AI applications, for example hybrid search and the new data models, so it's easier for developers to fit their applications onto the data model. It's just like MongoDB: when people started building mobile applications, at the very beginning nobody wanted to store their data in a relational database — they just wanted to put their JSON in a simple document store. A similar thing is happening here: when you have vectors, you may have many kinds of vectors, not just simple dense embeddings — sparse embeddings, embedding lists and multi-vectors, binary vectors — and you won't easily find a database that can store all of that. Next page. (The slides hang for a moment.) Sorry about that.

Maybe a couple more comments on OpenSearch. Today we also released VectorDBBench 1.0 — it's listed here — which compares Milvus with many of the other players in the market. It's open source, of course, so feel free to run the evaluation yourself. The interesting part is that it has kind of become a standard, because other vendors who want to support vector search contribute their own connectors to VectorDBBench, so it's also a lot of fun for us to spin up different systems and see how they do. The third part, I'd say, is performance: as you may know, Milvus is known for its scalability and performance, and we have a couple of very large customers — probably the largest on the planet.

One of the most exciting features in 2.6 — probably the most exciting one I see — is JSON shredding and the JSON index. In 2.2 we introduced the dynamic field, which makes your data model more flexible, but on the other side it makes your filtering super slow; that's what we saw over the last year. So after a lot of evaluation, we adopted a JSON shredding policy.
It's very similar to what ClickHouse does. We'll explain what JSON shredding is in a moment, but the results look very impressive: we see up to 100x performance improvement on dynamic-field filtering. That means that after 2.6 rolls out, you probably don't need to take your schema as seriously as before — even if you make mistakes, don't worry, you can still use dynamic schemas, because the filtering performance is now close to what you'd get with a fixed schema. That can make a big difference.

The n-gram index is used for regular expressions and text matching. We know Milvus wasn't built for that, but in a lot of real cases we still see people using text match or trying to match their keywords with wildcards, so we built this n-gram index. The design is that we split the data into small tokens: with 2-grams, for example, we split your text into groups of two characters and build an index on each pair. When a search happens — say you search for "hello" — we split "hello" into 2-grams and check whether "he" is in your text, whether "el" is, whether "ll" is, whether "lo" is; if all of those 2-grams are present, the text is a match candidate. You can use 2-grams, 3-grams, or even 4-grams; the larger the granularity, the fewer resources it takes, but there are limitations — with 4-grams you can't search for a token of only three characters. This helps accelerate text match and regular expressions.

MinHash is a locality-sensitive hashing index built for deduplicating huge amounts of data. We're working with one of the world's largest large-model training companies; they have data at the 10-billion scale crawled from web pages, and the first thing they want to do after crawling is deduplication. So we built this MinHash index to help people do large-scale deduplication. The production environment we currently run is around 10 billion vectors, and there's still a lot of optimization to do — for anyone who has that much data, we're definitely happy to help, maybe not for this exact use case, but we'd be glad to help you optimize.
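To show the idea behind the MinHash index James describes, here is a toy near-duplicate check in plain Python: shingle two texts, take the minimum hash under several seeds, and estimate Jaccard similarity from how many minima agree. It's a sketch of the classic technique, not Milvus's implementation, and the texts and parameters are made up.

```python
import hashlib

def shingles(text: str, n: int = 5) -> set[str]:
    """Character n-grams ('shingles') of the text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def minhash_signature(items: set[str], num_hashes: int = 64) -> list[int]:
    """For each seed, keep the minimum hash value over all shingles."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int.from_bytes(hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(), "big")
            for s in items
        ))
    return sig

def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of matching minima approximates the Jaccard similarity of the sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = "the quick brown fox jumps over the lazy dog, by alice"
b = "the quick brown fox jumps over the lazy dog, by bob"   # near-duplicate
print(estimated_jaccard(minhash_signature(shingles(a)), minhash_signature(shingles(b))))
```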
have we have a couple of different paths one is A uh the other one is B so the special part for B is 39:20 that uh not all the data in B is actually under the same data type some of them are strings the other ones are 39:27 actually integers and we also have D E and uh F so uh they probably just show 39:35 in some of your um like uh entities definitely not all your or your rows 39:41 right so we actually split all the data into three different three different type of case uh if a is shows in all of 39:50 your data or most of your data then we call it as a tapped case so we directly uh directly store it in one column 39:56 storage uh for B and uh uh for example they have both strings and integers we 40:03 actually split into two different columns to store them one one of them is 40:08 a string column then other one is a int column and for the columns which don't have this data just using null yeah so 40:16 sim uh similar thing happens um on d and u you have all all the strings and for 40:21 other uh like entities which which don't have this field you can just uh fill in with null and um for the rest of your 40:30 datas um then you can you can you can store it with a bon format and we 40:35 actually create u bon stats on on top of the uh format so it's it's kind of like 40:41 a inverted index so search on it should be much faster because uh all the data only shows in part of your uh JSON or or 40:50 part of your entities so uh by creating an inver index is actually accelerate the filtering speed so with with with 40:56 all that we we actually seeing um accelerating between 10 times to 300 41:02 times and um dynamic filtering seems to be always done in less than 10 millconds 41:08 yep okay so uh last part we want to mention 41:14 is uh how to reduce the maintenance cost uh we all know that Mis is actually 41:20 powerful is uh scale but on the other side many people complain about is 41:27 actually hard to maintain uh hard to get started uh we know that that's why we 41:33 build a lot of u helpers we do have a m light uh which is a uh invalid version 41:39 so people could just install it using a piping saw and running it on your laptop 41:44 uh we do offer um different kind of uh uh deployment I 41:50 think as a open-source uh product we we we are kind like uh very generous to 41:56 offer all kinds of like for example the kubernetes operator even you can do helm you can do uh docker install so uh in 42:04 the 26 we also have support and yam install so it makes uh people life even 42:11 like even easier if you just want a single machine uh solution you don't really need to install docker anymore 42:17 yeah and beside that for any of the um distribute distributed user uh one of 42:24 the key uh argument is that people saying that MUS is very happy because 42:30 you introduced Kafka or POA as a redhead locking uh the reason we do that is by 42:36 that time like four five years ago uh we we we decided to do compute and storage 42:43 disagregation but uh there's and on the other side we also want to uh make sure 42:48 that this database is good for your data freshness it's definitely not a batch database it has to be handle all the 42:55 CRUDs uh in real time uh that's why we pick Kafka and POSA as our redhead 43:00 logging so after so many years we see great imp uh improvement on the uh 43:07 storage layers s3 becomes even cheaper and uh much faster and you also have the 43:13 uh single a uh S3 sol S3 solution so the latency is actually 10 times faster than 43:19 the uh standard version so we 
thought it's a right timing we just introduced a new another layer to uh replace Kafka 43:26 and Pula yeah we also did a bunch of u uh optimize on reduce the component 43:32 number for example we merge index node and data node into one node uh we also merge all the uh coordinators uh root 43:39 root uh query and data coordinators into into one mix card so uh we don't really 43:45 need that uh that much uh micros service so uh people may fear about this yeah 43:53 next bit okay uh one of the most exciting feature 44:00 is uh in in the architecture side is the stream node uh because uh if uh I think 44:08 a lot of people if you get a chance to use MS you you know that uh uh we do have a tunable consistency model so 44:14 people can pick between uh strong consistency to boundary stillness or even image consistency uh the reason we 44:21 do that is that strong consistency implementation it's slow in in in in any 44:27 version be uh before 2.6 because uh when red happens uh every right directly goes 44:33 into Kafka and it can be only consumed uh or can be searched when query node 44:39 actually consume the data so uh it may take some time for uh Kafka to deliver 44:44 all the data to the consumers and it also takes some time for us to notify the uh query node whether you have the 44:51 latest data or not right so that's why we decided to do a major architecture 44:56 change we actually adding a new node called stream node it's actually you can you can think it as a replacement for 45:02 data node so uh all it's fresh inserted actually inserted into the stream node 45:08 and using the stream node uh the stream node is actually responsible for writing the wall uh the the the wall can be 45:13 either a kafka or it can be a wood woodpecker we talk about this later 45:18 right so after the uh data is presented in the in the WL then uh stream node 45:24 actually apply all the data into its memory so all the growing segments is actually moved from um uh kernel to to 45:31 stream node Yeah so by doing that uh the the strong consistent search becomes 45:37 much much faster and also uh we we will introduce some interesting feature for 45:43 example primary key duplication uh for example uh indented right because uh the 45:50 the other complaint about many of the users is that when when when I try to write data into mus and if it reported 45:57 error I I don't know whether I should retry or not if I don't retry I might lost the data but I retry I may have 46:03 duplicate data in in my database so it's it's kind of like dilemma but right now using the stream node we'll just uh 46:10 going to have a indamped cache so this is also part of our feature plan for 3.0 46:15 so if you try you definitely uh see that there there will be only one like entry 46:23 left as long as indefinite ID is the same so uh using the stream node actually solve a lot of uh previous 46:29 challenge yeah yeah so uh other exciting one is the 46:38 woodpacker so it's a disclos uh purely cognative redhead logging service on s3 46:44 and of course it's open source on uh the uh zac ripple uh we actually read an 46:50 engineering blog actually I actually read an engineering blog on the on the um talking about why we need uh the new 46:56 woodpecker solution um but uh I I think for many of our users uh it's just like 47:04 you can you can remove your Kafka dependency if you are not very sensitive to the right latency uh because right 47:10 now we're just trying to write to object storage the latency is actually uh gone to higher so uh you 
With Woodpecker you should typically see 200 to 300 milliseconds of write latency, but for a lot of AI applications I don't think that's a big deal, especially if you batch your writes. The good part is that the throughput is much higher compared to Pulsar, and you remove one dependency, which saves a lot of cost, especially on smaller clusters.

OK, I guess that's pretty much it for 3.0 — oh, sorry, 2.6. I'm so excited that 3.0 is actually in the plan, probably in the next two or three months, so stay tuned. We have the vector lake coming, which is a big thing — that's why we reserved a special 3.0 version for it. But before we wrap up, maybe we can quickly go through one use case, which is Read AI. We're very excited to see them using Milvus as the backbone of their semantic search, trying to solve the problem of siloed data: you have a lot of meeting notes, chats, emails, and information in your CRM, and it's very hard to pull all that data together and help you understand your business. Our friends at Read AI are building an agent across all of that data, and of course vector search is one of the important components in their solution. They started with FAISS, but it lacked multi-tenancy support; they tried a couple of different solutions, including Pinecone, which couldn't handle all the filtering, cost was a big problem, and latency was much slower. That's why they came to us. After adopting Milvus they saw about a 5x speedup, and in the multi-tenancy use case especially they can support more tenants. The migration was very smooth — I don't think they spent much time at all migrating their data. So if you have the same trouble with filtering or with handling multi-tenancy, you should definitely think about this. We also helped them reduce cost by using the disk-based index.

A question: you indicated that OpenSearch and pgvector might not be good at performance and scale — is that for certain use cases? I would say it's in general. First off, pgvector is still a single-machine solution, and what it does is build one huge index on top of all your data. With one or two million records that's fine, but with ten million, building the index itself becomes very expensive, and there's no real way to scale — ten million might work, but beyond that it definitely won't. And as far as I know, pgvector keeps most of the data in memory, so huge amounts of data will also be a big challenge. As for OpenSearch, it can scale a bit, and the good part is hybrid search — that feature is pretty strong — so if you do RAG and lean heavily on full-text search, OpenSearch is still a very good solution. The challenges: first, OpenSearch is written in Java, so it cannot fully utilize SIMD instructions to make full use of your hardware resources.
Also, the memory management in OpenSearch is really bad: most users can only use about 40 to 50 percent of their memory in a production environment, whereas with Milvus you can use at least 70 to 80 percent. And then there's index building: for both pgvector and OpenSearch, one challenge is that they build the index and serve queries on the same node, so when you ingest a lot of data, you'll see your query performance drop quickly, because index building takes a lot of CPU and slows down your queries.

Another question: deduplication has been around a long time at the storage level — how does this add more value? Our deduplication is not simple block-level deduplication, where you just compute a hash value and check whether two texts are exactly the same. What we do is similarity-based deduplication. A lot of the material on web pages, on the internet, is almost identical — 99 percent the same — with only, say, the author's name changed, or some special slogans or logos added, so it's very similar but not exactly the same. The easiest way to deduplicate is to calculate a hash and compare, but that's like keyword search — it doesn't understand semantics. Using our MinHash index you can actually find the most similar documents, because when you're training models you don't want to keep feeding them the same material — you want your data to be really high quality. That's why they use a vector database for it.

If you have any other questions, we'll stay on the line for a few minutes. In the meantime, feel free to read our release blog, which was written by Chris — it's pretty fancy and you'll see a lot of details — and check out the release notes as well. And don't forget, there's a lot of information out there: we built an AI bot on both the open-source Milvus site and our fully managed Milvus, so if you have any further questions about 2.6 or about Milvus in general, go to the website and talk to the bot; if it's not working, there's a "Contact Us" button so you can reach us directly.

Chris Churilo: Cool. If there are no more questions, we'll end the webinar here. We'll clean up the video and fix the audio — sorry about that, we got a couple of new mics. If you want to chat with James, just sign up for one of the office hours and he'll be happy to go into a lot more detail with you one-on-one, or one of the other engineers can as well. Thanks, everybody, for joining us today — we can't wait to hear what you build with Milvus. Bye, see you.