Unlocking Advanced Search Capabilities with Milvus 2.4: Accelerated GPU Search, Multi-Vector Search, and Beyond

Webinar


Zilliz Webinar - Zoom


About the Session

Join us for a technical webinar featuring Zilliz’s VP of Engineering, where we'll not only discuss the innovative features of Milvus 2.4 but also demonstrate them in action. This session is designed for developers eager to enhance their search functionalities and master the challenges of unstructured data. Through live demos, we'll show you how to effectively utilize the new features, ensuring you understand both the concept and the practical application.

Key Highlights:

Accelerated GPU Search with Nvidia's CAGRA: Boost search performance with GPU acceleration.
Multi-Vector Search: Execute queries across multiple vector fields within a single collection.
Groupby for Diverse Search Results: Implement grouping strategies to diversify search results in large datasets.
Sparse Vector Support: Integrate sparse vector models like SPLADEv2 for improved information retrieval.

This interactive session will equip you with the knowledge to efficiently search and retrieve information from vast, unstructured datasets, refine search accuracy, and enhance your AI application's performance with Milvus 2.4’s capabilities.


Transcript

So I'm pleased to introduce you to this session, which is all about unlocking advanced search capabilities with Milvus 2.4, and to our guest speaker James, who is the VP of Engineering at Zilliz. James has extensive experience as a database engineer at Oracle and Alibaba Cloud. James also played a crucial role in developing HBase at Alibaba Cloud and Lindorm, Alibaba Cloud's self-developed NoSQL database. He's also a respected member of the Technical Advisory Committee of the Linux Foundation AI & Data Foundation, where he contributes his expertise to shaping the future of AI and data technologies.

Welcome, James. The stage is yours. Hi, Stephen. Yeah, hi everyone. So, James here, located in the San Francisco Bay Area, and I've been working here for the last three years.

Currently I'm the chief architect for the Milvus project, and also a maintainer. So, yeah, glad to talk to you guys, and if you have any questions about Milvus or about Milvus 2.4, we can discuss them in this session. Cool. Stephen? Yes.

Do you want to go ahead and share your screen now, James? We can go over the different releases. Sure. Yeah, I actually have one short deck.

We have some highlights for Milvus 2.4. So, Stephen, do you want me to do the intro, or? Yeah, if you want to, you can go directly over the highlights. Yeah.

So we just released the Milvus 2.4 release candidate. It's not a final version yet, but hopefully the GA version will come out next week. This is actually a long release cycle compared to Milvus 2.3.

We spent around six months working on all of this. I think the major change is that we now support multi-vector and hybrid search inside Milvus. Before Milvus 2.4, each row, or each entity, could only hold one vector. Once multimodality and RAG applications became so popular, we soon found that users were looking for a better way to describe their data. It is not only one vector: they may have vectors from different embedding models, or vectors for different attributes or different features of their data.

So we have multi-vector. We support sparse embeddings rather than only dense ones, so you can get better search quality. We support grouping search, which is a great functionality for RAG applications. We support inverted indexes on top of all the scalar data, so when you do filtering, it helps you accelerate.

We also support fuzzy search, or fuzzy filtering, so you can use simple regular-expression patterns for that. And the most exciting feature in 2.4 is definitely the GPU index. We worked with the NVIDIA team; they offer a vector search algorithm on top of NVIDIA GPUs, and it works pretty well, about 10 times faster.

You get that even with small batches, and with large batches, for example when you're doing offline work, you can easily achieve something like a 50 times QPS increase. We'll go through all of these. They are functionalities that help you build better AI and generative AI applications. Yeah.

So the overall goal for 2.4 is to focus on generative AI use cases. We're trying to help users who want to do RAG and all kinds of multimodal search. We want richer semantics, better search recall, and also better performance. Yeah. Right.

So we'll go into some details for each of the features. I think the first one to mention is multi-vector search and hybrid search. As I said before, each entity could only have one vector, but in some use cases you have a description for one of your photos, and you have the pictures themselves, maybe from different angles. See what happens here: when you have a Tesla, you also have a description. When a search happens, it could be search by text, so text to text, or it could be text to image. Yeah.

Before this, you had to create multiple collections and develop your own ranking models. Now you can store multiple embeddings in one row. When a search happens, Milvus retrieves from both vector fields and then runs a reranking algorithm to help you find the top-k candidates. Yeah.
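To make the retrieve-then-rerank flow concrete, here is a minimal pure-Python sketch; it is not the Milvus implementation. Each toy row holds two hypothetical vector fields (`text_vec`, `image_vec`), and each field is searched by brute-force inner product. The two ranked lists are then fused with Reciprocal Rank Fusion (RRF), where score = Σ 1/(k + rank). The field names, toy vectors, and k = 60 (a commonly used default) are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Toy collection: every row (entity) holds two vector fields.
# Field names and values are hypothetical.
ROWS: Dict[int, Dict[str, List[float]]] = {
    1: {"text_vec": [0.9, 0.1], "image_vec": [0.2, 0.8]},
    2: {"text_vec": [0.7, 0.3], "image_vec": [0.9, 0.1]},
    3: {"text_vec": [0.1, 0.9], "image_vec": [0.8, 0.2]},
}


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def search_field(field: str, query: Sequence[float], limit: int) -> List[int]:
    """Brute-force top-`limit` row IDs for one vector field, ranked by inner product."""
    ranked = sorted(ROWS, key=lambda rid: dot(ROWS[rid][field], query), reverse=True)
    return ranked[:limit]


def rrf_fuse(ranked_lists: List[List[int]], k: int = 60, limit: int = 3) -> List[Tuple[int, float]]:
    """Reciprocal Rank Fusion: score(row) = sum of 1 / (k + rank), ranks starting at 1."""
    scores: Dict[int, float] = {}
    for ranked in ranked_lists:
        for rank, rid in enumerate(ranked, start=1):
            scores[rid] = scores.get(rid, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:limit]


# One query embedding per field (the same toy vector is reused for both here).
text_hits = search_field("text_vec", [1.0, 0.0], limit=3)
image_hits = search_field("image_vec", [1.0, 0.0], limit=3)
fused = rrf_fuse([text_hits, image_hits])
print(fused)
```

Row 2 is only second by text, but it is first by image, so RRF ranks it first overall. Fusing on ranks rather than raw scores avoids having to compare scores from different embedding models.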

This feature is pretty good for multimodal use cases. For example, if you're doing RAG, you can put one kind of information in one embedding and other information in another embedding, which helps you improve search quality. The most exciting part of working with multiple vectors is that you can use different embedding models. You can use OpenAI together with Cohere.

Each of them can find different material, so you can retrieve more interesting candidates. And you can also use dense embeddings plus sparse embeddings. Yeah. And that brings us to sparse embeddings.

Before this, most of the embeddings we supported were dense. We also have binary embeddings, which have become very popular recently. But most users just use dense embeddings. When we talk about dense, it usually means that each dimension has some value, a float number; almost none of the values are zero. Dense embeddings usually come from machine learning models.

They're more about semantics, because when we generate dense embeddings we also capture context. So it's not only about a word, but also about a sentence or a paragraph, right? But in many use cases we still want keyword search, or we still want to focus on the details. That's why people do hybrid search across a vector DB plus Elasticsearch, Solr, and all kinds of other search engines, right? But now we have sparse embeddings. First of all, a sparse embedding is sparse, which means most of its dimensions are just zero. Yeah.

Why? Because for sparse embeddings we have a dictionary, and each token is one dimension. If you're familiar with BERT, its vocabulary is around 30K tokens. That means our sparse embeddings also have around 30K dimensions, much larger than dense embeddings.

Dense embeddings usually have 768 or 1,536 dimensions, right? Sparse embeddings have many more dimensions, but luckily most of them are zero, which means the document or query has nothing to do with that token, right? By using one weight per token, sparse embeddings focus more on details, keywords, and token matching. They try to find tokens that match between the query embedding and the corpus embeddings. This is extremely helpful if, let's say, I want to search for "the movie I watched yesterday". With dense embeddings, in some use cases you get all the movies, not only yesterday's, but also the day before yesterday's, or maybe last year's. That's because "the movie I watched yesterday" and "the movie I watched the day before yesterday" are still very close semantically, right? But with sparse embeddings, "yesterday" becomes a keyword, so it narrows down what you're searching for. Yeah, that is extremely helpful.
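A sparse embedding can be stored as a map from token ID to weight, so an inner product only touches the dimensions that both vectors share. The sketch below uses made-up token IDs and weights, not real BERT or SPLADE output. It shows why a document that contains the keyword "yesterday" outranks a semantically similar one that does not.

```python
from typing import Dict

SparseVec = Dict[int, float]  # token_id -> weight; absent tokens are implicit zeros

# Hypothetical token IDs in a ~30K-token vocabulary.
MOVIE, WATCHED, YESTERDAY, LAST_YEAR = 3185, 2499, 7483, 2197


def sparse_ip(a: SparseVec, b: SparseVec) -> float:
    """Inner product over the non-zero dimensions shared by both vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b[t] for t, w in a.items() if t in b)


query = {MOVIE: 0.8, WATCHED: 0.5, YESTERDAY: 1.2}
doc_yesterday = {MOVIE: 0.7, WATCHED: 0.4, YESTERDAY: 1.1}
doc_last_year = {MOVIE: 0.9, WATCHED: 0.6, LAST_YEAR: 1.0}

print(sparse_ip(query, doc_yesterday))  # the "yesterday" token contributes 1.2 * 1.1
print(sparse_ip(query, doc_last_year))  # no "yesterday" token, so no keyword boost
```

The cost of the product grows with the number of non-zero entries, not with the 30K-dimension vocabulary, which is what makes such large dimensionality practical.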

It helps in many RAG use cases where you want a controlled answer, right? You want to focus on the keywords that deliver the key information. The other good part of sparse embeddings is that they can work together with dense embeddings. Sparse embeddings focus on the details, and dense embeddings give you the context. When you search on both, you retrieve different kinds of information, and then you rerank to get the most relevant results. Yeah. Okay.

The third one, grouping search, is also an interesting use case. When you do RAG, you most likely chunk everything: you split each document into chunks and embed the chunks. But when you run a top-10 query, you may well get all the chunks from one single document or one single book, which means you lose diversity, right? With diversity you can do more: you can rerank, and you can show the results to your users so they can pick which document or book is actually most relevant. What can happen instead is that the most relevant results all come from one book or one document.

Yeah. So now we have a new functionality called group by. If your data model has a field such as a document ID or a book ID, you can group by that field when a search happens. So instead of returning the top 10 most relevant chunks, we return the most relevant documents, or the most relevant books. The way we do that is like a traditional database.
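The grouping step can be sketched in a few lines of Python. Walk the chunk hits from most to least relevant, and keep the first (best) chunk of each document until `limit` groups are filled. This is a simplified illustration, not Milvus internals. The `Hit` tuple, the `doc_id` grouping key, and the scores (treated as similarities, higher is better) are hypothetical.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    chunk_id: int
    doc_id: str
    score: float  # similarity: higher is more relevant


def group_by_search(hits: List[Hit], limit: int) -> List[Hit]:
    """Return the best chunk of each of the top `limit` documents."""
    best: List[Hit] = []
    seen_docs = set()
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if hit.doc_id in seen_docs:
            continue  # a better chunk from this document was already kept
        seen_docs.add(hit.doc_id)
        best.append(hit)
        if len(best) == limit:
            break
    return best


hits = [
    Hit(1, "book_a", 0.95), Hit(2, "book_a", 0.93), Hit(3, "book_a", 0.91),
    Hit(4, "book_b", 0.90), Hit(5, "book_c", 0.85), Hit(6, "book_b", 0.80),
]
print([h.doc_id for h in hits[:3]])                        # plain top-3: all from book_a
print([h.doc_id for h in group_by_search(hits, limit=3)])  # one chunk per book
```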

We iterate through the most relevant chunks and group them, just like a relational database does. Yeah. So the output usually has a group ID, which in your case is the document ID, plus the most relevant chunk in that group. Yeah. James, I'm just going to stop you here.

There's a quick question from the chat: can we somehow restrict the scope of a query while querying, like per workspace, client, or group? Yeah. Pardon me? I'll repeat the question: can we restrict the scope while querying something, for example by workspace, per client, or per group? I'm not sure I get the question, but I think what you're actually asking about is multi-tenancy. That is, if we have multiple tenants in one collection, how do I limit my query to only part of my data? Is that correct? That's what I assume.

Yeah. If I'm not understanding it correctly, just give me more information. For that, you have to use another feature we offered in 2.3, which is the partition key. I know a lot of RAG applications are doing multi-tenant stuff: they have multiple users, and those users definitely don't want to see each other's data.

Yeah. So the easiest way is to have one partition key field when you create the collection. When a search happens, you specify which partition key value to use. It could be a knowledge base, a user ID, an agent ID, whatever. It's for when you're going to have a lot of tenants.

That could be a thousand tenants, or maybe even 100K tenants. Each tenant stays under its own partition key value, and if you specify the partition key when a search happens, all results will relate only to that tenant. Yeah. Cool.
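Conceptually, a partition key works like hash partitioning. Each tenant value is hashed to one of a fixed number of physical partitions. A search that names the tenant scans only that partition, and it still filters on the key because several tenants can share a partition. The sketch below is a simplified model under those assumptions. The partition count, the CRC32 hash, and the field names are illustrative choices, not Milvus's actual configuration.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_PARTITIONS = 16  # assumption for the sketch

# partition index -> list of (tenant_id, row_id, vector)
partitions: Dict[int, List[Tuple[str, int, List[float]]]] = defaultdict(list)


def partition_of(tenant_id: str) -> int:
    """Deterministically map a tenant ID to a physical partition."""
    return zlib.crc32(tenant_id.encode("utf-8")) % NUM_PARTITIONS


def insert(tenant_id: str, row_id: int, vector: List[float]) -> None:
    partitions[partition_of(tenant_id)].append((tenant_id, row_id, vector))


def search(tenant_id: str, query: List[float], limit: int) -> List[int]:
    """Scan only the tenant's partition, then keep only that tenant's rows."""
    candidates = [
        (sum(q * v for q, v in zip(query, vec)), row_id)
        for tid, row_id, vec in partitions[partition_of(tenant_id)]
        if tid == tenant_id  # different tenants can hash to the same partition
    ]
    return [row_id for _, row_id in sorted(candidates, reverse=True)[:limit]]


insert("tenant_a", 1, [1.0, 0.0])
insert("tenant_b", 2, [1.0, 0.0])
insert("tenant_a", 3, [0.5, 0.5])
print(search("tenant_a", [1.0, 0.0], limit=10))
```

Because the number of physical partitions stays fixed, this scales to many more tenants than one-collection-per-tenant or one-partition-per-tenant would.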

Thank you. And there's another one, related to grouping search: is grouping search supported with multi-vector? Yeah, it's actually supported. We started by supporting dense embeddings, and we also support sparse embeddings and binary embeddings.

So when it becomes GA, I think it's going to support all kinds of embeddings. Cool. Thank you. All right, let's keep moving. Yeah.

If you have any other questions, just type them in the chat, and I'll go through them later. Yeah. All right. So this is how you use grouping search. You have one collection with your data in one embedding field.

You also have the group-by field, which is usually a document ID. When the output comes back, you ask for the document IDs and also the passage IDs, which are kind of like chunk IDs, right? The result looks like the example on the slide: each hit has an ID, a distance, a document ID, and a passage ID. So you know which document is most relevant and which chunk inside that document is relevant. Okay. Another interesting use case is fuzzy search, for fuzzy matching, along with the inverted index.

Thanks to the Tantivy community, we now have a very high-performance inverted index, similar to Lucene but written in Rust. Yeah, so finally Milvus is also part of the Rust community. The inverted index offers very good filtering performance, as the graph on the right side shows. So if you're filtering a large number of strings or integers, it definitely helps.

The only edge case is when your filter condition matches maybe 50% or more of your data. Then it doesn't really make sense to use the inverted index; you should just filter directly over all the data, because the inverted index works best when the filter matches a small number of rows. Another thing this brings to Milvus: since Tantivy supports fuzzy search, we now support not only prefix filtering but also postfix and infix filtering. You can use very simple regular expressions, starting with a wildcard and ending with another wildcard, right? The inverted index helps accelerate fuzzy search. That's also pretty important for many use cases, especially when you store chunks in the vector DB. You pretty much know what the original contents are, and you're just looking for certain keywords. Fuzzy search plus the inverted index helps you improve performance there. It is not going to be as fast as prefix matching, right? If you have the choice, definitely do prefix, definitely do exact match.
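The prefix, infix, and postfix matches mentioned here correspond to SQL-style `LIKE` patterns, where `%` stands for any run of characters. The helper below is a rough, hypothetical illustration of those semantics, not how Milvus evaluates them. It translates such a pattern into a Python regular expression that must match the whole string. Treating only `%` as a wildcard is an assumption of the sketch.

```python
import re
from typing import List


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a LIKE-style pattern ('%' = any run of characters) into a regex."""
    literal_parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile(".*".join(literal_parts), re.DOTALL)


def like_filter(values: List[str], pattern: str) -> List[str]:
    """Keep values whose whole string matches the pattern."""
    rx = like_to_regex(pattern)
    return [v for v in values if rx.fullmatch(v)]


titles = ["milvus-guide", "guide-to-milvus", "vector-db-intro", "intro-milvus-2.4"]
print(like_filter(titles, "milvus%"))   # prefix match
print(like_filter(titles, "%milvus"))   # postfix match
print(like_filter(titles, "%milvus%"))  # infix match
```

Escaping the literal parts keeps characters such as `.` from being read as regex syntax. A prefix pattern can use an ordered index directly, while a leading wildcard generally forces a wider scan, which is why prefix and exact matches stay the faster choice.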

But if the data is already there and you want to search it, at least you have a choice. Okay. GPU index. I know a lot of users have asked whether Milvus would benefit from using GPUs. Before 2.4, the answer was really no. GPUs are much faster, but they also have a lot of limitations.

The cost efficiency is not good; GPUs are just too expensive. Also, with very small batches, a GPU index usually spends a lot of time copying data between main memory and GPU memory. So it is not a perfect solution, at least not before 2.4.

Yeah. So for the past six months, we've been working very closely with the NVIDIA team. They have a fancy GPU index called CAGRA. You can also get more information on the... Yeah. So with the CAGRA index integrated, what we find is that brute-force search is about a hundred times faster.

With a graph index, it's about 10 times faster than HNSW, and with IVF indexes you get something like 30 times faster. We tested on both the A100 and the L40, which is the latest inference card. We found that the L40, T4, and A10 are very good cards for vector search on GPUs. Yeah. And in those scenarios, if you have large batches or very high QPS, and you want to keep your search latency stable,

a GPU index might be your choice. You should definitely test it. Okay. So here are some numbers. The first one, on the top, is our CPU engine, which is kind of the state of the art for CPU indexes.

Yeah. If you have an A10 and you do flat search on the GPU, which means 100% accuracy, you still get performance very similar to a CPU graph index. And if you use a GPU for ANN search, look at the last line: the number we got with an A100 plus CAGRA is about 10 times faster than the CPU engine. Yeah. So that's the trick.

If you're looking for high recall, definitely use the GPU brute-force index. If you're looking for very high performance and stable latency, definitely pick the CAGRA index. Yeah. But there is also one major drawback: when you use a GPU index, you have to load everything into GPU memory.

As we know, GPU memory is very expensive, so it depends on your use case. I know a lot of people building recommendation systems, and a GPU index will be perfect for them. But if you're just doing RAG, maybe it's not as good a fit, because for most RAG applications cost efficiency matters more than performance. Okay. Yeah.

Which brings us to cost efficiency. Yeah. We have an mmap solution. Before this, all your vector data was loaded directly into memory. Even with a disk index, where the original data stays on disk, we still keep a smaller index in memory, and that takes a lot of resources. But in some use cases, you don't really care about performance and just want to lower your cost. Then the other choice is to mmap everything onto local disk, and it's still searchable.

The data is swapped in and out between the operating system page cache and the disk. It is going to be slower than HNSW or other in-memory indexes. But the thing is, you can store four times as many vectors in your vector DB with almost no performance decrease, like 10% to 20%. Yeah.

But remember, if you use mmap, you need a very high-performance disk, usually NVMe SSDs. You cannot run it on just any disk; I'm not sure how good EBS is at it. If you're running on AWS or any other cloud, we would definitely recommend io2 or maybe local SSDs. That is still much cheaper than memory. Yeah. With mmap, you get similar performance, with slightly higher latency, but it is much cheaper.
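The memory-mapping idea can be illustrated with Python's standard `mmap` module. Vectors are written to a file, and searches read them through the mapping. The operating system then pages data in and out of its page cache on demand, instead of the whole data set living in process memory. This is only a sketch of the general technique, assuming float32 vectors stored back to back in a flat file. It says nothing about Milvus's on-disk format.

```python
import mmap
import os
import struct
import tempfile
from array import array
from typing import List

DIM = 4
RECORD = struct.Struct(f"{DIM}f")  # one vector = DIM native float32 values


def write_vectors(path: str, vectors: List[List[float]]) -> None:
    """Store vectors back to back as raw float32 values."""
    flat = array("f", [x for vec in vectors for x in vec])
    with open(path, "wb") as f:
        flat.tofile(f)


def nearest_l2(path: str, query: List[float]) -> int:
    """Brute-force nearest neighbor, reading vectors through a read-only memory map."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        best_id, best_dist = -1, float("inf")
        for i in range(len(mm) // RECORD.size):
            # Reading these bytes lets the OS fault the page in from disk if needed.
            vec = RECORD.unpack_from(mm, i * RECORD.size)
            dist = sum((a - b) ** 2 for a, b in zip(vec, query))
            if dist < best_dist:
                best_id, best_dist = i, dist
        return best_id


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vectors.f32")
    write_vectors(path, [[0, 0, 0, 0], [1, 1, 1, 1], [5, 5, 5, 5]])
    print(nearest_l2(path, [0.9, 1.1, 1.0, 1.0]))
```

Every page the scan touches may require a disk read if it is not already cached. That is why the speed of the underlying disk (NVMe, io2, local SSD) dominates mmap performance.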

Okay. Yeah. Another exciting feature is change data capture. When data is written into Milvus, a lot of users ask how they can get the data out. Maybe you want to keep data consistent between two Milvus clusters, one in the US and one in Europe; that is one use case. In other use cases, users want to do incremental backups.

So we have CDC. CDC is a subscriber to Milvus's log. You may be familiar with the MySQL binlog or the Postgres write-ahead log; every database has its own redo logging. CDC exports that log and can deliver it to another Milvus cluster, or to anything that can consume those logs. And CDC can work together with backup.

Backup holds the full data, and CDC covers the streaming, or incremental, part, right? If you merge them, you have your entire data set, covering both the historical and the streaming data. This is going to be widely used on the cloud. We use it for cross-cluster replication and for incremental backups. So we offer these great tools, though you might need some special implementation for your own use cases.
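The backup-plus-CDC pattern described here is essentially snapshot plus log replay. The sketch below models it in plain Python. A full backup captures the state up to some log position, and replaying the change-log entries after that position rebuilds the current data set. The operation names and the log format are hypothetical; they do not reflect the actual Milvus CDC message format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LogEntry:
    position: int                  # monotonically increasing log offset
    op: str                        # "insert" or "delete" (hypothetical op names)
    pk: int
    vector: Tuple[float, ...] = ()


@dataclass
class Replica:
    rows: Dict[int, Tuple[float, ...]] = field(default_factory=dict)
    applied_up_to: int = -1        # last log position reflected in `rows`

    def apply(self, entry: LogEntry) -> None:
        if entry.position <= self.applied_up_to:
            return                 # already covered by the backup or an earlier replay
        if entry.op == "insert":
            self.rows[entry.pk] = entry.vector
        elif entry.op == "delete":
            self.rows.pop(entry.pk, None)
        self.applied_up_to = entry.position


def restore(backup: Replica, change_log: List[LogEntry]) -> Replica:
    """Full backup + incremental log = current data set (the backup itself is not modified)."""
    replica = Replica(dict(backup.rows), backup.applied_up_to)
    for entry in sorted(change_log, key=lambda e: e.position):
        replica.apply(entry)
    return replica


log = [
    LogEntry(0, "insert", 1, (0.1, 0.2)),
    LogEntry(1, "insert", 2, (0.3, 0.4)),
    LogEntry(2, "delete", 1),
    LogEntry(3, "insert", 3, (0.5, 0.6)),
]

# The full backup was taken after log position 1 ...
backup = Replica()
for entry in log[:2]:
    backup.apply(entry)

# ... and the CDC stream supplies the rest (replaying the overlap is harmless).
current = restore(backup, log)
print(sorted(current.rows), current.applied_up_to)
```

Tracking the last applied position is what makes replay idempotent. As a result, the log stream can overlap the backup without double-applying changes.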

On the cloud, we have all of those for free. The easy way to use CDC is just to use that code, and it's open source under the Apache license. So if you're going to tailor any of your own applications, CDC is a great tool. It supports all kinds of operations: create collection, drop collection, inserts, deletes, and all kinds of metadata operations. So you can just subscribe to all of that information.

And you can integrate it with other systems. Okay. All right, I'll stop here. I think that's pretty much it for the 2.4 features.

I have another page for the 3.0 roadmap, but we can stop here and see if you have any other questions. Thank you, James. Yeah, we have a couple more questions in the chat. One is again related to partition keys: someone is asking whether we can specify multiple partition keys in one query.

Yeah. Each collection only supports one partition key; that is the current limitation. But you can also use filters. I think the best practice for multi-tenant use cases is to choose as the partition key the field you'll almost always query by.

A good choice is a tenant ID, a knowledge base, that kind of information. On top of that, if you have other expressions or conditions, you can just use filtering. You can have multiple different fields. Let's say you're building a book recommendation system: you may have an author field, and maybe pages, as metadata, right? When the search happens, you can just specify an expression like "page equals one and author is JK", that kind of thing. Yeah.
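The book-recommendation example can be modeled in three steps: restrict to one tenant through the partition key, apply the scalar filter, then rank the remaining rows by vector similarity. In Milvus-style expression syntax, the filter would read `page == 1 and author == "JK"`. The sketch below hardcodes that filter as a Python predicate instead of parsing an expression. The `Book` fields, tenant names, and vectors are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Book:
    book_id: int
    tenant_id: str       # plays the role of the partition key field (hypothetical)
    author: str
    page: int
    vec: List[float]


BOOKS = [
    Book(1, "store_1", "JK", 1, [0.9, 0.1]),
    Book(2, "store_1", "JK", 2, [0.8, 0.2]),
    Book(3, "store_1", "Tolkien", 1, [0.95, 0.05]),
    Book(4, "store_2", "JK", 1, [1.0, 0.0]),
]


def filtered_search(rows: List[Book], tenant_id: str, predicate: Callable[[Book], bool],
                    query: List[float], limit: int) -> List[int]:
    """Tenant restriction first, then the scalar filter, then rank by inner product."""
    candidates = [b for b in rows if b.tenant_id == tenant_id and predicate(b)]
    candidates.sort(key=lambda b: sum(q * v for q, v in zip(query, b.vec)), reverse=True)
    return [b.book_id for b in candidates[:limit]]


# Same intent as the expression: page == 1 and author == "JK"
hits = filtered_search(BOOKS, "store_1", lambda b: b.page == 1 and b.author == "JK",
                       [1.0, 0.0], limit=10)
print(hits)
```

Book 3 is the closest vector in `store_1`, but the author filter removes it. Book 4 matches the filter but belongs to another tenant, so only book 1 survives.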

Cool. Thank you. Going back to the fuzzy matching and inverted index slide, can you explain what brute force means? There's a graph of inverted index performance, and it says "brute force" there. Can you go over it, please? Yeah. So say we want to filter on some conditions.

Without an index, all we can do is brute-force search. Let's say it's an integer field. We take each integer and compare it according to whatever expression you have; maybe it's just a == 1. So we take row one, read field a, and compare it to 1. If it's equal to 1, the row passes the filter, right? Then we go to the second row, and so on until we've finished all the rows. That is what brute force does.

In practice it's not quite that simple, because we rely heavily on SIMD instructions. Typically we compare eight numbers at the same time, which improves performance. Yeah. But the inverted index is very different, because it changes the order of the data. We put all the rows with value equal to one together.

So when you ask for "equals one", we don't need to go through everything. We just take all the rows equal to one and put them into a bitset. It's much faster because we changed the data layout. Oh, thank you. Thank you.
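The difference James describes can be shown with a toy column. Brute force evaluates the predicate on every row. An inverted index maps each value to the row IDs that hold it, so an equality filter becomes a lookup that sets bits in a bitset. This plain-Python sketch ignores SIMD and Tantivy's actual data structures; it only illustrates the idea.

```python
from collections import defaultdict
from typing import Dict, List


def brute_force_bitset(column: List[int], target: int) -> List[bool]:
    """Compare every row against the target (the loop SIMD would vectorize)."""
    return [value == target for value in column]


def build_inverted_index(column: List[int]) -> Dict[int, List[int]]:
    """Map each distinct value to the sorted list of row IDs holding it."""
    index: Dict[int, List[int]] = defaultdict(list)
    for row_id, value in enumerate(column):
        index[value].append(row_id)
    return dict(index)


def inverted_bitset(index: Dict[int, List[int]], num_rows: int, target: int) -> List[bool]:
    """Set bits only for rows listed under `target`, without evaluating every row."""
    bits = [False] * num_rows
    for row_id in index.get(target, []):
        bits[row_id] = True
    return bits


column_a = [3, 1, 4, 1, 5, 9, 2, 6, 1]
index_a = build_inverted_index(column_a)
print(index_a[1])  # row IDs where a == 1
```

This also explains the edge case from the talk. When a value matches half the rows or more, its posting list is nearly as long as the column itself, so the lookup saves little over a SIMD scan.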

Thank you. Do we have more questions? Oh, yeah, one: which of these features are available in Zilliz Cloud right now? And if they're not, when will they be? That's a good question. Right now, Zilliz Cloud usually trails open source by two or three months. The reason is that many open-source users want new functionality right away, because they're still in their POC stage.

Stability is very important, but many users run new releases in their test environments even when they aren't stable enough yet; this one is still a release candidate. Yeah. So we try to release everything as soon as possible.

But for Zilliz Cloud, we want to keep things stable and enterprise-ready. Yeah. We want to give you best practices, not something we're not really comfortable with yet. So it usually takes another two or three months before we bring everything to the cloud. Yeah.

We're trying to make those functionalities available in early May. You'll have the choice whether to upgrade or not, but moving every cluster to 2.4 may take another three months. Yes. Cool.

Thank you. Someone asks: are there partition size limits? Oh, okay, or maybe it's something different. I see this is another multi-tenant use case. For multi-tenancy, you actually have two or three different options.

One is using multiple collections, another is using partitions, and those two are very similar. So my suggestion: if your number of tenants is definitely under a thousand, or maybe a little more, use collections for isolation. That really fits some SaaS use cases. When a SaaS company builds a product for businesses or enterprise users, it won't have 100K enterprise users, right? If you do, you're definitely a big success. But in most use cases, you'll have around a thousand enterprise users.

You can just isolate each of them by collection, right? But there's still a limitation: I don't think Milvus can handle more than 10K collections, and I would say keep the collection count under 5K. We did a lot of enhancements on that in 2.4, so it should be much better than 2.3, but there are still limits here, right? If you're building something for consumers with a large number of users, maybe a million users per day, you definitely cannot use collections or partitions to isolate all the data.

That's why we have the partition key, which helps you isolate smaller tenants within one large collection. Thank you. Yeah. Wait, more questions are coming in. Here's one: can group search handle offset and limit parameters on its outputs, for pagination? Yeah.

It still has offset and limit, but the semantics are a little different. When you use offset, you're offsetting documents rather than chunks. If you set offset to 10, you get the documents, or groups, ranked 10 to 20, not chunks 10 to 20. That is the major difference.
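To make the paging semantics concrete: with group-by, `offset` and `limit` count groups rather than chunks, so `offset=10, limit=10` returns the 11th through 20th documents. The sketch below is a minimal model under that reading. Each document is ranked by its best chunk, and the `Hit` tuple layout and document names are hypothetical.

```python
from typing import List, Tuple

Hit = Tuple[int, str, float]  # (chunk_id, doc_id, score); higher score = more relevant


def grouped_page(hits: List[Hit], offset: int, limit: int) -> List[str]:
    """Rank documents by their best chunk, then apply offset/limit to documents."""
    ordered_docs: List[str] = []
    seen = set()
    for _, doc_id, _ in sorted(hits, key=lambda h: h[2], reverse=True):
        if doc_id not in seen:
            seen.add(doc_id)
            ordered_docs.append(doc_id)
    return ordered_docs[offset:offset + limit]


# 30 chunk hits spread over 15 documents (two chunks each), best first.
hits = [(i, f"doc_{i // 2:02d}", 1.0 - i / 100) for i in range(30)]

print(grouped_page(hits, offset=0, limit=10))   # doc_00 .. doc_09
print(grouped_page(hits, offset=10, limit=10))  # doc_10 .. doc_14 (only 15 groups exist)
# Offsetting chunks instead would start inside doc_05: hits[10] belongs to doc_05.
```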

But yes, we support it. Thank you. There's one about the GPU release: is there a way to configure the index so that new segments are indexed on GPUs, but already-indexed segments are queried on CPUs? Is that possible? That is a good question. The answer is that we're working on it, but so far, no.

The CAGRA index is a special kind of graph, so it has to be searched on GPUs. We tried searching it on CPUs, but the performance was really bad. On GPUs you don't really have to worry about computation power, so CAGRA uses more computation to get higher quality. Yeah. But on CPUs, computation power becomes the bottleneck.

So you cannot really use the same graph architecture. Yeah. What we're doing instead is adapting a CPU index so that it can be built on the GPU. We've made some progress on it, but it's not released yet. Yeah.

Let's wait, maybe for 3.0; we should have all of that by then. Thank you. Another question is about the scoring metrics supported for hybrid search: are we supporting them all, like IP, cosine, and L2, or are those limited? Yeah, that's a good question.

We actually support them all. One way to rerank is RRF, which is based only on rankings; it has nothing to do with how large the distances are. We also support another mode called weighted score. In that case, we normalize the metrics.

Cosine similarity is already bounded, but metrics like IP, Hamming for binary embeddings, or L2 are not. For those, we normalize the scores into a fixed range, and then you can apply weights across the different metrics. Yeah. So that is how it works. But I would recommend using the same metric for all fields if you can; that'll be much better.
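A weighted ranker needs scores on a common scale before they can be mixed. The sketch below uses one plausible normalization; it is an assumption, not Milvus's documented formula. Cosine similarity is mapped linearly from [-1, 1] to [0, 1]. Unbounded inner-product scores are squashed with arctan. L2 distances are inverted so that smaller distances score higher. The per-field weights and sample scores are likewise illustrative.

```python
import math
from typing import Dict, List, Tuple


def normalize(score: float, metric: str) -> float:
    """Map a raw score to [0, 1], where higher always means more relevant."""
    if metric == "COSINE":
        return (score + 1.0) / 2.0                     # [-1, 1] -> [0, 1]
    if metric == "IP":
        return 0.5 + math.atan(score) / math.pi        # (-inf, inf) -> (0, 1)
    if metric == "L2":
        return 1.0 - 2.0 * math.atan(score) / math.pi  # distance [0, inf) -> (0, 1]
    raise ValueError(f"unsupported metric: {metric}")


def weighted_fuse(per_field: List[Tuple[str, float, Dict[int, float]]]) -> List[Tuple[int, float]]:
    """per_field: (metric, weight, {row_id: raw score}) for each vector field searched."""
    fused: Dict[int, float] = {}
    for metric, weight, results in per_field:
        for row_id, raw in results.items():
            fused[row_id] = fused.get(row_id, 0.0) + weight * normalize(raw, metric)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


ranking = weighted_fuse([
    ("COSINE", 0.7, {1: 0.9, 2: 0.2}),  # dense field
    ("IP", 0.3, {2: 5.0, 3: 1.0}),      # sparse field
])
print(ranking)
```

Row 1 has the best dense score, but row 2 scores on both fields and ends up first. Mixing normalized scores can reorder results in ways that are harder to reason about than RRF, which is one reason to keep all fields on the same metric when you can.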

Thank you. Someone else asks: Milvus has an option for backup. Do we have plans to integrate with storage vendors like NetApp to leverage their snapshot technology, or similar? Um, not for now. Yeah.

We'd be very open to any storage vendors, if they can offer some help. Yeah. We can integrate with all kinds of storage, but for now Milvus is more focused on the cloud. So we usually focus on object storage, because it works like the common protocol on the cloud.

We actually support EFS, the AWS file system. It would be very little work to use any other file system, NAS, or NFS from other vendors, as long as it's just for backups. If you want to use it for search, most remote file systems are not performant enough. So yeah, but backups should be very easy.

Cool. Thank you. And someone asked: in the example of grouping search you have on the slide, why are there only six grouped results when the limit is set to 10? Oh, it's just because I didn't have enough room. Ideally you would just have 10. Yes, but I didn't have enough room on my slide, so I cut it off.
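For context on the limit question, here is a pure-Python sketch of the grouping idea: keep only the best-scoring hit for each value of the grouping field and stop once limit groups are filled, so limit counts groups rather than raw hits. It also shows that fewer than limit groups come back when the candidates contain fewer distinct values. This illustrates the concept, not Milvus's implementation; the hit dictionaries and the grouping_search name are assumptions.

```python
def grouping_search(hits, group_field, limit):
    """Return the best hit per distinct group_field value, at most `limit` groups."""
    best = {}                                      # group value -> best hit (insertion-ordered)
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        group = hit[group_field]
        if group not in best:                      # first hit seen is the group's best
            best[group] = hit
            if len(best) == limit:                 # enough groups collected
                break
    return list(best.values())
```

With chunks from documents A, A, B, and C, a limit of 10 still yields only three groups, ordered by each group's best score.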

Yeah. Thank you. And I think that's what I have, at least for now. Does Milvus support storing data on one or more multi-dimensional data cubes at the same time? Uh, I'm not sure. What is that? Data cube.

That's what I was thinking. It's like, um, multi-dimensional array values. Sorry, I'm not sure I get the question correctly. If you're trying to do something like geospatial search, uh, yes.

It's actually in our plan, but it's not there yet, for very low-dimensional data, maybe two or three dimensions. We're actually working with some engineering teams at a big company, and they have a lot of geolocation applications. So what we want to add in 3.0 is a geolocation data type, and also datetime.

So when a search happens, you can also limit it to a location. That would be very good if you're trying to build location-based applications, like delivery systems, something like Uber, something like DoorDash, that kind of stuff. Yeah. Cool. Thank you. I don't have any more questions so far.
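Since the geolocation type is still only planned for 3.0, here is a generic sketch of location-limited vector search rather than anything Milvus-specific: drop candidates farther than a radius (great-circle distance via the haversine formula), then rank the rest by inner product. The field names lat, lon, vec and the filter-then-rank order are assumptions; a real engine would use a spatial index instead of this linear scan.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def geo_filtered_search(items, query_vec, center, radius_km, limit):
    """Keep items within radius_km of center, then rank the survivors by inner product."""
    lat, lon = center
    nearby = [it for it in items
              if haversine_km(lat, lon, it["lat"], it["lon"]) <= radius_km]
    nearby.sort(key=lambda it: sum(q * v for q, v in zip(query_vec, it["vec"])),
                reverse=True)
    return [it["id"] for it in nearby[:limit]]
```

In a delivery-style query centered on San Francisco with a 10 km radius, a restaurant in Los Angeles is excluded no matter how similar its vector is.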

I'm just going to wait a quick minute. Um, yeah, and maybe I can just quickly go over what we have for 3.0. Go for it. Yeah.

Right. Thank you. Yeah. Okay. Oh, one moment.

One. All right. So here is what's on our roadmap for 3.0. It's not a final decision yet.

So if you have any other suggestions or feature requests, we want to talk to you: send me emails or go to GitHub. We're definitely still waiting for more suggestions, right? I think 3.0 is actually the biggest release this year. It's going to happen maybe in July, late July or so. There are a couple of things we want to do, not only for RAG application users, but also for more interesting use cases.

One is definitely performance. You may know that Milvus is very good at scale, but we still want to lower your cost. In many of the use cases we see, users have huge numbers of vectors, and that becomes very expensive. But in some of those use cases, search rarely happens. That's why we have lazy load, and also an autoscaler.

So with lazy load, you don't need to load everything into your main memory or your local disk. All data is stored in object storage; when a search happens, we load the data and cache it on your local disk. Latency will increase and throughput will be lower, but you can definitely store more data on one node compared to previous versions, and you can use autoscaling: if you don't have too many requests, it helps you scale down. Yeah.
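The lazy-load behavior described above (data lives in object storage, is fetched on first search, and is cached on local disk) can be sketched as a small read-through cache. The fetch callable stands in for an object-storage read; the least-recently-used eviction policy, the per-segment granularity, and the LazySegmentCache name are assumptions for illustration, since the talk does not say how Milvus decides what to evict.

```python
from collections import OrderedDict

class LazySegmentCache:
    """Read-through cache: serve segments locally, fetch from object storage on a miss."""

    def __init__(self, fetch, capacity):
        self.fetch = fetch                     # callable: segment_id -> bytes (remote read)
        self.capacity = capacity               # max number of segments kept on local disk
        self.cache = OrderedDict()             # segment_id -> data, oldest use first
        self.misses = 0

    def get(self, segment_id):
        if segment_id in self.cache:
            self.cache.move_to_end(segment_id)     # mark as most recently used
            return self.cache[segment_id]
        self.misses += 1                           # slow path: higher latency
        data = self.fetch(segment_id)
        self.cache[segment_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)         # evict the least recently used segment
        return data
```

The trade-off James mentions shows up directly: every miss pays an object-storage read, but a node only needs local space for capacity segments instead of the whole collection.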

On performance, we're also still working on GPU acceleration. We really want to add more of the current functionality, including grouping search. So far, GPU indexes don't support those more advanced features, so we really want all the features to be supported on GPU indexes.

And technically, we're doing more quantization. We're going to support binary quantization, because it's very popular and many of the embedding vendors already support it. Yeah, we'll also have a new storage format on S3. The new storage on S3 actually accelerates point queries.

It actually helps a lot when you specify output fields, when you want to get the data out of the vector DB rather than only scores, right? The new storage is optimized for point queries, and it's also optimized for object storage. So it should greatly improve performance if you're doing RAG applications where you want the chunks and their associated information out. Yeah. We also worked on a lot of things around ease of use.

We have Milvus local mode (Milvus Lite). It's actually a new mode of Milvus for data scientists. I think the deployment is too complicated for a lot of users, especially first-time users who just want to test. Now you can just do a pip install to install Milvus in maybe one minute. It has exactly the same APIs; it doesn't support the full functionality, but it's pretty good for day-one users.

Yeah. We also integrated local mode with LangChain and with LlamaIndex. It's good enough for you to do a lot of experiments on datasets of up to about 1 million entries. Yeah, we have new SDKs. Rust is actually our top priority.

We saw a lot of AI users trying to use Rust. So yes, we're going to support a Rust SDK, and also C#, which comes from Microsoft. Their team did a great job following up on all the new functionality. Yeah. We'll support more database operations, like schema change.

Sometimes when you're building applications, on day one you don't know what kind of information the data is going to have. You can definitely use dynamic schema, but the filter performance is lower. So with schema change, you can add one column or delete one column. Yeah. We're going to support more data types and more indexes.

One big thing we want to do is indexes on JSON and arrays. We're also thinking of supporting a datetime data type and geolocation data types, to support geospatial search. And also, there are a lot of use cases for agents.

In those cases you want to filter based on time, so datetime is the data type they're looking for. Yeah. We're going to support primary key deduplication. Right now we keep telling users to design their primary key and make sure it's unique, but in some use cases, for example when your Kafka crashes, you have to wait and retry, and when the retry happens you actually end up with two rows with the same primary key, right? So we'll help you protect against that. That will happen in 3.0.

We actually have another key-value store to help you maintain all the primary keys. When a primary key already exists, we either throw an exception directly or, if you follow an idempotent design, we just ignore the duplicated primary keys. Yeah. And the other most exciting part for me is that 3.0 is actually going to add more model support.
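Returning to the primary-key deduplication planned for 3.0, here is a minimal sketch of the two behaviors James lists: reject a duplicate key with an error, or silently skip it for idempotent writers such as a retried Kafka consumer. The in-memory set stands in for the key-value store he mentions; the PrimaryKeyRegistry class and its on_duplicate option are hypothetical, not a Milvus API. Rejecting the whole batch before writing anything is a design choice of this sketch.

```python
class PrimaryKeyRegistry:
    """Track seen primary keys so retried writes cannot create duplicate rows."""

    def __init__(self, on_duplicate="raise"):
        if on_duplicate not in ("raise", "ignore"):
            raise ValueError("on_duplicate must be 'raise' or 'ignore'")
        self.on_duplicate = on_duplicate
        self.seen = set()                  # stands in for the key-value store
        self.rows = []

    def insert(self, rows):
        """Insert rows and return how many were actually written."""
        if self.on_duplicate == "raise":
            # Validate the whole batch first so a rejected batch writes nothing.
            batch_keys = set()
            for row in rows:
                pk = row["id"]
                if pk in self.seen or pk in batch_keys:
                    raise KeyError(f"duplicate primary key: {pk}")
                batch_keys.add(pk)
        written = 0
        for row in rows:
            if row["id"] in self.seen:     # idempotent mode: skip keys already stored
                continue
            self.seen.add(row["id"])
            self.rows.append(row)
            written += 1
        return written
```

A retried batch in "ignore" mode writes only the keys that are new, which is exactly what a consumer replaying messages after a crash needs.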

So before 3.0, we were very focused on building the infrastructure, and we actually did a good job on that. But a lot of users are asking: I'm not an expert in embedding, re-ranking, everything; it's just too complicated for me. Is there a way I can do data in, data out, rather than generating all the embeddings and then sending them to the vector DB?

Yeah. So in 3.0, we're going to work with a lot of inference vendors like BentoML, and a lot of model providers like OpenAI. What we do is we actually have a function. You can just give us chunks, and we'll call the function, which is integrated with the embedding providers and the re-ranking providers. Yeah. So Milvus has all the original raw data, the embedding happens on the providers' side, and we just help you call their inference engine, right? We also want to support more semantic search use cases.
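Going back to the "data in, data out" function in the 3.0 plans: it can be sketched as a collection that owns an embedding function, so callers insert and query raw text and the function (which in the plan would call a provider such as OpenAI or an inference service such as BentoML) turns it into vectors. Everything below, including EmbedFn, TextCollection, and the toy embedder in the usage note, is hypothetical and says nothing about the eventual Milvus API.

```python
from typing import Callable, List, Sequence

# Hypothetical embedding-provider signature: a batch of texts in, one vector per text out.
EmbedFn = Callable[[Sequence[str]], List[List[float]]]

class TextCollection:
    """'Data in, data out' sketch: callers pass raw text; vectors are computed inside."""

    def __init__(self, embed: EmbedFn):
        self.embed = embed                     # stands in for a call to an external provider
        self.texts: List[str] = []
        self.vectors: List[List[float]] = []

    def insert(self, texts: Sequence[str]) -> None:
        self.texts.extend(texts)               # keep the original raw data
        self.vectors.extend(self.embed(texts))

    def search(self, query: str, limit: int = 3) -> List[str]:
        q = self.embed([query])[0]             # the query is embedded the same way
        def score(i: int) -> float:
            return sum(a * b for a, b in zip(q, self.vectors[i]))
        order = sorted(range(len(self.texts)), key=score, reverse=True)
        return [self.texts[i] for i in order[:limit]]
```

With a toy embedder that counts the words "dog" and "cat", inserting "a dog photo", "a cat photo", and "dog and dog" and then searching for "dog" with limit 1 returns ["dog and dog"]; the caller never touches a vector.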

Right now, I think most users just use plain vector search, but we want to add more semantics. We want to add nearest-neighbor filtering, so filtering doesn't only happen on scalar data; it also happens on vectors. Say I have a bunch of dog photos and also some cat photos, and the search is: I want to find all the similar dogs, but definitely not cats. So you give one positive example, a dog, and you can also give one negative example, a cat.

So we try to get the results closest to dogs, but filter out cats. Multi-target search is a similar use case. When you start a search, you probably don't know exactly what you're searching for. So you give one positive example, we give you, say, a hundred similar results, and from those results some might be what you really want and others might not. So you mark the things you want as positives and the things you don't want as negatives, and you do another search.

It's called multi-target search because there's not only one query vector: you have multiple positive examples and multiple negative examples. Yeah. Milvus tries to search based on all the examples, getting closer to the positive examples and farther away from the negative ones. That could be extremely helpful if you're trying to explore your data.
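One way to express the positive/negative search described above is a Rocchio-style score: average cosine similarity to the positive examples minus a weighted average similarity to the negative ones. The scoring rule, the neg_weight parameter, and the function names are assumptions for illustration; the roadmap does not say how Milvus will combine multiple query vectors. Vectors are assumed non-zero so cosine similarity is defined.

```python
import math
from typing import Dict, List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def multi_target_search(candidates: Dict[str, List[float]],
                        positives: List[List[float]],
                        negatives: List[List[float]],
                        limit: int,
                        neg_weight: float = 1.0) -> List[str]:
    """Rank ids by mean similarity to positives minus weighted mean similarity to negatives."""
    def score(vec: List[float]) -> float:
        pos = sum(cosine(vec, p) for p in positives) / len(positives)
        neg = (sum(cosine(vec, n) for n in negatives) / len(negatives)
               if negatives else 0.0)
        return pos - neg_weight * neg
    ranked = sorted(candidates, key=lambda cid: score(candidates[cid]), reverse=True)
    return ranked[:limit]
```

In a toy 2-D space where the first axis means "dog" and the second "cat", a positive example at [1, 1] alone ranks a mixed photo first, but adding a cat negative at [0, 1] pushes the clearest dog photo to the top, which is the refine-and-search-again loop James describes.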

So you have these exact targets. Yeah, that's pretty much it for 3.0. And that's for this summer, right? The hope is to release it around summer. Right. And James, the hope is to release 3.0 this summer?

Yeah, that's for this summer. Cool. Yeah. We're targeting July. Sometimes we actually slip, because we try to add too many things into one release, but we are trying to deliver on time.

Yeah. Okay, cool. Thank you. We have one last question, and then I think that should be it for the webinar.

Um, does Milvus support changing non-vector fields? So if you have a value that's already inserted, one possibility is to do upserts, but the person is also asking: will that trigger a re-indexing of the vector field? Yeah, that's actually a good one. This heavily depends on the storage format, because Milvus is a little different from other competitors: we disaggregate storage and computation. The benefit is that the system becomes very elastic and easy to scale. The problem is that you cannot do any in-place modifications on top of object storage, right? So we have to carefully design the storage format.

That has to come before we can support upserting or changing individual fields. But for sure we're thinking about this. Probably not in 3.0; maybe in 3.1, once we have the new storage format, which makes it easier to do.
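A simplified model of why changing a scalar field is tied to the storage format: if segments on object storage are immutable, an upsert has to become a delete (a tombstone) plus a fresh insert of the whole row, vector included, and the vector index covering the new row is built when its segment is sealed. The segment size, sequence numbers, index_builds counter, and the SegmentStore class are illustrative assumptions, not Milvus internals.

```python
import itertools

class SegmentStore:
    """Append-only segments plus a delete log, as on immutable object storage."""

    def __init__(self, segment_size=2):
        self.segment_size = segment_size
        self.sealed = []                   # sealed segments are never modified in place
        self.growing = []                  # rows still being accumulated
        self.delete_log = {}               # pk -> sequence number of its latest delete
        self.seq = itertools.count()
        self.index_builds = 0              # vector indexes built on sealed segments

    def insert(self, pk, vector, fields):
        self.growing.append({"pk": pk, "vector": vector, "fields": fields,
                             "seq": next(self.seq)})
        if len(self.growing) == self.segment_size:
            self.sealed.append(tuple(self.growing))
            self.growing = []
            self.index_builds += 1         # the new segment, vectors and all, gets indexed

    def delete(self, pk):
        self.delete_log[pk] = next(self.seq)   # tombstone only; old rows stay on storage

    def upsert(self, pk, vector, fields):
        """Even a scalar-only change rewrites the full row, vector included."""
        self.delete(pk)
        self.insert(pk, vector, fields)

    def get(self, pk):
        rows = [r for seg in self.sealed for r in seg] + self.growing
        live = [r for r in rows
                if r["pk"] == pk and r["seq"] > self.delete_log.get(pk, -1)]
        return max(live, key=lambda r: r["seq"]) if live else None
```

Changing only a price still writes the row's vector again into a new segment, while the stale copy lingers in an old segment until compaction; a storage format that could update individual fields would avoid that, which is the change James ties to the new format.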

Yeah. Very cool. Thank you very much. I think that's it for all the questions.

So thank you very much, James, for showcasing everything that's been released. Thank you everyone for coming. Have a good morning, afternoon, or evening, depending on where you are in the world. Feel free to join us on Discord as well if you have any questions. And I'll see you at our next one.

Thank you. Bye bye.

Meet the Speaker

Join the session for live Q&A with the speaker

James Luan
VP of Engineering at Zilliz
James Luan is the VP of Engineering at Zilliz. With a master's degree in computer engineering from Cornell University, he has extensive experience as a Database Engineer at Oracle, Hedvig, and Alibaba Cloud. James played a crucial role in developing HBase, Alibaba Cloud's open-source database, and Lindorm, a self-developed NoSQL database. He is also a respected member of the Technical Advisory Committee of LF AI & Data Foundation, contributing his expertise to shaping the future of AI and data technologies.
