Dense Embeddings != Complete Search - A Sneak Peek of Milvus 2.5
Webinar
What will you learn?
Dense embeddings miss exact matches. Keyword search misses semantic meaning. Running two separate systems is a maintenance nightmare. We'll show how Milvus 2.5's hybrid search tackles this with a unified solution, preview its sparse-based BM25 implementation, and share performance numbers against current Elasticsearch-based architectures.
Topics covered:
- Where dense embeddings fall short, and how a unified system architecture addresses these search needs
- Sneak Peek of Milvus 2.5 - A quick look at our BM25 implementation and sparse vector optimizations
- Benchmark results comparing hybrid search latency and throughput vs. Elasticsearch
- What's Next - Brief overview of upcoming features in our technical roadmap
Hello everybody, welcome to our webinar today. Today we're going to be going over Milvus 2.5. By the way, my name is Chris Churilo, I work here at Zilliz, and I run our marketing endeavors.
We're really happy to have you, and I'm super excited for you guys to see all the cool things we've been working on with open source Milvus. We actually have a couple of people on the panel today who will be available to answer any of your questions: myself; Steffi, who is our product marketing lead; Emily; and of course James. We might even grab a couple of engineers to help out with the Q&A. In the meantime, I also want to remind everybody that we have a really cool Discord channel where we all hang out.
So please join us on the Discord channel. Another thing we just started recently is a set of office hours you can sign up for. We've been doing them in US hours on Fridays. Originally we let people schedule 15-minute meetings with various engineers to answer any of your Milvus questions, but we found out 15 minutes was too short, so we bumped that up to, I think, 20 or 25 minutes.
The nice thing is we've already heard some feedback about making changes to our documentation, especially in regards to how to pick an index. So thank you so much if you were one of the people who gave us that feedback. We just want you to be really successful with your Milvus implementation, so come join us. The other thing is, as you probably already know, we do monthly meetups called the Unstructured Data Meetup.
So please join us, and if you can't, all of these sessions are recorded on our YouTube channel. In January we're going to be doing our very first women's hackathon. So if you are a woman (or, you know, we're actually including everybody if you want to join us), we're going to be hosted by Stanford University at the end of January. We're hoping to do some pretty cool hacking together, and we hope this is going to be one of many hackathons we do in the US.
We did a really successful one in October at Google, and it was a lot of fun. It's an all-day event, and we'll share our hacks with everybody through our various channels: our blogs, our YouTube channel, and our social media channels as well. Okay, cool. So let me just make sure we've got everybody here.
All right. So, a couple of people here are not feeling super great, so we actually recorded this presentation just a couple of days ago. I'm going to play the recording that we did of James, but he is here to answer your questions, just not feeling super great.
And of course I'm here live to help any of you as well. So let's get into this. James, of course, is our VP of Engineering, and I don't think this guy ever sleeps, so it doesn't surprise me that he's not feeling so great. Plus, he has a little baby.
So, you know, babies always give you lots of fun colds in the wintertime. All right, let me make sure I shared this correctly. Actually, one more time, give me a second; I just realized I probably didn't share the sound correctly.
All that prepping. Okay, share, share sound. Cool, got it. All right, give me a thumbs up if you can hear the sound correctly.
[Recording starts: "A quick introduction about myself. So I'm James..."] Does that sound good? Cool, awesome. Let me expand it. And like I said, we're actually here to answer any of your questions, so we're able to multitask a little bit better.
So here we go. [James, in the recording:] I'm James, VP of Engineering, and I also built Milvus from scratch, back three or four years ago. We're kind of the most scalable, most performant vector DB solution on the market, and we're open source. Before joining Zilliz, I actually worked on a couple of different databases, starting at Oracle.
Then I joined Alibaba Cloud as a database designer. [Chris: Okay, James, can you speak up?] Okay, get closer. Okay, yeah. So today we're going to cover three different parts.
One is why semantic search is not all you need, especially when you build RAG applications. Second is how we do it, and why it's better than a lot of the existing solutions. Last, I want to give a quick demo of how you can write very simple code to implement RAG with hybrid search.
Okay, so a quick recap on RAG. I'm pretty sure everybody here has heard about RAG, right? It's actually simple. At its simplest, you chunk everything and put it into a vector DB. Then, when you want to do queries, you embed your query, search the vector DB, get some of the key facts back, and feed them to your large language model.
Then you can get a reliable answer and remove some of the hallucinations, right? It sounds really simple, and it actually works pretty well. Until one day, when I was trying to build a RAG application myself, working with my friends. What we were building was a legal application, and we had some very special legal terms in the documents. When I used dense embeddings and just a vector DB, at first it looked really, really good. But if you take a deeper look, you'll see that some of the terms were not being searched effectively; we actually saw the search become overly broad. If you search for one term, you may get similar terms back, but there's nothing that can control this, right? So I started thinking about how to fix that issue, and I actually tried two solutions.
One was just fine-tuning my embedding models, and also the language models. It worked okay at the very beginning, but soon I found more and more corner cases, and when I started fine-tuning on my bad cases, I saw the models start to forget a lot of basic information. The main reason we use language models or embedding models is that they have been pre-trained on a lot of existing information, right? With a lot of fine-tuning they start to forget, and the evaluation score actually goes down. The other thing I tried was query rewriting. I tried to find all the synonyms for certain terms, I tried to change the phrasing, I used larger models to rewrite queries, a lot of tricks like that.
But sometimes there are just too many corner cases, and you cannot fix all of them for your use case. So there seemed to be nothing we could do. Okay, so look at what is actually wrong with semantic search. It's definitely not all you need. It's actually pretty good technology, and we build a vector DB, so I won't say a vector DB is not good, but it's lacking three major things. One: there's no way you can control your results. Just like large language models, embeddings are basically a black box. Why you got this result, why two things count as similar, is all a black box, and there's nothing you can do if things go wrong. Two: it fails to handle uncommon queries. These embedding models are trained on large datasets, but the data comes from the web, so it covers very common things. In your own use case, you may find that for special terms and more specialized queries, it doesn't work very well.
Sometimes the model has never even seen the token in the training set, so it won't work, right? The third thing is always chunking. We all know chunking is not easy. The easiest thing you can do is just chunk by 4K or 8K, but there are a lot of trade-offs: with long chunks you lose the details, and with short chunks, like 512, you lose some of the context. So how you chunk your data is always a hard part. Okay, so how did we think about a way to fix that?
I started thinking about what makes search powerful, what is the way we do search? I think it's all about probability. Whether it's large language models or embedding models, the only thing they're trying to do is find patterns in our documents, in our corpus. At its simplest, if you calculate the frequency of English letters, you'll definitely see some letters rank higher; 'e', for example, shows up much more frequently. So if you wanted to build a tiny language model to predict the next letter in a document, I would also just predict 'e' if I didn't have enough training data, because that makes the most sense, right? Same for two-grams: if you put two letters together, you'll see some patterns almost never appear, such as 'zx' or 'gq', while some are very popular, such as 'th'. Okay.
The same thing happens when we move from letters to tokens, and that's how large language models work: they just predict your next token. With a large enough training dataset, you know which token has the highest probability of coming next, and you know which tokens are most similar. So the question is, how do I improve the probability? One way is to use pre-trained models. OpenAI is definitely pretty good, and we also know a couple of leading embedding model providers,
Voyage and Cohere, who train on massive amounts of data. If you're building on a legal dataset, they have law models; if you're building on a finance dataset, they have models trained for those specific datasets. And sometimes you can do some fine-tuning, but as I said, those tricks don't work that well for some use cases. The other thing you can do is figure out your own data distribution. That sounds very difficult, but it's actually not that hard, right? Okay. The way we do that is actually very traditional, but it's really helpful. To explain it, we need a quick recap.
This part will be a little bit technical, but hopefully I can give you a general background on what's happening here. The very traditional way we get these stats is something called TF-IDF. TF means term frequency: it shows how many times each token shows up in your corpus. The other part is DF, document frequency: it shows how many of your documents each token appears in. For example, if a word shows up in a lot of the documents in my corpus, that means this token is not going to be very important.
It could be 'the', it could be 'a': very common tokens that don't carry enough semantics, right? But if you see some very special word you've never seen before, and it only shows up in one document in your corpus, then its document frequency is only one and its weight is going to be super high. So if you take your whole dataset and do all the stats, calculate all the term frequencies and all the inverse document frequencies, then there's actually a way to calculate the relevance between your query and your corpus. That is what's called TF-IDF. Elasticsearch and all Lucene-based search actually use the same technology. There are different versions of how you calculate the stats; one of the most popular is called BM25. It's generally the same idea, but it tweaks some hyperparameters to make the search results make more sense, right? Okay.
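For reference, the standard BM25 score of a document D for a query Q with tokens q_1..q_n is usually written as:

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}
```

Here f(q_i, D) is the term frequency of q_i in D, |D| is the document length, avgdl is the average document length in the corpus, and k_1 and b are the hyperparameters James mentions (commonly around 1.2 to 2.0 for k_1, and 0.75 for b).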
So what happens when you use TF-IDF? There are generally three steps. The raw text here is "Milvus is the vector database that is built for scale." First, you have to split it into different tokens. There are different tokenizers; the simplest just split into chunks of four or five letters, but you can also use stop words, or split on commas and whitespace. For sure, we have tokenizers integrated, so you don't really need to worry about how to tokenize.
After you tokenize, you do some transforms: you lowercase, you apply stemming, and you remove some of the very common words. Now you have clean tokens. With those tokens, the second step is to build an inverted index on top of them. I'll show what the inverted index is in a moment; it's what gives you performance when you search over those tokens. Third, you need the stats.
That's how we figure out our data distribution. The important part is that we need to calculate, for example, how many times 'Milvus' shows up in my documents and how many times 'vector' shows up. Ideally 'vector' should show up more than 'Milvus', because 'vector' is more of a common word. Which means that if 'Milvus' matches, it's actually going to be more important than 'vector', because 'vector' shows up in all the documents. Okay. So let me give a quick example. I have three documents.
One: "Information retrieval is a field of study." Two: "Information retrieval focuses on finding relevant information," something like that. Three: "Data mining and information retrieval overlap in research." Okay, so those are the three documents I have. If I search for "information retrieval research", which one should be most relevant? Ideally, definitely number three.
First of all, because all three tokens in my query ('information', 'retrieval', 'research') actually show up in document three, right? And between documents one and two, which one is going to be more relevant? The answer is document one. Both documents have 'information' as a keyword, but 'information' shows up many more times across the corpus, so it carries less weight; 'retrieval' shows up fewer times (say only 10% of your documents contain 'retrieval'), so it becomes more important.
And after you do all the calculations, sometimes you also need to divide by the length of your document, to do some kind of normalization, and then you can calculate the scores. That's the normal way we use BM25 to evaluate how relevant a query and a document are. Okay?
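To make that concrete, here is a toy, self-contained scorer over those three documents. This is an illustration written for this recap, not Milvus code; real engines handle tokenization, stats, and tuning far more carefully.

```python
import math
from collections import Counter

docs = [
    "information retrieval is a field of study",
    "information retrieval focuses on finding relevant information",
    "data mining and information retrieval overlap in research",
]
k1, b = 1.2, 0.75                      # common BM25 hyperparameters
tokens = [d.split() for d in docs]     # trivial whitespace tokenizer
avgdl = sum(len(t) for t in tokens) / len(tokens)

def idf(term):
    # Inverse document frequency: rarer terms get a higher weight.
    df = sum(term in t for t in tokens)
    return math.log((len(docs) - df + 0.5) / (df + 0.5) + 1)

def bm25(query, doc):
    tf = Counter(doc)
    score = 0.0
    for term in query.split():
        # Term frequency, saturated by k1 and normalized by document length.
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        score += idf(term) * tf[term] * (k1 + 1) / (tf[term] + norm)
    return score

for i, doc in enumerate(tokens, 1):
    print(f"doc {i}: {bm25('information retrieval research', doc):.3f}")
# Document three scores highest: it is the only one with all three query terms.
```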
So that sounds really good, right? We have a way to control the result: we just need to know about the tokens. Everyone can read the tokens, we can modify them, we can filter on them. Is that good enough? The answer is no, it's not enough, and that's why we still need vector DBs. First of all, you sometimes lose context by doing only lexical search. You're just trying to find the most similar words, but some words, for example 'apple', have multiple different meanings.
Do I mean the fruit, or do I mean the company? That's going to be one challenge where a vector DB, or vector embeddings, definitely help: they understand the meaning behind the words. The second thing is, what about synonyms? Sometimes words with similar spellings have different meanings, and on the other side, totally different words can have similar meanings. That's why a vector DB, or vector embeddings, is still going to be very important for your use case. Third is your intention. The same query words mean different things in different contexts: if I'm searching an e-commerce website versus a car manufacturer's website, I'm actually searching for different things, right?
So just using keywords, just using lexical search, won't help with a lot of these use cases. That's what we propose in Milvus 2.5: a new hybrid search model combining both lexical search and semantic search. First of all, it works pretty well. If you look at the numbers here, by doing just dense or just sparse, you can probably get 50% to 60% recall at most. But if you combine them and do a re-ranking, it definitely shows a lot of improvement in your search quality, right? We're actually using a model called BGE-M3 here, which can generate multiple different embeddings.
And BM25 is kind of the same: you can combine both BM25 and a pre-trained model together to get a better result. Okay, so what's the challenge here? If it's already so good, why isn't it in production everywhere? A couple of things are actually blockers to putting hybrid search into production. One is that it makes your architecture more complex. This is a very common architecture:
for lexical search, a lot of people use Elasticsearch, OpenSearch, or Solr, and those are not very good at dense embedding search in terms of performance and cost. That's why you also need a dedicated vector DB, which is Milvus (I know there are some others). Generally, when you build such a system, you have a search service that does the ranking and stores all the raw data, but you also need a system for lexical search, a system for semantic search, and your embedding models, right? That makes your system very, very complicated, and there's no good way to maintain it. Second, with multiple different embeddings, you also see cost issues. You have to double your cost, sometimes even triple it, because you maintain multiple different embeddings. And you also have multiple different bottlenecks for performance.
So it's very hard to diagnose, very hard for observability, okay? That's why we started thinking about merging everything into one: one cluster, one database, that can host both dense vector search and sparse search. This part is going to be a little bit technical, so here's a little background on what a traditional lexical search engine, what ES, does. Generally speaking, they use an inverted index: for each token, they list which documents actually contain that token. For example, if I have a Harry Potter book, I split it into different chunks, and 'Harry' is going to be one of my tokens. Across the chunks, chunk one has 'Harry' once, chunk two maybe has 'Harry' twice.
That's how I build the inverted index lists. When a search happens, the easiest thing we can do is called DAAT, short for document-at-a-time. We just go through all the relevant postings. For example, when I search for B, C, A (in this example B, C, and A are just different tokens), I start from B and see, okay, D1 is related. Then I go to C; D1 is related there as well.
So I add up the scores, and A is also related to D1, so D1 gets a score of 2.5. That's the relevance; the number is the BM25 relevance score we just talked about, right? Then it goes to the next document, D2, then D3, one by one. And you can see that if your posting lists are long, it takes a lot of time to go through all of them.
That's why the performance is not great. There are definitely smarter ways to improve on that; I won't go into all the details, but remember there's an algorithm called WAND. The overall goal of this algorithm is to skip some of the tokens, or skip some of the documents. For example, if I've already found one very relevant document, the easiest thing we can do is skip all the documents that can't score high enough. You don't really need to understand all of it, but in this case you'll see that D2's score is much, much smaller than D1's,
so we skip it and go directly to D5. If you guys are interested, there's actually a very good paper about this; it's a foundational paper for this whole area of search, and I'd recommend reading it for more detail. Okay.
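As a rough illustration of document-at-a-time scoring (my own sketch of the textbook algorithm, not Milvus or ES internals), each query term keeps a cursor into its posting list, and the cursors advance together in document-ID order:

```python
import heapq

# token -> postings sorted by doc id, as (doc_id, BM25 contribution)
postings = {
    "b": [(1, 1.0), (2, 0.2), (5, 0.9)],
    "c": [(1, 0.7), (3, 0.4), (5, 0.8)],
    "a": [(1, 0.8), (2, 0.1), (5, 1.1)],
}

def daat_search(query_tokens, top_k=2):
    cursors = {t: 0 for t in query_tokens}   # one cursor per posting list
    scored = []
    while True:
        # The next candidate is the smallest doc id any cursor points at.
        heads = [postings[t][cursors[t]][0]
                 for t in query_tokens if cursors[t] < len(postings[t])]
        if not heads:
            break
        doc_id, score = min(heads), 0.0
        for t in query_tokens:
            c = cursors[t]
            if c < len(postings[t]) and postings[t][c][0] == doc_id:
                score += postings[t][c][1]    # add this term's contribution
                cursors[t] += 1               # advance past the scored doc
        scored.append((doc_id, score))
    return heapq.nlargest(top_k, scored, key=lambda kv: kv[1])

print(daat_search(["b", "c", "a"]))   # D1 totals 2.5, as in the slide
```

WAND adds a per-term upper bound on top of this, so whole documents can be skipped without scoring, which is exactly the D2-to-D5 jump described above.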
But are there any drawbacks to WAND? Definitely, otherwise we wouldn't have built something new. It hits a lot of performance issues in many use cases: when you have a large top-K, when you have large chunk sizes, and when you have a lot of long queries. And that's exactly the case when building RAG applications. In traditional search, people usually just type three or four keywords. But in RAG use cases, your queries are going to be long, because everyone says you need to give more context to your large language models, right? So people end up writing longer and longer prompts, and search becomes much more performance-intensive. Okay, so that's why in Milvus 2.5 we built something called a hybrid index. It's a little bit different from a traditional inverted index, with two major differences. First of all, we have multiple different index implementations.
Instead of just an inverted index, we also have a graph index. If you're familiar with vector DBs, that's what we do for dense embeddings as well. A graph-based index is actually pretty good when you have long context and large top-K. And the better part of a graph index is that you can apply a lot of machine-learning techniques to it.
For example, you can do quantization on top of it, and you can do a lot of hardware acceleration on top of it. We use SIMD, and it's written in C++, so the performance is much better. Sorry, the chart here is a little bit small,
but just to give you a quick idea: we ran some tests on many of the datasets, and the results show almost the same recall as ES. Our memory usage is half of ES's, and the performance is two to three times faster. That's simply because we utilize a different quantization policy. The recall will be a little bit lower, but you get big performance gains and also more memory savings.
Okay. So the background can be hard to follow if you don't already have it, but don't worry about that, because as a user, usage is very simple. It takes just three steps. The first step is to create your collection with your schema information. In the code here, you need to specify a function that does all the embedding for you.
Here we have a BM25 function: you specify the input field, which is the text, and the output field, which is the sparse vector, and it uses BM25 to do all the embedding, right? That's the first step; now you have a collection. The second step is to build an index on top of it. The index we're building here is an inverted index (we also have the graph index version), and there's a bunch of parameters you really don't need to worry about.
The third step is to insert your data. Your data is just your pure text, without any embeddings. That's a major difference between Milvus 2.5 and all the previous versions: previously, Milvus was vector-in, vector-out, so you had to do all the embedding generation by yourself. Now, with functions, Milvus can do all the embedding generation for you.
You only need to provide the full text. We're also working on the dense embedding part: we're trying to integrate a lot of different embedding models and re-ranking models, so using a vector database becomes much easier, and you don't really need to worry about how to handle embeddings. After the data is ingested, you just search with a natural string, like "who started AI search". Yeah.
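Those three steps look roughly like this with the pymilvus client. This is a minimal sketch based on the 2.5 full-text search API as demoed; the collection name, field names, and URI are placeholders, so check the Milvus 2.5 docs for the exact parameters.

```python
from pymilvus import MilvusClient, DataType, Function, FunctionType

client = MilvusClient(uri="http://localhost:19530")  # assumed local deployment

# Step 1: schema with a raw-text field and a sparse field populated by BM25.
schema = client.create_schema()
schema.add_field("id", DataType.INT64, is_primary=True, auto_id=True)
schema.add_field("text", DataType.VARCHAR, max_length=65535,
                 enable_analyzer=True)          # tokenization happens here
schema.add_field("sparse", DataType.SPARSE_FLOAT_VECTOR)
schema.add_function(Function(
    name="bm25_fn",
    function_type=FunctionType.BM25,
    input_field_names=["text"],     # raw text goes in...
    output_field_names=["sparse"],  # ...BM25 sparse vectors come out
))

# Step 2: index the sparse field for BM25 scoring.
index_params = client.prepare_index_params()
index_params.add_index(field_name="sparse",
                       index_type="SPARSE_INVERTED_INDEX",
                       metric_type="BM25")
client.create_collection("docs", schema=schema, index_params=index_params)

# Step 3: insert plain text (no embeddings), then search with a plain string.
client.insert("docs", [{"text": "Milvus is the vector database built for scale"}])
hits = client.search("docs", data=["who started AI search"],
                     anns_field="sparse", limit=3, output_fields=["text"])
```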
It's as easy as that. That was the very basic demo for sparse embeddings, and you can also use sparse embeddings together with dense embeddings for better performance. Okay. So we're about to release this Milvus 2.5 version, just next week, and this is the first time I've talked about all the new features here. So feel free to talk to us.
This version is very focused on having more search functionality. It's not only BM25; we also have keyword matching, so within a very long context, if you want to filter a document by keyword in Milvus, that is actually doable. We have multiple different indexes to improve filtering performance, and we also added a lot of different quantization options to help users reduce their cost when they do vector search, right? And right now we're preparing for Milvus 3.0.
We would also like to hear your feedback. In the new version, which is targeting early next year, we also have interesting features like UDFs and hot/cold data separation. A lot of people are trying to do multi-tenancy, and we actually have a talk about multi-tenancy. One of the key challenges there is how to save cost when you have a lot of tenants. That is one of our goals for 3.0.
Yeah, so feel free to grab me and give me more feedback about the features you actually need, and we'll keep improving Milvus for all kinds of AI applications. One quick thing about who we are, in case you don't know: we're kind of the most widely adopted vector DB on GitHub, open source for sure, under the Linux Foundation umbrella.
We have more than 400 contributors and over 30K stars. We move very fast, and over a thousand enterprise companies are actually using Milvus as their vector DB. Yeah. So if you have any questions about us, feel free to contact me. All right, thanks.
[Chris:] Excellent. So James is actually here, and it looks like he's already answered some questions. If anybody has any questions about 2.5 or what you saw, feel free to pop them into the Q&A or into the chat panel. And it doesn't have to be just related to Milvus 2.5.
Any questions you might have about your current implementation of Milvus, any kind of bugs, any kind of issues: we're here to answer those questions. And like all of our webinars, all the recordings and the decks are shared, so you'll get an email from us shortly with all those details, because although this was a pretty quick video, it was quite dense in content.
As James mentioned, we did have a mechanism for adding sparse vectors in Milvus 2.4. But the key difference with 2.5 is that you bring in your text in raw format, we do the vector conversions, and you can basically just do your searches, similar to what you might be doing with something like a Lucene-based application. And when you're setting it up, you have to make sure you don't just have a text field; you do need to specify the function that's going to go with it,
so we can connect the sparse field to the text field, and make sure we actually do the vector embedding conversions. So it's a little bit different, but once you understand that setup, you're good to go and can start doing your hybrid search. All right. So, James, are you able to talk? [James:] Oh, sure. Yeah.
Thanks, thanks everyone. [Chris:] So let's start with the first question: can we do hybrid search between regular vector search and BM25 full-text search? [James:] Yeah, that's why we named it hybrid search, right? The main reason for the name is that we search both: the BM25 sparse embeddings and also the dense embeddings. The common pattern is that we fetch the top-K from each of the vector indexes, whether it's BM25 or dense embeddings.
Once we get the results from the different indexes, what we do is re-ranking. You can use another model to do the re-ranking; what we recommend is Voyage or Cohere, or some model from AWS or Google Cloud. Or you can use a simpler re-ranking policy, like weighted ranking: combine the scores together to get a better ranking.
We've seen that even with a very simple re-ranking policy, you can see increased quality, because BM25, lexical search, is really good at keywords, while on the other side dense embeddings are really good at context. When you search on both sides, you find different results, and combining them helps increase the search quality. Yeah.
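In pymilvus terms, the pattern James describes looks roughly like this. It's a sketch, not a verbatim demo: `embed` is a stand-in for whatever dense embedding model you use, and the `dense` field is assumed to exist alongside the `sparse` field from the earlier example.

```python
from pymilvus import MilvusClient, AnnSearchRequest, RRFRanker

client = MilvusClient(uri="http://localhost:19530")
query = "how do I control overly broad legal-term matches?"
query_vec = embed(query)   # hypothetical dense-embedding helper

# One request per index: BM25 full-text on the sparse field,
# semantic similarity on the dense field.
sparse_req = AnnSearchRequest(data=[query], anns_field="sparse",
                              param={}, limit=10)
dense_req = AnnSearchRequest(data=[query_vec], anns_field="dense",
                             param={}, limit=10)

# Reciprocal Rank Fusion merges the two top-10 lists into one ranking;
# WeightedRanker is the simpler "combine the scores" option he mentions.
hits = client.hybrid_search("docs", reqs=[sparse_req, dense_req],
                            ranker=RRFRanker(), limit=5,
                            output_fields=["text"])
```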
[Chris:] Cool, very nice. All right, next question, from Ray: comparing BM25 versus SPLADE, I thought Zilliz recommended using SPLADE. [James:] That's actually a good question. We do see SPLADE doing much better on a lot of the evaluation datasets.
But again, SPLADE is still a pre-trained model. It does search well, but it's still pre-trained, so in some use cases, as I just said, if you have special terms which are not very common, it's going to be very tricky for a pre-trained model to understand those tokens or terms. That is usually the case where we want keyword search with BM25. So again, in maybe 90% of cases SPLADE could be better, but in some cases, if you want a better way to explain the search results, if you want to control your search results, or if your search results are closely tied to your data distribution,
BM25 is still going to be very important. And on the other side, the good part of BM25 is that you don't need a model, so you don't need an inference engine, which is still going to be very slow. If you have a large dataset, BM25 is still going to be the most cost-efficient way to search. [Chris:] But we still support SPLADE, right? So if somebody chooses to vectorize using SPLADE, they can put that into Milvus. [James:] Exactly. The way we do BM25 search is different from what ES or the other traditional search engines do.
We still turn every document into a sparse vector. That's because we're a vector database: everything we handle is vector search. Converting every document into a sparse embedding actually gives us some benefits: we can do quantization, and we can do approximate nearest neighbor search. That's what makes us faster compared to ES.
[Chris:] Yeah. Cool. All right, next question, from Bart: will there be support for array/list metadata types in the new Milvus?
[James:] It's actually already supported; I think it has been since 2.4. And we do support adding indexes on those array data types. The next step is that we're going to support map and set.
That's also going to be very important, because people want to store tags in Milvus. So the next step is map, and in 3.0 we're going to support geolocation and probably date. [Chris:] Yeah, I think you mentioned that we improved the indexing for array, right? [James:] We do have another inverted index, and we support a new index type called bitset (bitmap). What we see is that a lot of people, whether it's a tag or an array, mostly have low-cardinality data.
For example, a field might only be male/female. Using a bitset index there is actually going to accelerate things, because with an inverted index the inverted lists can get really long when you only have two or three distinct values, but with bitsets you only need to store two or three bitsets. It's also part of the 2.5 release, and under low cardinality we see something like a hundred times acceleration compared to the original inverted index.
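If I'm reading the feature right, Milvus 2.5 exposes this as the BITMAP index type on scalar fields. A hedged sketch, with a made-up field name, continuing the client from before:

```python
# Hypothetical: a BITMAP index on a low-cardinality scalar field.
index_params = client.prepare_index_params()
index_params.add_index(field_name="gender",   # e.g. only "male" / "female"
                       index_type="BITMAP")
client.create_index("docs", index_params=index_params)
```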
[Chris:] Nice. So Bart has a follow-up question: he's using Haystack, and it uses arrays for metadata. Yeah, like James said, we've already been supporting that. We also have integrations with Haystack, and I think we even have a tutorial, Bart, that you can find in our Milvus documentation. [James:] I need to check the code to see whether the Haystack integration already supports array,
but yes, we already support array, so we probably need to update the integration, and we'll check. [Chris:] Well, there you go, Bart, thanks for asking that. All right, next: "To be more specific, I want to create multiple search parameters, provide a reranker, and do a hybrid search between vector search on one dense vector column and BM25 full-text search on a sparse vector column." So I think the previous question was just "can we do hybrid search"; this one has much more detail.
[James:] Yeah, so I actually gave a quick demo of how we do that. Basically, you create a UDF, the generator function, to specify that you want to convert, say, one chunk field into a sparse embedding. That's step one. Step two is that you search and specify a re-ranking function. For the search, you actually have to give two different things:
one is the query text, and the other is the query embedding, for the dense embeddings. Then what happens is that we search on both sparse and dense, and we re-rank. That's just the start of supporting raw data: in the next release, what we're going to support is integrating dense embedding models and re-ranking models into Milvus as well.
At that point, I don't think you'll really need to worry about how the embedding is converted. All you need to do is throw your data into Milvus, whether it's an image or text. Yeah. So we do a lot of integration. For Milvus itself, we definitely don't want to do inference or model serving,
so we integrate with a lot of open-source inference engines, for example Ray, for example AWS Bedrock, and they can do all the inference for you. Milvus just sends the data to those inference engines and gathers the embeddings. [Chris:] Cool. And then remind everybody, how many vector fields are supported in a Milvus collection?
[James:] I think right now we have a limitation of four; by the way, I'm thinking of raising it to ten. [Chris:] Yeah, so there you go. All right, next question, from Charles.
When selecting GPU... oh, no, I'm sorry, from Q10 first: does Milvus 2.5 support single-field upsert operations? [James:] Unfortunately, the answer is no for now. The reason is that we're still working on the indexing: to update a single field, what we have to do is retrieve the row from the existing dataset, modify that one row, and insert it back. You can still do that on your client side, but Milvus doesn't support an automated upsert like that, and the performance is not very good. To do it properly, we need a very efficient index on top of all the primary keys.
Unfortunately, that's still under development, but it is on our roadmap, and it's part of what we're doing in 3.0, which will probably be released in Q1 next year. Yeah. [Chris:] Cool.
All right, Charles, your turn is next: when selecting the GPU-accelerated option to deploy, does Milvus automatically use GPU VRAM for big-dataset indexing and inferencing? [James:] We can do some quantization on the GPU, but right now there is no control over how much memory we're going to use. If the dataset is large compared to the GPU memory, you could hit out-of-memory issues, so unfortunately you have to design for that carefully. We did try swapping data between GPU memory and CPU memory,
but the result turned out not to perform very well; it takes a really big hit. So we're still working with the NVIDIA team to see what we can do, but right now, to get very good performance from the GPU index, we have to host all the data in GPU memory. [Chris:] Okay. Well, this goes nicely into the next question, from Niles: is Milvus optimized to run on CPU? What are the memory requirements to run Milvus at enterprise scale or hyperscale? And are there any performance dependencies or impacts? [James:] Is Milvus optimized to run on CPU? Yes. We can run on both GPU and CPU, but we mostly recommend running on CPUs, because for most searches the performance is already good enough, GPUs are still very expensive, and you have to host all the data in GPU memory, right? We do a lot of optimization on both Intel and Arm CPUs.
What are the memory requirements to run Milvus? It really depends on your dataset. You can always start from a four-core machine as a standalone deployment; that's the minimal deployment. And you can also run it in a Kubernetes environment. The largest deployment we've seen is 50 billion vectors.
So it scales very well if you deploy in cluster mode, with Kubernetes and all the cloud dependencies. Yeah. And the easiest way to get great scalability and performance is just to use our managed service, Zilliz Cloud. It gives you great TCO, reduces your costs, and you don't need to worry about how to manage it.
We have both a SaaS offering, which is a fully managed service, and also BYOC, which deploys into your VPC. So if you have any security concerns, that could be the best fit for your use case. [Chris:] But 50 billion vectors, Niles, that's a pretty impressive number. That is actually way beyond typical enterprise grade, in my opinion; it's starting to get to an internet level of vector embeddings. All right, next question, from Chetan:
does this update allow creating more than four vector fields in a collection? You kind of answered this, but let's answer it again. [James:] Yeah, you do have a knob to tune it up to ten vector fields. And what we're actually thinking about adding is an array of embeddings, which is more like a tensor. We do see use cases where people try to use late-interaction models like ColBERT and put a series of embeddings into one field. That is something we've seen, and it's on our roadmap as well. But technically speaking, if you're not using a multi-vector model, I think four or five fields is well enough for most use cases.
Most likely you'll just have two or three vector fields: one probably for BM25, one for sparse embeddings from something like a SPLADE model, and the other for dense embeddings. That could be good enough. Beyond that, if you have more embedding models across multiple different fields, that's also going to be very challenging for performance, because you're searching on multiple different indexes. [Chris:] Yeah, and I mean, the beautiful thing about vector embeddings is that they understand context, right? That's why you can get such interesting search results from all kinds of different queries. So let's not lose sight of that as well. Next question, from Ergo: can you vectorize data from my Gmail? [James:] Yeah, I mean, that is why people build RAG applications.
This is the easiest way to search your personal data. There are actually a bunch of tools to help you do that. I would recommend looking at both LlamaIndex and Unstructured.io. I saw they already have some connectors, so you can pull your data from different SaaS apps: your Zendesk, your Gmail, your Slack channel. They pull all that data, and they also have connectors that help you do the chunking and embedding. We have connectors with them, so you can convert all the data into embeddings and send it into Milvus or Zilliz Cloud. That is a typical use case.
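For example, with LlamaIndex's Milvus integration, wiring exported mail (or any local documents) into Milvus looks roughly like this. A hedged sketch: the package layout and defaults change between llama-index releases, and the folder path and dimension here are placeholders.

```python
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.milvus import MilvusVectorStore

# Load exported mail (or any documents) from a local folder.
docs = SimpleDirectoryReader("./my_exported_mail").load_data()

# Point LlamaIndex at Milvus as its vector store.
vector_store = MilvusVectorStore(uri="http://localhost:19530",
                                 dim=1536,   # must match your embedding model
                                 overwrite=True)
ctx = StorageContext.from_defaults(vector_store=vector_store)

# Chunking and embedding happen inside LlamaIndex's defaults here.
index = VectorStoreIndex.from_documents(docs, storage_context=ctx)
print(index.as_query_engine().query("emails about invoices"))
```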
Also take a look at LangChain; probably one or the other. So look at those open-source tools, see which one fits you best, and it's going to be very easy to use. [Chris:] Excellent. And then, Sala, I think we might be missing some context here, but you asked, "Is there a certificate?" I'm not sure what that certificate is related to, so if you can give us a little more detail, we'll answer it.
Ali asks: how efficient is it for searching images, image embeddings, mainly X-ray images? Any special features for medical images, as you demonstrated for text? [James:] Hmm. Yeah, I think medical imaging might be a little bit different. You probably need a special model: for the embedding model to actually work, it has to be trained on your data. A lot of common open-source embedding models are trained on everyday objects, like shoes and clothes, and I don't think those would help with your use case.
So I think the first important thing is to find a usable embedding model. If, unluckily, you don't have one, then you have to train one on your own dataset, right? Other than that, I don't think medical image embeddings are very different from other image search on the vector DB side. One thing you can do is add some tags, and do some filtering on top of those tags, to help you get better results. But from the retrieval side, it's going to be very similar. [Chris:] Yeah, I mean, go to Hugging Face and look at all the different models they have.
As James mentioned, if it's X-ray images that are very typical, it's going to be pretty easy. But if you have something very, very specific, I don't know, a knee or something unusual, then yeah, you might have to even build your own. Next question, from Shilpa: can Milvus be integrated with OpenAI? [James:] Oh, we actually already integrate with OpenAI. If you check the OpenAI retrieval plugin, we're already there; we were one of the first partners for retrieval with OpenAI.
And there are actually a lot of other ways to use OpenAI together with Milvus. As I just said, if you take a look at LangChain and LlamaIndex, they support both OpenAI and Milvus or Zilliz Cloud, so you can already use them together. [Chris:] Excellent.
And then we actually have a Zilliz Cloud question: what's the maximum collection count allowed on Zilliz Cloud? Does, for example, a thousand collections affect performance a lot? What's a reasonable limit on the number of collections in an instance? [James:] That's actually a good question. Right now there is a fairly strong limitation; I think for each instance we only support 64 collections. The reason is that, due to some internal limitations right now, the number of collections we can support per cluster is limited.
We do see a lot of users who want to build multi-tenancy use cases, where they want each collection or each partition to be one tenant. One way to address that is to check our documentation on partition keys; that's our recommended way to do multi-tenancy. But yes, we're actually working on a project to improve the number of collections we can support. We've already been able to support more than 10K collections, each with more than a thousand partitions, on one cluster.
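The partition-key pattern he points to looks roughly like this; a sketch with a made-up tenant field, reusing the schema and client from the earlier example.

```python
# Multi-tenancy via a partition key: Milvus hashes the key into partitions,
# so one collection can serve many tenants instead of one collection each.
schema.add_field("tenant_id", DataType.VARCHAR, max_length=64,
                 is_partition_key=True)

# At query time, scope the search to a single tenant with a filter expression.
hits = client.search("docs", data=["contract termination clause"],
                     anns_field="sparse", limit=5,
                     filter='tenant_id == "acme"')
```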
After we finish optimizing that, which will probably be released in the next one or two months, we'll probably raise the current collection limit by ten times. [Chris:] Yeah. Excellent. Wow, a lot of questions.
I want to remind everybody that we have a Discord channel, and you can find James there. You can also post questions and issues in GitHub Issues. We also have weekly office hours now, and we're always happy to go really deep into your particular use case with Milvus. And if James isn't on there, then one of his engineers will be.
In addition to making sure we answer your questions, we also see it as an opportunity to make changes to our documentation, so we can eliminate those questions and make things a lot easier for y'all from the get-go. We'll keep the lines open for a couple more minutes for a few more questions to come in. I see another question coming in, but before I go into that, I want to let you know that, as you can tell, James knows everything. He is definitely Mr. Milvus, and it's because he spends a lot of time with users like y'all, and he really enjoys it. So please don't be shy: go to GitHub, go to Discord, and really reach out to him. It's really important for him to understand what you're doing, what is confusing, and what is missing, so that he can define the roadmap appropriately and make this into a really solid (which it already is) open-source vector database.
Okay, so Nitty asks: what other LLMs can we use with Milvus 2.5, besides OpenAI? Can we create assistants, like OpenAI Assistants, using Milvus 2.5? [James:] Yeah, I think with LangChain or LlamaIndex, creating an assistant like that is actually going to be very simple. And the best part is that everything is open source, so you don't really need to pay anyone, right? As for which models we use: for embedding models, I have to mention our friends at Voyage AI, and Frank; I like their embedding models the most. And for generation models, my favorite right now is probably Qwen. So it's either OpenAI or Qwen.
[Chris:] Yeah, so there are a lot of LLMs that you can use, Nitty. And it's not that we want to stop you from paying for LLMs, but think about when you're in development (we've done it ourselves): you're going to be making a lot of calls to these LLMs, and you really don't want to incur that extra cost, not in the development phase. So our recommendation is always to try to use open-source LLMs or open-source embedding models, so that you can perfect your application first.
Once you're ready to go into production, and you really feel strongly about using a paid LLM, that's when we recommend you turn it on. And on top of that, we actually have a number of blogs about how you can use different LLMs for different tasks. For example, you can use an LLM as a judge of your results, to do an evaluation, and you don't have to use the same LLM throughout. So be smart.
Nobody has unlimited resources, not even OpenAI or Apple, so be really smart about how you spend your development dollars. All right. Wow, that was really great; lots of questions here.
And James, thank you. As always, Mr. Milvus here knows everything, and his enthusiasm for the project is super evident. And it's because of users like y'all, who are contributing in code, in questions, in docs, and just by using Milvus. So keep it up.
We'd love to see what you're working on. Thank you so much for your time today. And, oh, Rob, if you're still on, we'll take a look at your video. Awesome, folks. The recording will be sent to you a little bit later.
Thank you. See you guys.
Meet the Speaker
James Luan
VP of Engineering at Zilliz
James Luan is the VP of Engineering at Zilliz. With a master's degree in computer engineering from Cornell University, he has extensive experience as a Database Engineer at Oracle, Hedvig, and Alibaba Cloud. James played a crucial role in developing Alibaba Cloud's HBase offering and Lindorm, a self-developed NoSQL database. He is also a respected member of the Technical Advisory Committee of the LF AI & Data Foundation, contributing his expertise to shaping the future of AI and data technologies.