Join the Webinar
About the Session
Embarking on your vector search journey with Milvus is just the beginning. Did you know that there are numerous strategies to optimize and elevate the performance of your instance? Join us as James Luan, our VP of Engineering, shares valuable insights and expert tips to unlock the full potential of Milvus. Discover advanced techniques, best practices, and performance-enhancing measures that will take your experience to the next level. Whether you're a seasoned user or just starting, this session will enrich your understanding and proficiency with Milvus. Take advantage of the opportunity to maximize your Milvus instance's efficiency and capabilities under our engineering expert's guidance.
What you'll learn:
- Milvus best practices
- Strategies to optimize and elevate the performance of your instance
- Expert tips from the VP of Engineering
Today I'm pleased to introduce today's session, Optimizing Your Milvus Instance, and our guest speaker, James Luan, our VP of Engineering. James Luan is the VP of Engineering at Zilliz, with a master's degree in computer engineering from Cornell University. He has extensive experience as a database engineer at Oracle, Hedvig, and Alibaba Cloud. James played a crucial role in developing HBase, Alibaba Cloud's open-source database, and Lindorm, a self-developed NoSQL database.
He's also a respected member of the Technical Advisory Committee of the LF AI & Data Foundation, contributing his expertise to shaping the future of AI and data technologies. Welcome, James. So for the participants today, we're going to be covering the contents of this blog.
The first things we're going to talk about today are the versions, the search, the memory, how things get inserted into Milvus, the configurations, the logs, the clusters, the documentation, how to deploy Milvus, and what happens when you delete data. So James, we'd love to get an introduction from you, and then we can dive into the topics. Sure, thanks Eugene, and glad to meet everyone here.
Today, hopefully, we can share some knowledge about how to optimize your Milvus instance. We actually have a very long list of ways you can improve performance as well as reduce cost, so why don't we start from some questions and then go into the details. Sure. Yeah.
So one of the most frequently mentioned topics in our community is the version. We recently announced 2.4, and before that we had 2.3. We've had quite a few different versions, and it would be great to know what the differences are, what new features are being added, and whether there are breaking changes and things like that.
Yeah. The Milvus community is actually moving very, very fast, just as the AIGC community is. We started optimizing performance in Milvus 2.2, which goes back to 2022. By that time we did a lot of optimizations, especially on search, and removed some bottlenecks in the distributed cluster.
So for now, if you're trying to use Milvus, I would recommend starting from at least 2.2.17, which is the latest version of 2.2 and probably the most stable version of Milvus. But in most cases, if you're a new user of Milvus, I would recommend going directly to 2.3.
2.3 is our latest stable branch. For 2.3 we actually introduced a lot of performance improvements on filtering, and we also improved the memory replicas, so you can have multiple replicas to improve your performance: if you find you don't have enough queries per second, just add more replicas. Yeah. And we also introduced some new kinds of index.
One important index is SCANN. Compared to the traditional FAISS indexes it's actually five to ten times faster, and even compared to HNSW it is much faster in some of the use cases: if you have a large top-K, or if you have filtering, SCANN is going to be faster compared to the graph indexes. We also did a lot of optimizations on the graph indexes like HNSW. We added a lot of functionality to improve filtered-search performance, including changing the graph structure, and a lot of optimization on the execution path.
For example, if your filtering rate is going to be very high, we directly go to brute force rather than using the graph index. If you have a medium filtering rate, we have some strategies to help improve the graph connectivity. Using those kinds of features, our filtering performance is actually ten times faster compared to 2.2. So in most cases, if you're interested in how to improve performance, use 2.3. And another interesting part is that we introduced a GPU index, which we actually built together with the NVIDIA team.
The current GPU index we're working with is actually open-sourced in RAFT, which is from the NVIDIA team. So the current index we're using is GPU IVF, which is much faster compared to the CPU version, especially when you have a large batch of queries. But if you have very small queries, or if you care about latency, maybe the current generation of GPU index is not the best. In 2.4 next month, we're going to have a newer GPU index, which is called the CAGRA index.
It's actually a GPU graph index, and from our tests it's five to ten times faster. So if you want to use GPU, or if you have very high throughput needs, don't miss 2.4; it's going to be released mid next month. It also has a lot of different optimizations on how you can do batch queries with the GPU index.
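For readers who want to try this once 2.4 lands, here's a minimal PyMilvus sketch of building a CAGRA index. The collection name and parameter values are illustrative assumptions, not tuned recommendations, and a GPU-enabled Milvus build is assumed.

```python
from pymilvus import connections, Collection

connections.connect(host="localhost", port="19530")
collection = Collection("demo")  # hypothetical collection with an "embedding" vector field

# Build the GPU_CAGRA graph index on the vector field.
collection.create_index(
    field_name="embedding",
    index_params={
        "index_type": "GPU_CAGRA",
        "metric_type": "L2",
        "params": {
            "intermediate_graph_degree": 64,  # degree of the graph built during construction
            "graph_degree": 32,               # degree kept after pruning
        },
    },
)
```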
And it also supports sparse embeddings. So if you're looking for better search quality, wait for 2.4; we've got a bunch of new features to help you improve search quality. Oh, okay. Very exciting. So in 2.3 we got a lot of new indexes, including SCANN and the GPU index, which was created in collaboration with NVIDIA.
Then there's the ability to add more replicas, which makes scaling easier, and there's some increased performance through optimizations. And in 2.4 we're going to have a new GPU index, which is going to be really good for high-throughput use cases, and we're going to have sparse embeddings, which will help tune some of the searches. So that's really exciting. And that actually brings us right into the next topic, which is search. So can you talk to us a little bit about the types of search that we can do in Milvus? There's search and there's query.
And maybe talk about some of the things like metadata filtering, or the new group-by feature. Yeah, just one missing part first: a lot of users ask me which version I actually recommend, and my answer will always be to use the newest Milvus version.
Because we actually do a lot of tests before we release. Unless it's a very early version, a release candidate, where there may be some stability issues, for our stable branches we're only doing bug fixes, so the newer version is going to be both more stable and faster. Yeah. Okay.
So back to search. The most basic functionality a vector database has to offer is ANN search, and that's going to be offered by every vector DB. Milvus offers something different, because first of all, it's scalable. We're actually a distributed vector DB, so we did a lot of optimization on how concurrent search across different nodes can happen, and a lot of optimization on load balancing, making sure each of the nodes holds part of the data, so you don't need to worry about how you scale your data.
Other than that, we also introduced a lot of different new functionality. For example, in 2.2 we added a new search function called range search. It lets users give a distance limit, say, find all the vectors that fall within a 0.7 threshold on their chosen metric.
Then you get a large number of vectors back, which is extremely useful when you want to find abnormal data, because the amount of abnormal data can be very large in certain use cases. Right.
And range search usually works together with iteration, since, unlike a top-K search, a range search can return a very large list. So we have to iterate over the results rather than getting one giant batch back from the RPC; otherwise you're going to run out of memory. Right.
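As a rough illustration, here's a minimal PyMilvus sketch of a range search consumed through the search iterator (Milvus 2.3+). The collection name, dimensionality, radius, and batch sizes are illustrative assumptions.

```python
from pymilvus import connections, Collection

connections.connect(host="localhost", port="19530")
collection = Collection("demo")  # hypothetical collection with a COSINE "embedding" field
collection.load()

search_params = {
    "metric_type": "COSINE",
    # "radius" bounds how far results may be from the query on the chosen metric.
    "params": {"radius": 0.7},
}

# The iterator pages through a potentially huge result set instead of
# returning it in one RPC response.
iterator = collection.search_iterator(
    data=[[0.1] * 768],   # placeholder query embedding
    anns_field="embedding",
    param=search_params,
    batch_size=500,       # rows fetched per round trip
    limit=100_000,        # overall cap on returned entities
)
while True:
    page = iterator.next()
    if not page:
        break
    for hit in page:
        print(hit.id, hit.distance)
iterator.close()
```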
Right. So another interesting feature we're going to have in 2.4 is group-by search. What it does is especially useful in RAG use cases, where you have a lot of documents and each document has a bunch of chunks.
In some use cases, when you retrieve, you just want the top-K documents instead of the top-K chunks, because sometimes when you retrieve the top 10, all of the top 10 related chunks come from just one article, from one document. Which means when you do the retrieval, you get only one document and you don't have any diversity. Right.
And you don't have a way to do more ranking to find different possibilities. So with group-by, what happens is more like finding the top-K documents instead of only finding chunks. This is going to be very helpful for both search and recommendation, where you definitely want more diversity.
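To make that concrete, here's a minimal sketch of the group-by search planned for 2.4, assuming a loaded "chunks" collection with an "embedding" vector field and a scalar "doc_id" field; all names are illustrative.

```python
from pymilvus import connections, Collection

connections.connect(host="localhost", port="19530")
chunks = Collection("chunks")
chunks.load()

results = chunks.search(
    data=[[0.1] * 768],       # placeholder query embedding
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {}},
    limit=10,                 # ten distinct documents, not ten raw chunks
    group_by_field="doc_id",  # group hits so each document appears at most once
    output_fields=["doc_id"],
)
```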
Another interesting feature for search is definitely hybrid search. There are actually two kinds of hybrid search: one is just doing filtered search, and the other one is searching dense embeddings together with sparse embeddings. We've supported filtering since 2.0, and we support almost all kinds of scalar data, including strings and numeric types.
Now we also support more high-level data structures like array and JSON. With the JSON support, you can have all kinds of dynamic schemas: you don't need to define your schema in advance. Yeah. And in 2.4, our major breakthrough is that we actually support inverted indexes on top of all the scalar data. We're using Tantivy, which is an excellent Rust library, very similar to Lucene. We support all kinds of inverted indexes, and with the new inverted index we also support some very interesting operations, such as regular-expression matching in query filters. Those kinds of operations make Milvus more like a traditional database: you can have all the scalar functions working together with filtered search. Yeah. Wow.
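Here's a minimal sketch of scalar filtering with the 2.4-style inverted index, assuming a collection with a VARCHAR "title" field, a JSON "meta" field, and an "embedding" vector field; the schema and filter expressions are illustrative.

```python
from pymilvus import connections, Collection

connections.connect(host="localhost", port="19530")
articles = Collection("articles")

# Build an inverted index (Tantivy-backed in 2.4) on a scalar field to speed
# up filtering on it.
articles.create_index(
    field_name="title",
    index_params={"index_type": "INVERTED"},
)
articles.load()

results = articles.search(
    data=[[0.1] * 768],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {}},
    limit=10,
    # Scalar filters run together with the vector search; "like" gives prefix
    # matching, and JSON keys can be addressed directly in expressions.
    expr='title like "Intro%" and meta["lang"] == "en"',
)
```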
And on the other side there's the new hybrid search, which means we can search both dense embeddings and sparse embeddings, and then run a reranking model on top of all of those results. In 2.4 we introduce one more data type, which is sparse embeddings. Unlike dense embeddings, sparse embeddings have a much larger dimensionality, but most of the dimensions are just zero. To generate sparse embeddings, you can use a model like SPLADE, or BM25.
Compared to dense embeddings, sparse embeddings are going to be much better at finding specific details, for example matching on particular numbers, and they're also very good at out-of-domain search. So one typical way to use sparse embeddings is to work together with dense embeddings: we search from both of the embeddings, get maybe the top-K results from each, and then do a re-ranking using a cross-encoder model, like BERT, or maybe one of the Cohere reranking models. That gives you a more accurate result. From our test results, using both dense and sparse embeddings together with a reranker, you can get maybe a 3% to 5% increase in search accuracy.
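A minimal sketch of the 2.4-style hybrid search, assuming a collection with a dense "dense_vector" field and a SPARSE_FLOAT_VECTOR "sparse_vector" field; field names, vectors, and limits are illustrative. A cross-encoder reranker, as described above, would run outside Milvus on the fused results.

```python
from pymilvus import connections, Collection, AnnSearchRequest, RRFRanker

connections.connect(host="localhost", port="19530")
docs = Collection("docs")
docs.load()

dense_req = AnnSearchRequest(
    data=[[0.1] * 768],            # placeholder dense embedding
    anns_field="dense_vector",
    param={"metric_type": "COSINE", "params": {}},
    limit=20,
)
sparse_req = AnnSearchRequest(
    data=[{101: 0.4, 2048: 0.9}],  # sparse vector as {dimension_index: weight}
    anns_field="sparse_vector",
    param={"metric_type": "IP", "params": {}},
    limit=20,
)

# Fuse the two candidate lists inside Milvus; reciprocal rank fusion is one
# of the built-in strategies.
results = docs.hybrid_search(
    reqs=[dense_req, sparse_req],
    rerank=RRFRanker(),
    limit=10,
)
```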
Yeah. Okay, wow. A 3% to 5% increase with reranking on that.
That's pretty interesting. We have our first question from the audience, and it's about the versioning. Alexandra asks: do newer Milvus versions introduce breaking changes in the PyMilvus interface? I'm using an older version of Milvus today, and I'm concerned that if I upgrade, I'll have to fix bugs around changes in PyMilvus. Yeah. So we're actually very conservative about bringing in any breaking changes.
The last time we brought in breaking changes, I believe, was from 2.1 to 2.2. I'm not sure how old the version you're using is, but for any version of 2.2 it should be safe to upgrade to 2.3.
And it should be mentioned that we actually have a rolling-upgrade feature starting from 2.2, meaning a very late version of 2.2: not 2.2.0, but probably 2.2.16 or thereabouts.
With the rolling-upgrade feature, the upgrade is essentially friction-free: no data needs to be changed, and no API changes. And although we did a lot of API refinement, we usually keep all the old APIs for maybe two major versions. So in most cases you don't need to change anything.
Yeah. Okay, cool. So APIs last for about two major versions, and the last breaking change was 2.1 to 2.2.
So Alexandra, if you're using a version newer than 2.1, you should be totally fine, and if not, you'll have to do a little bit of refactoring. We have another question from the audience: who is responsible for milvus-cli? Oh, we actually have, I think, two committers working on it.
Any questions about the CLI? If you have any feature requests, or if you're finding any bugs, feel free to go to GitHub and open an issue. I also go through the Milvus main repo as well as the related repos, plus milvus-cli. Yeah.
And a fun fact: milvus-cli actually has the same author as Attu, our GUI tool for Milvus. I'm not sure how familiar you all are with it; it's named Attu, and if you don't know it, I'd recommend giving it a try.
And for any reason, we also want any useful feedback so we can improve. Yeah, that's the great part about having these webinars and having users on: definitely getting to see this kind of user feedback and seeing what's going on. I see another question on here, about loading PDFs into vector databases and applying chunking. I think that might be an entire talk in and of itself.
So, Tasing, we will send you a link and some resources on that for now. Let's move on to the third question, the third most popular topic that's come up, which is memory. What are the trade-offs between performance, accuracy, and memory? Yeah, so I'm not sure how familiar you all are with the CAP trade-off. A vector database now faces exactly the same situation as a traditional database, where you're looking for a trade-off between consistency, availability, and partition tolerance. But a vector database also has a new trade-off, which is between cost, accuracy, and performance. So we offer different kinds of indexes in Milvus based on what kind of use case you're trying to serve.
For example, if you're looking for high throughput and high recall, I think the GPU index, especially the new CAGRA index, should be what you're looking for. We're also going to present this GPU index together with the NVIDIA team at this year's GTC, so that should happen in March; give us some patience on that. On the other hand, if you're using CPU, then the highest-performance index will be HNSW.
And if you don't worry too much about recall, you can do some combination of HNSW and quantization: using HNSW with SQ can help you both save memory and improve search performance. Yeah. And for most of the RAG users, they'd prefer lower memory, which means lower cost, rather than better throughput and recall. For that case we actually have two different recommendations. If you care about recall and you have a high-performance disk, then DiskANN should be what you're looking for: it actually helps you reduce your memory footprint by up to ten times compared to an in-memory index.
And Milvus is actually the only vector DB supporting a disk-based index like this, as far as I know. I'm not sure, because I haven't checked lately, but so far I think Milvus is the only one. What it does is maintain an index in main memory, but that index is heavily compressed, quantized, while all the original data stays directly on disk. When a search happens, we use the in-memory index to find candidates, and then we use the candidates' full data on disk to do refinement, to make sure the recall is going to be high. Yeah.
But the problem with DiskANN is that index building is actually a little bit slow, so you should definitely have more index nodes if you have very frequent insertions; that is one problem. The second problem is that, since you're going to do refinement on all those candidates, you need a very high-performance disk. It has to be an NVMe disk, at the least, and it should offer 50K IOPS.
Otherwise, your search performance is going to be very low. On the other side, if you want to save memory and you don't care about recall too much, for example, with some specific datasets, like the OpenAI dataset, even if you do quantization you still get very high recall, then try an IVF index with SQ or product quantization (PQ). It helps you reduce memory and also improve your search performance.
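As a quick reference, here's a minimal sketch of the index options discussed above, assuming a collection with a 768-dimensional "embedding" field; all parameter values are illustrative starting points, not tuned recommendations.

```python
from pymilvus import connections, Collection

connections.connect(host="localhost", port="19530")
collection = Collection("demo")

# CPU, highest performance: HNSW (graph index, largest memory footprint).
hnsw = {"index_type": "HNSW", "metric_type": "L2",
        "params": {"M": 16, "efConstruction": 200}}

# Low memory with high recall, needs a fast NVMe disk: DiskANN.
diskann = {"index_type": "DISKANN", "metric_type": "L2", "params": {}}

# Low memory, recall traded for compression: IVF plus scalar quantization.
ivf_sq8 = {"index_type": "IVF_SQ8", "metric_type": "L2",
           "params": {"nlist": 1024}}

collection.create_index(field_name="embedding", index_params=diskann)
```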
Yeah. And finally, if you're trying to use the new OpenAI embedding models, there's actually new functionality to reduce your vector dimensions. It doesn't affect recall too much, but you can just reduce the vector dimension from around 2K to maybe 256 or 512. That reduces the memory something like four times, and you should just take it in most cases.
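My reading of that remark is the native dimension-shortening in OpenAI's text-embedding-3 models; here's a minimal sketch of it, with the model name and sizes as illustrative choices.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.embeddings.create(
    model="text-embedding-3-large",
    input="milvus index tuning",
    dimensions=512,  # shorten from the model's native size; ~4x less to store
)
vector = resp.data[0].embedding
print(len(vector))  # 512
```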
Wow, okay. That's pretty cool. I want to ask about the disk thing, just one second here.
So you said that in order to do DiskANN, you need a disk that's an NVMe disk, and 50K, what was the 50K that you needed? 50K IOPS. Oh, okay: 50K I/O operations per second. Oh wow.
Okay, cool. So it sounds like for the memory stuff there are a lot of different options, right? Milvus is the only one that offers DiskANN, and having that in-memory footprint at one-tenth is very interesting. I think that will offer a lot of really good options for people who don't have a lot of memory to work with, and it'll still be very fast. And the IVF PQ/SQ stuff will also help.
And we have another question. Alexandra asks: what advice do you have for tuning the Milvus database setup, in terms of the number of pods, horizontal versus vertical scaling of nodes, and the persistent disk size? Yeah, we actually have a sizing calculator on milvus.io, so I think the first step is always to use the calculator to do some estimations. The rough estimation is that each 8 gigabytes of memory can hold around 1.5 million to 2 million 768-dimensional vectors.
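A back-of-the-envelope version of that estimate, as a sketch; the workload size is hypothetical, and the 8 GB per 1.5-2 million vectors figure is the speaker's rule of thumb, which already bakes index overhead on top of the raw float data.

```python
DIM = 768
BYTES_PER_FLOAT = 4

num_vectors = 100_000_000  # hypothetical workload
raw_gb = num_vectors * DIM * BYTES_PER_FLOAT / 1024**3
print(f"raw vector data: {raw_gb:.0f} GB")  # ~286 GB

# Rule of thumb from the talk: ~8 GB of memory per 1.5-2 million vectors.
mem_low = num_vectors / 2_000_000 * 8
mem_high = num_vectors / 1_500_000 * 8
print(f"estimated memory: {mem_low:.0f}-{mem_high:.0f} GB")  # ~400-533 GB
```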
So based on this estimation, you'll know how many gigabytes of memory you need. And based on that, if it's more than 32 gigabytes of memory, we'd recommend the distributed version of Milvus rather than standalone. You can just split the workload into 32-gigabyte pods, and in most cases you just need one proxy, one mix coordinator, and one data node; that should be good enough. And you can just scale up, unless you do very frequent insertions and deletions, or you have network pressure, for example, if you want to retrieve a lot of data, then maybe the proxy network becomes a bottleneck.
Then you have to think about either scaling up your proxy or scaling out. And for the index nodes, the only trick is going to be: if you're using an index like IVF_FLAT, index building should be very fast, and even for HNSW it should be very easy to catch up. But for DiskANN, things are going to be a little bit different. You'll probably need very large index pods, and you actually have to tune a lot for DiskANN to make sure it has reasonable performance.
You should choose a segment size, tune your index pod size, and tune for your disk as well. The easiest way to use a disk index is just to use Zilliz Cloud with our capacity-optimized instances, which actually offer faster speed compared to plain DiskANN. And for Zilliz Cloud we also have an index build pool, so users don't need to worry about how large the index workload is; we don't charge for index building on inserted data.
The only thing to worry about there is how many CUs you need to have. But for open-source users, yeah, you probably need to do a little bit of benchmarking. The recommendation is that you need to have your monitoring metrics in place. There's actually a monitoring metric for how many index tasks are running or waiting in the queue; if the queue is very long, that means you have to scale out your index nodes. Okay.
So for users of the open source, they've got to do quite a bit of tuning if they're going to use DiskANN. But DiskANN is basically what you're implementing as the capacity-optimized version in Zilliz Cloud? For Zilliz Cloud, it's actually an in-house index; it's not purely DiskANN or any other kind of open-source index, but the capacity-optimized instance is very similar to DiskANN. Yeah. Okay.
Okay. Oh, that's very interesting. And so I think this actually leads us well into, maybe skipping past insert for now, what you were just talking about: the cluster and scaling stuff. Clustering and deployment seem to be what you were just touching on, so I think it's natural to move into that a little bit, and maybe you can tell us about the difference between Milvus standalone and Milvus cluster.
Maybe when you should switch, and what the difference might be in deploying either of these two versions. Yeah. So I think for most starters, standalone will be the initial version you should try, because it's very easy to deploy: it's just a single Docker container, so it should be very easy to bring up on your laptop. So feel free to do any of your tests on standalone. There are actually two different signals for when I would switch from a standalone version to a cluster version.
The first reason is if you have a large amount of data. I think if you have more than 10 million, probably 20 million, entities, that's a very reasonable point at which to switch from standalone to the cluster version. Yeah. And the second signal is if you want very high availability, because for standalone, although we support primary-standby, it's going to take a longer time for the cluster to recover. With the cluster version, with the help of Kubernetes, it's going to be much easier to have high availability, and anytime you want to scale, it's going to be much easier.
Yeah. And there is also a way to switch from standalone to cluster. So even if you start from standalone and you want to migrate to a cluster version, there's still a way to do that; don't worry too much about it. Yeah.
Okay. I have a question here, which is: how does that work, the Milvus standalone-to-cluster switch? What happens, let's say, beneath the surface? Yeah, so for Milvus standalone there are actually two modes: one is local mode, and the other one is remote mode. The difference is, if you know a little bit of the details about Milvus, we have a lot of dependencies, because Milvus itself is actually stateless; the data is stored in three dependencies.
First of all, there's your meta storage; usually we use etcd. The second one is your message queue, which we use as a WAL. For standalone, what users are usually using is RocksMQ, and for the distributed version we need a distributed log storage, like Kafka or Pulsar. And the third one is object storage, which stores all the data.
So in the easiest mode, local mode for standalone, all the data is stored on local disk, so it's going to be very hard to upgrade to the distributed version. The one thing you could do is take a backup and then restore the data to a new cluster, to make sure the data gets migrated. For a standalone instance deployed with all those dependencies, which means you're deploying with S3 storage and also with etcd, it's going to be much easier, because all the functionality of standalone and cluster is exactly the same.
So the only thing you need to do is change the deployment from one node to multiple replicas. That shouldn't be too hard to do. Oh, okay. So if we want to switch from the local standalone version, all we need to do is basically back up the data, which is typically stored locally, and migrate it to the new cluster version.
But if we're using remote standalone already, with all of these dependencies deployed, we can simply change a few parameters and use the cluster version. Yeah, all the dependencies are already going to be in place, so that makes your life a little bit easier. Okay. Wow.
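For the Kubernetes route, a minimal sketch of that switch with the official milvus-helm chart might look like the following; the replica counts are illustrative, and externally managed etcd, message queue, and object storage are assumed to stay in place.

```shell
helm upgrade milvus milvus/milvus \
  --set cluster.enabled=true \
  --set queryNode.replicas=2 \
  --set dataNode.replicas=1 \
  --set indexNode.replicas=1
```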
Well, that's great, because that means if we want to be scaling Milvus, we can even start by just using Milvus to do some experimentation at the beginning with the local version. It'll be really easy, really simple to use, and then when we're ready to scale, it's very easy to scale, because we can have this natural migration.
There aren't really any changes in the way that we interface with Milvus. I think this actually also leads naturally into the point about logs. You mentioned log queues, and there are also other logs in Milvus. So why don't we talk about some of the logs that are in Milvus, like the log levels, as well as how the data gets logged. Yeah, there are actually two kinds of logs, and they're easy to confuse.
If you're talking about the data log, which carries all the data, we name it the WAL, the write-ahead log. For the cluster, all data goes into Kafka, and for standalone it's stored in RocksMQ, to make sure all your data is persistent. So when a write happens, the proxy directly writes the data into the log, and all those logs are consumed by the query nodes and data nodes to serve requests.
One major change that is happening in the Milvus community is that we're refining the log storage. From 2.0 we've used Kafka or Pulsar as a third-party dependency, but a lot of users complain that it's too heavy to maintain both a Milvus cluster and the log storage. So we're actually working on a new distributed log implementation that's faster.
We'll remove this dependency in the future, but it's still a longer-term item on the roadmap; it could happen in Q3 or Q4 this year. And if you're talking about the logging, then by default it uses the info level, which is the recommended logging level. If you want to put Milvus into production, one recommendation is to always have a logging system, no matter whether it's Loki, some public cloud vendor's offering, or ELK. Whatever you have: since Milvus is a cloud-native system, we put all those logs onto the standard output stream.
So you'll either need to change some configs to write the logs into files, or you can just use any of the systems I mentioned. Make sure you have all the logs, because once you hit any trouble, all the maintainers and committers in the community would like to help, but they definitely need logs to investigate those issues. If you forgot to have those logs ready and you've already run into an issue, one thing we offer is an export-logs script. With that script you're able to collect the logs for, say, the last 10 minutes, based on what kind of issue you have.
If your logs are very high-volume, that might not be enough information, but it's going to be helpful in some of the cases; we can investigate and help you fix your cluster. Okay, cool. So there are actually two types of logs in Milvus when we talk about Milvus. We have the traditional debugging/info kind of log levels.
And then we've also got what we call the write-ahead log, which is also the way that we move data into Milvus; these are two different things. And we're also going to be updating it, creating our own write-ahead log that will be faster and more lightweight. And then, when it comes to the regular logs, such as these levels of info or debug or whatever, even if we don't have the right logging initially set up in our production environment, what we can do is use one of the Milvus-provided scripts to collect the logs over the last 10 minutes and export them, so we can post them on, let's say, GitHub Discussions and get answers from people. And it should be mentioned that in Milvus 2.3 we actually support a new feature to dynamically change the log levels. That will also help, because the debug logs are very, very detailed; if you have an issue but, for some reason, the info logs don't have enough information, you can just change the log level to debug on the fly.
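One way this on-the-fly change is exposed is Milvus's etcd-based dynamic configuration; the sketch below assumes the default `by-dev` etcd rootPath and that your deployment allows runtime config overrides, so adjust both to your setup.

```shell
# Raise the log level to debug while reproducing the issue...
etcdctl put by-dev/config/log/level debug

# ...and remove the override to fall back to the configured default.
etcdctl del by-dev/config/log/level
```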
Okay. That makes sense. And then I think we can cover the next two in a very similar fashion as well, which are insert and delete. People want to know: how do we insert data, what kind of format of data do we need, and what kind of format of data can we insert? What do we need to insert data? And then the deletions: what happens when Milvus deletes data? And I think this would also be a good time to talk about upsert as well. Yeah. So when Milvus 2.0 was created, one of the important goals for us was to support streaming vector data.
Because all the traditional vector indexes are immutable: if you have new data, if you want to change some of the vectors or delete some of the vectors, you have to rebuild the index. So one of the important things Milvus supports is streaming data. Because we have multiple different index types, and for some of them, for example the GPU index, the index is just going to be immutable; there's no way to insert into or delete from it. So what we created is called the growing segment. We actually split the data into two parts. The first part is the growing segment, which is mutable, so we can insert into it, and we also build an index on top of it, which we call the binlog index.
But that index is not as efficient as the other index types, like HNSW or an IVF index, right? So once a growing segment becomes larger, we seal it and send it to an index node to build the full index; we then call those the historical segments. When a search happens, we just do a merge: we search the growing segments as well as all the historical segments, and merge the results. It's pretty similar to a log-structured merge tree: merge all the data from different segments to get a complete result.
So when a delete happens, because Milvus is built on... Oh, I think we're running into some technical issues here. Okay, I'm not really sure what's going on; I think James has run into some technical issues.
So I'll try to answer some questions. You can ask some questions in the chat and Q&A, and we'll see if James is able to reconnect to the Zoom. Oh, he's gone. Okay, so we'll see if James is able to reconnect in the next couple of minutes.
If not, we can use some more time for Q&A, and then we'll just cut it off. So: to confirm, the Docker deployment needs standalone, etcd, and MinIO, and what is the message queue, only if integrating with Kafka? So yes, when you deploy Milvus, you'll have the standalone Milvus, etcd, and MinIO; these three containers need to be up in order to store the information. etcd is what we use to store, let's say, the state, and MinIO is what we use for permanent storage.
And, oh, James is back. Okay, great. So James, I think we got disconnected while you were talking about what happens when a delete happens, so if you want to pick back up on that, that would be great. Yeah.
Sorry about my network fluctuation. So when a delete happens, since we actually depend on object storage, we are not able to mutate any of the files. So we just append to a new data format called the delta log. In the delta log we just maintain which entities have already been deleted. When a read happens, it works as a mask: when a delete happens, the data is actually not purged; it's still kept in the object storage.
But there's no way for it to be searched out. And we also have a background task called compaction. What compaction does is merge all the delta logs and the insert logs together, to make sure the data itself is clean. So that is what happens.
And when you append this delete to the log, you also have to notify all the query nodes: you have a new delete, please make sure you mask the data so the deleted entity is not going to be read out. That is what happens on delete. So then, what is upsert? Milvus has a primary key, and we want to keep the primary key unique, but for some technical reasons there's no way we can guarantee, when you insert duplicate primary keys, which entity is going to be the one you actually get in a search. It's not like a traditional database, because in a traditional database the merge happens and the search itself is exact.
So if there are two entities with the same primary key, a traditional database can always do an overwrite. But when you do an ANN search, what could happen is that the old entity is in the top-K list while the new entity is not. So even if you wanted to do an overwrite, the old entity still shows up, and under some other queries you see the new entity. So to avoid this, one thing you can do is an upsert. Basically, what we do is one delete on the old entity,
and an insert of the new one into the growing segment; that helps you avoid the issue. But be cautious about using upsert as well as delete: if you're trying to delete a bunch of data, it puts a lot of pressure on both compaction and index building. So if you have very frequent deletes, I would recommend you use a later version, the latest version of 2.3, because we fixed a lot of issues around concurrent delete and insert. As far as I know, there are some bugs in the 2.2 versions that might lose some deletes, which can happen if you have a very large number of deletes or inserts.
So be cautious if you have deletes. Yeah. Okay, cool. So I'm going to just summarize all of that. Normally, if you build an index, the index is immutable.
But because we have this concept of growing segments and these sealed segments, what we can do is actually build indexes that are more easily scalable and more easily searchable. And this is also actually helpful for when we delete data, because deleting data typically acts somewhat like, let's say, a bit flip, and what we do is mask it. If we didn't have these growing segments, there would be this challenge of getting the correct primary key. But because we have these segments, what we can do is mask the old entity in the old segments and then add the new one into the growing segment, so it's in a different index. So that's a really effective and interesting way of doing that.
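A minimal sketch of what delete and upsert look like from PyMilvus (2.3+), assuming a collection with an INT64 primary key "id" and an "embedding" vector field; all values are illustrative.

```python
from pymilvus import connections, Collection

connections.connect(host="localhost", port="19530")
collection = Collection("demo")

# Delete by primary key: this appends entries to the delta log; the rows are
# masked at read time and physically cleaned up later by compaction.
collection.delete(expr="id in [1, 2, 3]")

# Upsert: internally a delete of the old entity plus an insert of the new row
# into a growing segment, keeping the primary key effectively unique.
collection.upsert([
    {"id": 4, "embedding": [0.1] * 768},
])
```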
And we have another question here in the chat from Frank. Not Frank, our head of AI/ML; a different Frank. What is the technical restriction for only one DB per connection? Hmm, I don't have any idea what the restriction means, one DB per connection? Yeah, Frank, can you clarify that a little bit? Maybe in the meantime we can just go ahead. Yes, in the meantime we will go ahead and do the last bit of this.
So the last, oh no, there are actually two more points, and we can see if we can cover both. I think configuration is probably the more important point, and then documentation is much easier to cover. So why don't we take some time to talk about the configurations that are available for Milvus, and how to configure Milvus on the fly. So first let me answer the question from Frank, which is that a client can't just switch to another DB. I think you can do it by just using another connection for the other DB.
But yes, you're right: each connection can use only one DB. So if you have multiple databases, the recommended way is to have multiple clients. If you're a Python user, have multiple processes; if you're using another language, for example Go or Java, then you can just create two Milvus clients using different DBs.
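Here's a minimal sketch of one way to hold two databases from a single Python process, one connection alias per DB; it assumes Milvus 2.2.9+ where the database feature exists, and the database and collection names are illustrative.

```python
from pymilvus import connections, Collection

# Each connection alias is pinned to exactly one database.
connections.connect(alias="conn_a", host="localhost", port="19530", db_name="db_a")
connections.connect(alias="conn_b", host="localhost", port="19530", db_name="db_b")

# Address each collection through the alias of the connection you want.
orders = Collection("orders", using="conn_a")
events = Collection("events", using="conn_b")
```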
Yeah. So, getting back to configurations: I think most of the time the default configuration is strong enough for most of the users, unless you have a very large deployment, or performance is going to be very critical for your use case. Among the configurations you might want to change, first of all, is the segment size. By default it's only 512 megabytes, which is a design for very small deployments that have very limited memory, like four gigabytes or eight gigabytes.
There, 512 will be a reasonable setting. But if you have large clusters, you have a large amount of data, and you want to improve the performance, I think one of the important configurations to change is the segment size; I would recommend using maybe two gigabytes if you want to improve the performance. Yeah. And also, if you're looking for performance tuning, there are a lot of search parameters as well as index build parameters.
You'll want to tune especially the search parameters; they give you a very flexible trade-off between search accuracy and performance. There actually are some best practices, but they're going to be different for different indexes. We actually have a lot of articles that talk about this, on both the Zilliz blog and the Milvus blog, so feel free to look at those articles. Yeah.
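A minimal milvus.yaml sketch of the segment-size knob just discussed (the value shown is the 2 GB suggestion; the key path follows the Milvus 2.x config layout, so double-check it against your version's template):

```yaml
dataCoord:
  segment:
    maxSize: 2048   # segment size in MB; the default 512 suits small deployments

log:
  level: info       # raise to debug only while investigating an issue
```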
Okay. So for configuration, the segment size is probably the most important thing to change to get performance, and then the different best practices are just going to be different for the different indexes; that's probably an entire talk in and of itself. So, okay.
And then the last of the ten most common terms that we saw on the Milvus website, and the most common questions, is documentation. What should we do to make sure that we're up to date with documentation, and where can we find the most up-to-date documentation? Yeah, so we know all the users just want to have better documentation. We all want to have better documentation. But it definitely needs a lot of human power, as well as a lot of patience.
So first of all, any contributions are welcome for the Milvus website, and maybe not just helping us write content: just giving us some advice, or some issues about what kind of content you're looking for, is going to be very, very helpful. Yeah. And secondly, one thing we're trying to do, because we have a bunch of new features, for 2.4 there are actually maybe more than two major features we want to release,
is to make sure that each of the major features has one user document, as well as one detailed document to introduce its implementation. So if you're very interested in all those kinds of new features, that should be good enough to help you understand how to use them, as well as a little bit about the design details. Yeah. And the third part: RAG is going to be super, super popular,
and we're actually very familiar with how to build a RAG system. So one of the things we might do in the near future is build a bot for Milvus and those documents. It would have all the information: we'd throw all the documents from the Milvus website into this bot, do vector search over them, and also use large language models to do the generation part.
So it could help you find more accurate documentation results. For now, our website just uses Algolia. It's actually a pretty good tool, but it's more focused on keyword search, and we want some semantics. Yeah. Okay.
Awesome. Thank you, James. So, just to recap, guys, on the documentation stuff: please give us your input; tell us what you need to know, tell us what you want to know. We're happy to help you out there.
And yeah, I think that's where we can pretty much wrap the session. We can wait for one more minute to see if there are any incoming questions. I think we have time to take maybe one or two questions if there are; if not, I think it's totally fine to just wrap the session here. Okay.
So we have one incoming question. Philip would like to know if he can get more info regarding the RAG implementation for our Milvus document bot. Yeah, so there's actually no magic there. The framework we're using is actually LlamaIndex; it offers a lot of flexibility.
If you just want to do RAG, you can pick your own embeddings, and you can use your own rerankers. One thing we want to try is actually hybrid search, so we'd index all the documents with both kinds of embeddings. So far, the OpenAI embeddings are actually great, and we've also tried different kinds of open-source embeddings.
The BGE embeddings are actually one of our choices, and we're currently testing some other, newer embeddings, like the Mistral embeddings; they're built on large models, so they should have better quality, but that's still under benchmarking. On the sparse embedding side, what we're actually using is SPLADE, which is, I think, the best-known sparse embedding model overall. So we search on both sides and use the BGE rerankers to do the ranking; that's totally open source. If you want to use a paid service, then Cohere's reranking is exactly one of the solutions; they have their own rerankers. So that's the third part. For generation, definitely OpenAI, but any other high-quality large language model should be good enough.
Yeah. Cool. Awesome. Follow LlamaIndex, follow our blogs; we'll have a lot of information about RAG.
Yes, I've also done a lot of work on RAG. So, cool, guys, thanks for being here. I think this wraps our session. All right, thank you, James, and thank you everyone for being here, and we'll see you guys next time.
See you guys.
Meet the Speaker
Join the session for live Q&A with the speaker
James Luan
VP of Engineering at Zilliz
James Luan is the VP of Engineering at Zilliz. With a master's degree in computer engineering from Cornell University, he has extensive experience as a Database Engineer at Oracle, Hedvig, and Alibaba Cloud. James played a crucial role in developing HBase, Alibaba Cloud's open-source database, and Lindorm, a self-developed NoSQL database. He is also a respected member of the Technical Advisory Committee of LF AI & Data Foundation, contributing his expertise to shaping the future of AI and data technologies.