Build Fast, Scale Faster: Milvus vs. Zilliz Cloud for Production-Ready AI
0:01 Thank you very much for that introduction, Chris. Welcome to everyone who has joined the webinar or is watching the recording. As in prior months, we're going to cover a few topics: a brief overview of our Zilliz Cloud offering, an illustration or demo, and how you can get more help.

0:20 This series of monthly cloud technical review recordings and webinars is intended for a semi-technical to highly technical audience. It covers different topics of interest, and developers and administrators are where we try to keep our focus. With that, our expectation for those joining is that you are somewhat aware of vector databases, GenAI use cases, vector embeddings, and how to use other vector database products. There are many in the market.

0:58 We are the creators of Milvus, the most popular open-source vector database, and when we offer that as software as a service on the cloud, we're a company named Zilliz. So what is Zilliz Cloud? It is the software-as-a-service version of Milvus, the most widely adopted vector database.

1:25 Whether you are a startup, a well-established company, or just coming into the foray of GenAI use cases, you're going to realize pretty soon that a vector database is your friend. When you index vectors generated by encoding models, you can do semantic and similarity search. Those are the fundamental ideas behind all of this.

1:57 What we offer is Zilliz Cloud and Zilliz Cloud Bring Your Own Cloud. The latter is a service where you host your own hardware on the public cloud, and we run a few agents, collect telemetry, and give you controls on our control plane for you to administer
the Zilliz product.

2:22 Just a little bit of basic background. Vector search is most applicable where you have unstructured content: things scraped from a PDF or a web page, or XML files converted to PDFs. In it you have titles, pictures, and timestamps all meshed together, and a lot of the content is just text and diagrams. There's no real structure, no dictionary so to speak. Using different encoding models, you can take all this content, break it up, and make it a little more structured. Wherever you have content like a Wikipedia page, you can index it as a vector field and let your users search on it. One of the best-known use cases for this is RAG, retrieval-augmented generation, and we power that using our Milvus kernel on Zilliz Cloud.

3:31 What separates Zilliz from Milvus is the Cardinal search engine, which is constantly under development. It's 10 times faster than Milvus, it has better indexing and better quantization, and it has something known as autoindexing, which obviates the need to specifically select an inverted file index and a quantization. It does a lot of that behind the scenes for you.

4:03 We're a cloud-native database, so all the scalability and management is handled for you. You don't have to worry about running Kubernetes pods or an elastic Kubernetes cluster; we handle all of that on the cloud. And it's enterprise ready in the sense that we give you role-based access control and Private Link security, we're SOC 2 compliant, and there's a company, Zilliz, backing you with SLAs for support and making sure that tickets are resolved on time.

4:35 I don't want to go too deep into the Cardinal search engine; we have other
items on the agenda. But it's important, because a lot of our customers come to talk to us because they're really impressed with Milvus. They want to run Milvus on many machines, they want replicas, they want a lot of scalability. When they come to Zilliz Cloud, they get the same API functionality and the same REST APIs, but they also get the autoindexing algorithm and better quantization in the Cardinal search engine. It's faster: we outperform Milvus with a 50% capacity increase and up to a 10 times performance boost. So moving off Milvus to Zilliz, the cloud version of Milvus, has a benefit not just in operational simplicity but also in a better cost profile for your admin team.

5:37 We're enterprise ready. We have security, we are on three public clouds, and we are very resilient, so we can do replication within a region. We allow you to bring your own cloud. So large enterprises, and startups with seed funding or their first few rounds of funding, can alleviate any fears they might have about running something all on their own. We have physically isolated, dedicated clusters, no direct VPC access, IP allow lists, and even SSO support; we plug in and play very nicely with Microsoft Entra ID. Technical support is included in a subscription. You can have multi-year commits, which is in the realm of sales, along with fully managed service benefits and SLA commitments and guarantees.

6:26 Now we get to the fun part of this June monthly webinar. There are a handful of things I decided would be good to cover. In the last 24 hours I have put together some test harness code, and I want to talk about our scalable architecture. I have a cluster that I created, and it's been running for about 24 hours, and if
we look at the last hour, I have been throwing a lot of data at it from my test harness, which I have set up in my development environment.

7:13 As you can see, this is Visual Studio Code, and I have this code written so we can write 100,000 vectors at a time, with 5,000 batches and a certain number of workers, and we connect to my running instance. I will cover all that. I have saturated this cluster. What I want to talk about is the fact that you can have a cluster and control the capacity beyond which it scales all on its own. I want you to notice how linear this is: we don't ask you to double the capacity.

7:48 A little background on this. A CU is a compute unit, and there are different types of compute units: performance-optimized, capacity-optimized, and extended-capacity. This is the order in which you allocate compute units. You get the most bang for the buck, the fastest performance, with performance-optimized. We price our vectors at a size of 768, and I'm using 372-dimension vectors so I can saturate this cluster faster. Note that you don't have to pay for double the capacity; you can go up two CUs at a time, and you pay a little more by the hour as you go up. These are dedicated clusters, as opposed to serverless, where you don't get the same level of performance consistently throughout the operation of your cluster.

8:39 Right now I have a cluster that I started with one CU, and I saturated it enough that I was alerted by the system. You can see this thread. I kept writing data to it, and it automatically moved up to 2 CUs. I continued to write data, and in the last hour, if we look at the metrics... this is the last 10-minute chart, because I haven't really been doing anything in the last hour. I've really
saturated this. But you can see that it goes up and down, up and down. That's because the various services that ingest the data are also running compaction and building the index. Those services are activated as we ingest and as we build the index, and then, as we release memory, the pressure on the cluster is relieved.

9:32 What I'm going to do is have my test harness run the next batch of inserts, and I'm now going to give this cluster a relief valve. I'm going to say: you can scale up to four CUs once you hit this capacity threshold, which is a very simple calculation around memory. We're going to come back and revisit this, but this is our very powerful automatic cluster scaling ability. This just tells you what's going on, and we set the autoscale over here. There is slight jitter when the cluster does upscale, and you are notified by email when this happens. I did receive a couple of alerts. We're going to revisit this for sure.

10:20 We're also always improving our UI. I'm really proud of our engineering team. I'm just going to close this tab here and cover the intuitive collections UI. Unlike other databases, though some might have this feature, when you create a collection and operate this database, the entire user interface acts almost like an integrated development environment. You don't have to run everything programmatically. We do have an API playground that lets you administer this database: you can create new partitions inside collections, create new indexes, drop collections, and describe collections. That is available, but it still has a programmatic component to it. If you are getting started with vector databases and you want to operate things in a very free-flowing
way, and you just want to explore things, this is one of the best UIs around.

11:25 You can see that I created this "vectors 372" collection. This is the collection I'm using to saturate my cluster. Right now I am at 86% capacity, but I'm allowing it to go up to four CUs. We're going to revisit this.

11:39 If we take a look at the create collection capability, we've made some improvements over time. Where we stand now, you can create a brand new collection inside a database. If I create a "test 02" collection, I can give it a description such as test data for inference, or test data for new model selection. Right here we give you a lot of flexibility. We also have a very friendly, intuitive UI where you can hover over fields to find out the kind of capabilities we provide. You can have your primary key automatically generated. We require at least one vector field, and you can select the dimension. I'm going to put in something middle of the pack, 1536. There are models that work with smaller dimensions also.

12:36 Recently we have also given you the ability to select full text search. Why can't I click on full text search? That's because I don't have a varchar field. In other words, if I have, let's say, a product catalog and every product has a description, I can give the collection a varchar field. And as you can tell, with every field you add, you can also add a corresponding attribute for which you provide a value: in the case of a float vector you have dimension, in the case of varchars you have max length. I'm going to say that perhaps 4,000 characters is good for a description. Now let's go forward and add a description vector field.

13:26 Without going into too much detail, I'm going to talk about sparse float
vectors. When you have text and you want to do semantic search, you use dense vectors. When you want to do more of a lexical analysis, like in the Lucene library, we use sparse vectors. We have that many more indexes in the array, and we want to record positionally what is active, because we cannot store a zero for every position for which we don't get a term.

13:55 What has become very popular lately is the BM25 function, and that's what we support. A function is used in full text search to convert tokenized items to sparse vectors with relevance scores. If you remember how the PageRank algorithm was developed, it was about relevance scores for a page: how many other pages led into it, based off of search terms. BM25 is another score, almost like TF-IDF, term frequency and inverse document frequency. I'm not going to go too deep into this rather involved theoretical topic; I would encourage anyone watching, live or on the recording, to take a look at it.

14:30 We can now add full text search based off of the description field and the sparse vector field, and it's done immediately. You don't have to do anything programmatically. You can start using full text search after this with the examples we provide on our website. So I can save it. This is a very powerful feature, I feel; you can do a lot more with Zilliz. I can now create this collection. I'm not interested in inserting data.

15:05 So that was a little demonstration I wanted to do. Sorry about that. What I also want to highlight, as we look at my test harness, which is now done inserting more data, is this URI, my endpoint on the vector database, and this token, which I'm going to invalidate as soon as this call is over. We get this information from here;
this is not collection-specific or database-specific. This is the cluster URI, and the token then empowers you to connect to the cluster and create collections, create indexes, and insert data. If you notice, this is a secure endpoint. Our driver, the PyMilvus library, handles this for you. If you've been testing a lot of code outside of our cloud environment, maybe in Milvus, you just have to provide a new URI and a new token, and everything is going to work just as it did before: all the insertion, all the search, all the hybrid search.

16:17 Excuse me. This leads us into some security features that we provide for you on the cloud. I'm not going to go into the mathematical or theoretical explanation of how everything is built, but the important thing to note is that we support private endpoints, and you can set this up as long as you are able to do so. I chose Google Cloud to run this cluster, but we also support this on AWS and Azure.

16:41 The IP access list is very powerful when you have a VPN connection or you know the specific set of IPs in your organization. We have allow lists with which you can do this. Of course, I want to be able to reach this cluster from anywhere, but you can add an IP address in CIDR notation, and we will save that for you, and it becomes a sort of gateway to getting in.

17:13 There are other safety mechanisms. We have TLS by default, which I talked about. There's the IP allow list, Private Link, and RBAC with privileges. RBAC stands for role-based access control. It's a very standard concept; almost all databases have it. First, of course, you have to understand the privilege model, and these are our privileges. It's been very well thought out. I'm very impressed and proud
of our engineering team for how they've gone about the kinds of privileges you can tie to a specific role within the system. If you have a read-only privilege, what are you allowed to do? Well, you can see this checkbox: you cannot create a partition, you cannot drop partitions, so obviously you can't change anything. Of course, when you're the admin, you can do pretty much anything. We have database-level permissions and cluster-level permissions. As you can see, with read-only you can do only so much, with read-write you can do a little more, and with cluster admin you can do a lot more. So whether you are a large or a small enterprise, if you have PII data and you don't want people to be able to get in and make changes to the data, you can use the privilege model.

18:30 Let me cover something very briefly. Users and roles are not necessarily at any one level. I can create a new cluster role and assign privileges, with very fine-grained permissions. For collections, you first select the database, and then you can select specifically which collection you want to provide which kind of privileges on. So we have a very rich UI. There are database systems in which you have to do this using a SQL-like language: describe roles, assign an XYZ type of role with specific privileges to a securable object. We allow you to do this through our API playground, of course, but also from here.

19:29 All right, moving on. I want to talk a little about what we have recently added to the cloud side of things. A brief note, and I will belabor this point: whenever we release a new version of Milvus, we allow it to battle harden. We make sure that our user community can
download it, test it, and provide their feedback, and only then do we bring those features and improvements into the cloud. We don't want our existing and new customers to be surprised by any kind of issues or tickets that go unresolved. So while I will cover some of this, we will not be hosting Milvus 2.6 on the cloud. You can see that if I were to create a brand new dedicated cluster, it would have version 2.5, which is our latest release. I did that yesterday for this cluster, and it's Milvus 2.5.x. We give you the latest software, we don't force you to upgrade, and we don't prematurely release anything onto our cloud offering.

20:40 Now that we've been talking about scaling, let's go back very quickly and see where we are. As you can see, in the last hour we were at very high capacity. We were up to 86%, but that suddenly changed. Why are we down to 36? That's because the cluster automatically upscaled to four compute units. Let me see if I received... I did receive a notification about it. This is the prior one, from the morning, and this one is from just now, while I was speaking with this audience.

21:17 So this is our highly scalable, we-manage-things-for-you type of architecture. Of course it comes with a higher cost, but my hope is that if you're ingesting more and more data, you've got a growing business, and we congratulate our community for that.

21:28 All right, let's talk a little about recently added features. I've picked these up from our release notes documentation. Every quarter we improve the product and publish more release notes. You can find them yourself with a simple search engine search. I picked some of these up from here; I think
they're worth a mention. Migrations are very important to our users. This is in private preview, but it is something very important to our user community. I'm not going to go too deep into this; I can cover other topics also. What we allow is direct data transfer without having to bring down your cluster or stop taking writes. There are some limits here. The prerequisites and the mechanism are all documented, and if you want to do it, we have very simple instructions and illustrations of what you have to do, step by step. I've been really impressed by this, and our users are asking more and more about it and starting to use these new features, including how you can monitor what's happening.

22:53 By the way, this is an asynchronous process, so while it's going on you can watch it. That is very similar to how we allow you to do bulk import for very large data, terabytes and petabytes: you can kick off a process and then look at what's going on. There are various stages. You can monitor sync lag, you can of course stop the data sync, and then phase three is where you switch to your new collection, to a new cluster.

23:25 Another feature that's been around for maybe a couple of months is alerting. The alerting has improved, as it's very much policy-based. What I can show you here is that we have project alerts. If you look here, we've got a bunch of projects, and each project allows you to have multiple clusters. So if you're testing, if you're just developing, if you have user acceptance testing, if you have data that you know has PII, or you're only playing with unsecured data or data you sourced, you can have a separate project for that. Different users can be invited
to those projects.

24:00 An alert allows you to say, well, I have an alert about something concerning. The metric you choose is quite flexible: bulk write QPS, cluster write performance. You have to have an enterprise subscription to be able to use this. You say that once the metric goes over this threshold, or below something, for a duration of maybe a minute, the alert fires. It picks up all the clusters in our project, but you can take clusters out, and it will email people within my organization, or by role. This is not a group, this is a role. There are so many different touch points by which you can reach someone and say, "Hey, I have something to tell you about." This is an awesome feature that I'm hoping more and more of our customers will adopt, so they can self-manage, or at least capture what they want in terms of alerts and management.

25:07 The next one, which has been around but which we now support, is BYOC. This is going to be a huge driver for our business and for our community. It's going to empower a lot of our customers to have the kind of control they want over the hardware when they use the software-as-a-service part of our product: to host their own machines and be able to select an EBS profile. EBS is Elastic Block Store, the drives that you attach, but this can happen in any cloud. We support deploying BYOC on GCP. We give you the Terraform scripts and all the prerequisites required for you to have virtual machines that you provision yourself, and we show how you can connect to a control plane so that the machines and all the resources can send telemetry, alerts, and everything else to the Zilliz cloud plane. By the way,
when I say that word, cloud plane, I'm talking about this part. This is how you control the Zilliz software: the executing clusters and the services. With BYOC... right here, this cluster is not BYOC, so I'm not in control of the hardware. All I can ask it to do is scale up. So we enable this, and in this entire section here we have "Deploy BYOC on AWS", which covers a lot of ground. As time goes on, we will be making this self-directed for our customers, even on Azure down the line, and AWS. There was a time when we had to work very closely with our customers and their administration and development teams to do a lot of the work for them. This does require a higher upfront commitment in terms of the investment our customers make with us, but once that partnership is established, we have provided you with a lot of resources on how to get started and be in control. It shows the level of maturity that Zilliz has reached in enabling vector databases as part of your integrated GenAI solution, and many of our competitors are behind us in this capability.

27:34 One last point I wanted to make is about Milvus 2.6. We only have release candidate one. We're not going to make anyone a guinea pig; we're going to have our user community and internal tests run through this. But what's coming is absolutely state of the art. We're going to use Woodpecker for the write-ahead log, which is what WAL means. There's going to be much better ingestion and streaming. We're going to merge some of our services, so we're not going to run as many different pods in the Kubernetes world; it's going to become one coordinator. And one of the most important things: we're now releasing RaBitQ quantization, a one-bit quantization, and of course our customers
on Zilliz Cloud could choose this, but they don't have to. It's going to be a part of the autoindexing. The documentation we are releasing for this is like a blog post, very theoretical, and how exactly this works is something you can also find out about from YouTube videos. But as far as enabling it is concerned, this will be ready for you to enjoy and get cost and performance benefits out of. This will change: we're going to be adding more as versions 2.6.1 and 2.6.2 come out, we're going to battle harden things, and there's going to be phrase matching. But of course we can always do another webinar on all of that.

28:55 Right about now I'm at time. I thank you for joining and letting me walk you through some of the RBAC security, scalable architecture, and other features.

29:08 Thank you very much, that was really great, Rohit. So where can they reach out to you? Where do you kind of hang out?

29:16 I hang out at Zilliz. My email address is my first name as displayed, dot last name, at zilliz.com. We have a very competent and smart team, and we have a sales team. You can send us your questions. Don't be shy about anything; there are no stupid questions with us. We can educate you about vectors, vector databases, Milvus, running Milvus by yourself on your laptop, and how to set up IDEs. I can help you with all of those topics.

29:49 So you can find Rohit in our Discord channel. You can also just reach us on our website; we have a contact sales form. We also have free office hours that you can set up sometime. So there are lots of ways to reach out to Rohit, and he can further walk you through all these capabilities when you're ready to get started with Zilliz. We will make sure that we provide all these materials to everybody at the end of the session, but
before we sign off, Rohit, what is some sage advice you have for people when they get started? You covered so many things, so maybe one or two points that you'd really want to leave our audience with.

30:28 If you have a bona fide use case and you are a business or a startup, and you ask me what you can do to get started, I would say just get started. Don't be shy about reaching out to us if that's what you need. Start with Milvus if that's what you need. Of course, I think that Zilliz is a much superior enterprise product for actually running your business. Any resource we have on our website is for you to consume. Start there: find the blog posts, find the YouTube videos, talk to us, and land some data. Find the data that you want to work with and be ready to upload it. We have migrations if you are already running with one of our competitors. If you're running Milvus, you can extract from Milvus: you can download backup files using our Milvus Backup tool and simply upload them, and we'll create a collection for you.

31:25 And guess what, one of the most amazing things about Milvus and Zilliz is that we didn't create a new library for you to use. You use the same library. The only thing you have to do is change the endpoint URL and provide us with a token, and off you go. All of your code is going to work exactly the same.

31:44 That's right. You always want to be able to write once, right, and then not have to rewrite anything.

31:51 Right. And if you use Python or Golang, once you write the code, it's going to work against Milvus, and it's going to work the same against Zilliz on the cloud and BYOC. So get started, get some data up and running. And if you are just learning, we also welcome you into the GenAI community and the vector database community. There are so many different
different 32:14 resources of course hugging face is the is the best known one from where you can get models and data sets but of course 32:21 there's also data.gov there are so many data sets that that are available here for you to just you 32:28 know um satiate your curiosity uh download something convert that to 32:33 vectors using uh uh uh uh embedding models and get started with a vector database and you will actually uh wow 32:39 yourself with what's possible that's amazing uh well uh thank you so much 32:44 once again for joining us and I suggest that you reach out to Rohead and then 32:50 don't forget we do have a free trial with Zillow so you don't have to pay for it and it's actually pretty powerful we 32:56 actually provide I think up to two collections of uh a million or half a 33:01 half million in each collection of um of vector embedding so it's um there's a 33:06 lot you can do there up to five collections yeah so there it's plenty for you to be able to to get started 33:12 with and then as uh RoIP mentioned you know with any of our products write once and it's easy to bring that over to our 33:20 dedicated cluster when you're ready that's right thank you so much everybody 33:25 and we can't wait to hear what you build and we look forward to seeing you again soon bye-bye 33:32 thank you