A Beginner's Guide to Building a RAG App Using Open Source Milvus
What will you learn?
Join us for a technical webinar where we will showcase how you can build a RAG application using Milvus. Retrieval-augmented generation (RAG) is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources.
This session is designed for developers eager to learn more about RAG and vector databases. Through live demos, we will show you how to build a RAG application that incorporates external knowledge from pre-existing sources.
Topics Covered
- Introduction to RAG: What is RAG? When do you need it, and how do you build it?
- The RAG tech stack: RAG uses several components; we will go through each of them.
- Building a RAG app live: You will see how to build a RAG app using Milvus.
This interactive session will equip you with the knowledge to efficiently build and use RAG with unstructured data.
Today I'm pleased to introduce the session, A Beginner's Guide to Building a RAG App Using Open Source Milvus, and today's guest speaker, Stephen Batifol. Stephen is a Developer Advocate here at Zilliz; it's nice to work with a colleague on these webinars. He previously worked as a Machine Learning Engineer at Wolt, where he worked on the ML platform, and as a Data Scientist at Brevo. Stephen studied computer science and artificial intelligence, and he enjoys dancing and surfing in his free time.
Welcome, Stephen. Thank you very much for the intro. Alright, so yes, today I'm here to talk to you about a beginner's guide to building a RAG application using Milvus. As said before, I'm Stephen, and I'm a Developer Advocate at Zilliz. You can find all my socials here if you want.
If you have any questions related to Milvus or to RAG, feel free to reach out to me directly on LinkedIn, by email, or on Twitter as well. So yeah, let's get started. RAG: I guess if you're here, it's because you want to know what it is. RAG means retrieval-augmented generation, and that's what we're going to talk about today. The basic idea of RAG is that you want to force your LLM to work with your data, and you usually inject that data via a vector database like Milvus. Say you have your LLM: if you've used OpenAI and GPT, for example, you may have seen that there's always a cutoff date.
After that date, your LLM doesn't know about what has happened. And if you have private data, your LLM doesn't have access to your emails and things like that. For that you usually use RAG, and the way it works is that vector databases provide the ability to inject your data directly and run semantic similarity search over it. That's how it works, and that's why you almost always see a vector database used for RAG. Then you have different things to take into consideration if you use RAG with a vector database: scale, performance, and flexibility. Obviously you don't want to use a vector database that can't scale if you have millions of vectors and so on.
So take that into consideration on your side. One problem is that LLMs are stochastic: they always predict the next token. That's cool, but it's also why you sometimes get different answers to the same question. Another downside, as I mentioned before, is that your input data might be outdated, which can cause hallucinations. For the people who don't know what hallucinations are: it's an answer that sounds true but is factually incorrect.
So you can ask an LLM where someone was born, and it will be very confident that the person was born in New York even though they were born in Chicago, for example. That's a problem. And this is a basic RAG architecture, I would say. You always start with your data, and then you extract the content from it.
So if you have some PDFs, you extract the content from them; the same for videos and other formats. Then you chunk your data and put it through an embedding model, and I will explain those later on different slides, so no worries if you don't know what they are. Then you store everything in Milvus. Once you make your query, the query also has to go through an embedding model.
Then we run a semantic search, so your LLM knows about the similar data; we put everything in the context, and you get your results. That's a basic RAG architecture. You'll see that chunking and embeddings are very, very important. So, the tech stack for this one: it's LangChain.
Then I'm using embedding models directly from Hugging Face, from sentence-transformers; I'm using Milvus for the vector database; and I'm going to run everything locally, so I'll run Llama 3 directly on my laptop. Also, something I didn't mention is that for this RAG app everything will run locally and everything will be free; there will be no OpenAI or any other paid models. So LangChain: it's a framework to build LLM applications.
It's mostly focused, I would say, on retrieving data and integrating with LLMs. It's very useful if you have to load some data. Let's say you want to load some PDFs: they might be local on your computer, or they might be on the internet. LangChain makes it very easy to get those PDFs wherever they are. It also makes it easy to chunk your data and to configure things like the chunk overlap.
We'll see later what that means. So those are very handy. They also have integrations with the most popular tools you can think of, so it's convenient if you need to use embedding models from different companies; usually you only have to change the import.
So that's LangChain. Then you have Ollama. If you've never used it, it allows you to run LLMs locally, and it's very handy. It only runs quantized LLMs, though.
But the smallest version of Llama 3 can run on my laptop; it takes about four gigabytes to run. That way you might be able to use it when you're on a plane or wherever you are. Recently they also added the ability to run embedding models, which might be very interesting.
So if you want, you can install Ollama and then you'll be able to run Llama 3, but you can also run different models; they have a lot of models available. Then there's Milvus, which is a vector database. It's cloud native with a distributed system architecture and a true separation of concerns, and it has a scalable indexing strategy. I won't go over the details of the different segments, but it's made for scalability. And then we have embedding models in general.
As I said before, for this one I'm using one that's directly available from Hugging Face. But depending on your needs, you might want to use different models. Let's say you work with documents in a different language; for example, I live in Germany, so most of the documents are in German. Then you might want to use embedding models that are specialized in German. It's the same if you want to work with videos: you might want embedding models that have been trained on videos.
So the high-level recommendation when you start is to check out different embedding models. There are leaderboards on Hugging Face, and they also have lists of embedding models for different use cases. Don't always go for OpenAI embedding models just because they're very good for generic things; for specialized things they might be a bit tricky. You have a lot of different ones available.
So let's go a bit more into detail on embeddings. The first thing, which I already mentioned, is that when you use embeddings you have to pick a model. And to repeat what I said, it's very important to pick an embedding model that has been trained on the kind of data you have. If it hasn't been trained on videos, it might not work on videos. If you're going to use Japanese text and it has been trained on English only, it's very likely it will not work out well either.
Then you have to choose what to embed, and then there's the metadata, which I will get into a bit later as well. So, different embedding strategies. There's level one, which is embedding chunks directly: you take your data and split it into smaller chunks.
You might just embed the chunks, put everything in a vector database, and then do similarity search. That can work very well depending on your data. Then there's another level, level two, which is embedding sub-chunks and super-chunks as well, which can usually provide more context for your data and help your LLM understand what's happening. And the next level is incorporating chunking and non-chunking metadata.
So you can have metadata about the author and so on, and that can be very important for understanding your data. As examples of chunking metadata, you have things like paragraph position, section header, and sentence number; if it's a large paragraph, you might want to chunk on that too. Those can be very useful. The non-chunking metadata is more like who wrote this, the publisher, the organization, and so on. Usually those actually provide very helpful information for your RAG app.
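As a rough illustration, here is what chunking and non-chunking metadata can look like attached to a single chunk, using LangChain's Document class; the field names and values below are made up for this example, not taken from the demo.

```python
from langchain_core.documents import Document

chunk = Document(
    page_content="...for the quarterly period ended June 30...",
    metadata={
        # chunking metadata: where the chunk sits inside the document
        "section_header": "Quarterly Results",
        "paragraph_position": 3,
        "page": 12,
        # non-chunking metadata: who produced the document
        "author": "WeWork Inc.",
        "source": "wework-quarterly-report.pdf",
    },
)
```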
And you might be wondering: okay, so what does it look like then? This is what happens when you pass your data through an embedding model. Here we have an example of a text; we pass it through an embedding model, and we get vectors out at the end. That only happens after it goes through an embedding model. Then you have your vectors, and that's what we're going to run similarity search on.
That's how everything communicates, that's what vector databases do, that's how we understand everything. So the takeaway for this one: the embedding strategy really depends on the accuracy you want, the cost, and the use case. There's no real one-size-fits-all solution.
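To make that concrete, here is a minimal sketch of turning a piece of text into a vector with a sentence-transformers model from Hugging Face; the exact model name is an assumption, since the session only says it's a sentence-transformers model.

```python
from langchain_community.embeddings import HuggingFaceEmbeddings

# Assumed model; any sentence-transformers model from Hugging Face works the same way.
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

vector = embedding_model.embed_query(
    "WeWork filed its report for the quarterly period ended June 30."
)
print(len(vector))  # dimensionality of the embedding, e.g. 384 for this model
print(vector[:5])   # the first few floats of the vector
```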
So have a look at different embedding models; it might be very useful for you. Then there's chunking. As I said, you have to take different things into consideration. The chunk size: how long do you want a chunk to be? One character or a thousand characters? It depends. Then there's the chunk overlap.
That's the overlap between two chunks. Sometimes it can be useful so that you understand what's happening between neighbouring chunks. And there are also character splitters: if your data has commas or dashes, it might be interesting to split on those, for example, depending on what your data looks like.
As an example, you can see here the data we have, which is actually the data I'll use later during the demo. Here we have a chunk size of 50 and an overlap of zero. I added the text myself, but you can see that you don't really understand what's going on; nothing stands out. You don't even get "June 30th" in one piece, so your LLM would also struggle to figure things out. Another example is a chunk size of 128 with an overlap of 20.
Here we start to understand what's happening, and you can also see the overlaps: for example, the first chunk finishes with "for the quarterly period" and "June 30th" and so on, and the same text appears again at the start of the second chunk.
That's the overlap. Here it starts to be more interesting, but you can obviously go higher: a chunk size of 256 with an overlap of 50 starts to be way more interesting for us, and you can always go higher. People usually ask me, do you have a specific number you use? And unfortunately, for now, there's no specific number.
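As a sketch, here is how you could compare those chunk size and overlap settings yourself with LangChain's character splitter; the input file name is hypothetical.

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

text = open("wework_report.txt").read()  # hypothetical plain-text dump of the filing

for chunk_size, chunk_overlap in [(50, 0), (128, 20), (256, 50)]:
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    chunks = splitter.split_text(text)
    print(f"size={chunk_size} overlap={chunk_overlap} -> {len(chunks)} chunks")
    print(chunks[0])  # eyeball whether the first chunk is readable on its own
```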
It really depends on your data and on what you want to chunk. There's something I use from time to time, which is semantic chunkers. They're available in LangChain and LlamaIndex, and they can be very useful. They try to figure out the meaning of your sentences and gather everything that belongs together.
Everything that makes sense, everything that is similar, they try to make one chunk of it. The thing is, if you want to use semantic chunkers, you need to use an embedding model, whereas the previous approach is very naive: it just reads the characters and splits.
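A minimal sketch of semantic chunking with LangChain, assuming the experimental SemanticChunker and a Hugging Face embedding model; the model name and input file are placeholders.

```python
from langchain_experimental.text_splitter import SemanticChunker
from langchain_community.embeddings import HuggingFaceEmbeddings

text = open("wework_report.txt").read()  # hypothetical plain-text dump of the filing

# Semantic chunking needs an embedding model, unlike the character-based splitter above.
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
semantic_splitter = SemanticChunker(embedding_model)

# Groups semantically related sentences into chunks instead of cutting at a fixed length.
docs = semantic_splitter.create_documents([text])
print(docs[0].page_content)
```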
So yes, I usually find that you get better results with semantic chunking, but then you have to use an embedding model for it. Also, something that is very important: how does your data look? Is it mostly conversational data, documentation, lecture, or Q&A data? If it's documentation, you may already have natural chunks, like a paragraph, and then you might want some super-chunks, adding information about the title or the heading. Those might be very useful for your data and for your chunks. The same goes for lecture or Q&A data.
If it's Q&A data, you might have something like "Question:" at the beginning of a sentence followed by the actual question, and the same for answers. So it would probably be a good idea to have chunks divided by question and answer, rather than splitting every 256 characters, for example. So unfortunately there's no magic number.
The strategy depends entirely on what your data looks like and what you need from it. And now I'm going to go into a bit more of a demo. Those were all the slides, and they will be available later, so no worries. As I said, everything will run locally. I have my Python environment here, and I'm using a docker-compose version of Milvus.
So we're just going to start it; it should already be running, actually. It starts the different components of Milvus, and now we should be good. If we look, everything is here, everything is running, everything is healthy. So now we're happy.
I have a notebook ready as well, and that's where I'll be running my code. Just to show you, this is the docker-compose file; you can find it in the Milvus documentation directly, so you don't have to write it yourself. And this is the code.
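For reference, a quick way to check from Python that the docker-compose Milvus is up, assuming the default local port; this sketch uses pymilvus directly.

```python
from pymilvus import connections, utility

# Default host/port for the docker-compose deployment described in the Milvus docs.
connections.connect(alias="default", host="localhost", port="19530")
print(utility.get_server_version())  # prints the server version if everything is healthy
```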
The first part is where I import everything I need. As I said, I'm using LangChain, so I can use its different components. The first one is the PDF loader, but I'll go into more detail on that later. To make sure I'm not lying to you, I have this piece of code here, which will drop the collection if it already exists. So I'm just going to run everything.
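That drop-collection step might look something like this with pymilvus; the collection name is an assumption based on what is mentioned later in the demo.

```python
from pymilvus import connections, utility

connections.connect(alias="default", host="localhost", port="19530")

COLLECTION_NAME = "rag_milvus_webinar"  # assumed spelling of the demo's collection name

# Drop the collection if it already exists, so the demo starts from a clean state.
if utility.has_collection(COLLECTION_NAME):
    utility.drop_collection(COLLECTION_NAME)
```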
The collection has been dropped, so now I have to recreate it, and that's what we'll do during the demo. This part here is where I load the PDF, and as I said before, LangChain makes it very easy to load a PDF that is either local or on the internet. I'm going to show you what the PDF looks like: it's a PDF about WeWork.
It's a very long PDF, 169 pages. You can see there's a lot of financial information, lots of different things. This is the PDF we're going to use for the demo. So I say: okay, go get this PDF please, and then I load it.
So I use the loader and then I actually load the data from the PDF. It goes to this PDF on the internet, downloads it somewhere, and then loads it into memory. As I said, I'm also using Hugging Face embeddings; those are sentence-transformers models. You're free to use different models.
Those are the ones I decided to use for the demo. They're pretty good for text in general, and they're not too big either, so it's a very nice embedding model, I would say. That's what we're going to use later to transform our data, the PDF you've seen before, into vectors.
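A sketch of this loading step: pulling the PDF with LangChain's PyPDFLoader and defining the Hugging Face embeddings. The URL and model name are placeholders, not the exact ones used in the demo.

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings

# Placeholder URL for the WeWork quarterly report used in the demo.
loader = PyPDFLoader("https://example.com/wework-quarterly-report.pdf")
docs = loader.load()  # one Document per page (169 pages in the demo)

# Placeholder sentence-transformers model from Hugging Face.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
```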
Then, I was talking before about the chunk size and the chunk overlap, and that's exactly what I'm doing here. This is a naive chunking: every 512 characters you get a new chunk.
There is nothing smart about it. And I'm saying: please run this chunker on the whole document. So that's what it's doing.
It's splitting the document into different documents, and then we have all the splits. If I show you the splits, this is what they look like: you can see that it created different documents, and a document is basically a page. You can see this is the content, and it says the page is page zero.
It also added some metadata, which can be very useful for you. For this one it's not very useful, because it's just a PDF on the internet, so it's only the URL, but you can also add different metadata if you want to. And again, it tells you which page it is, and so on. So those are the splits.
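The naive 512-character chunking and the inspection of the splits might look like this; the zero overlap is an assumption, the demo only mentions the 512-character size.

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=0)
splits = text_splitter.split_documents(docs)  # docs loaded by PyPDFLoader above

print(splits[0].page_content)  # the first 512-character chunk
print(splits[0].metadata)      # e.g. {"source": "<pdf url>", "page": 0}
```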
Let me just remove this one. And here is where we take all the splits I just showed you: this is where we transform them, and this is where we store everything in Milvus. So I have all my splits, which are my documents split into chunks; I have my embedding model, which I defined here and which is again the Hugging Face embedding model. And then I'm saying: please create a collection.
Please use the collection called "rag milvus webinar"; that's the name of the collection, and it's the one I dropped before. So I'm going to run that, and it's going to take a tiny bit of time.
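Storing the splits in Milvus through LangChain could look like the following sketch; the collection name and connection details are assumptions matching the local docker-compose setup.

```python
from langchain_community.vectorstores import Milvus

vectorstore = Milvus.from_documents(
    documents=splits,                      # the chunks produced by the splitter
    embedding=embeddings,                  # the Hugging Face embedding model
    collection_name="rag_milvus_webinar",  # assumed collection name
    connection_args={"host": "localhost", "port": "19530"},
)
```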
Because it has to transform everything: the data is run through the embedding model, which generates vectors, and then we store those in our vector database, which is Milvus here. Then I'm defining a retriever. That's what I showed before on the slide: you type your query, and then you need to do some semantic search.
It sends the query to the embedding model, which then talks to the vector database. So this is what I have for the retriever. The prompt here is just a typical prompt to use in RAG, usually something like: you're an assistant; if you don't know the answer, say you don't know the answer, so that you don't hallucinate. So that's what we have here. Now I've defined Milvus as a retriever.
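The retriever and the prompt could be defined roughly like this; pulling "rlm/rag-prompt" from the LangChain hub is an assumption, but it matches the "say you don't know if you don't know" wording described here.

```python
from langchain import hub

# Wrap the Milvus vector store as a retriever that runs semantic search on each query.
retriever = vectorstore.as_retriever()

# Standard RAG prompt: "You are an assistant... If you don't know the answer, say you don't know."
prompt = hub.pull("rlm/rag-prompt")
```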
Next I define my LLM, which is coming from Ollama, and it's Llama 3. This is what I have here. This is just a detail, but Llama 3 has a specific token to say that it's finished, so I'm telling it: please stop every time you see that token. Otherwise Llama 3 usually keeps talking to itself, which is funny, but it's not what you want for your RAG. And here I just have a small function to format my documents.
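Defining the local Llama 3 through Ollama, with its end-of-turn token as a stop sequence, might look like this sketch; the document-formatting helper is included since it is used by the chain below.

```python
from langchain_community.llms import Ollama

# Llama 3's end-of-turn token; without it the model tends to keep talking to itself.
llm = Ollama(model="llama3", stop=["<|eot_id|>"])


def format_docs(docs):
    # Join the retrieved chunks into one context string for the prompt.
    return "\n\n".join(doc.page_content for doc in docs)
```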
To format my documents so they look quite nice. And here's the RAG chain. This is where all the magic happens, basically. You say: okay, as the context, this is my retriever.
The question is the question we'll ask later, which is the one you see here. You give it to your RAG chain and it does the magic for you. The prompt is what we pull here; it says you are an AI assistant, and so on. Then you give that to your LLM, and you get an output. That's how you define the RAG chain.
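Putting it together, the RAG chain described here could be assembled like this with LangChain's expression language; the pieces (retriever, prompt, llm, format_docs) are the ones sketched above.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke("What is WeWork?"))
```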
And I have a question ready, which is: what is WeWork? Because the document is about WeWork, so you can see directly what it says. That part is a bit slow because, first, I'm running everything locally, and then you give the context to your LLM and the LLM has to work through everything. And here it tells you about WeWork. You could also say, okay, but maybe Llama 3 already knows about WeWork.
So the answer might not be related to this document at all. But then I can just ask questions directly, like: what is this document about? And it should tell me the document is about WeWork, and yeah, exactly, it says it's a financial report, a quarterly report by the company. That's it. You can then ask different questions to your LLM, and that's the basic RAG application.
A basic one where everything is running locally, you don't have to pay for anything, and everything is open source as well. So if you have data and you want to build your own RAG system, this is something you can use in the future. Also, something that is quite cool is that LangChain allows you to add sources to the answer.
So you can say: please add the source of your answers. That can help a bit in checking for hallucinations, to make sure it's not making something up. Here you can see the context we gave and everything, and the source is directly the document we had before: this is page 12, this is page 69, and so on. That's something I usually find very interesting.
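One way to return the sources alongside the answer, following the pattern in LangChain's documentation, is a parallel chain that keeps the retrieved documents; this is a sketch built on the pieces above.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

rag_chain_from_docs = (
    RunnablePassthrough.assign(context=lambda x: format_docs(x["context"]))
    | prompt
    | llm
    | StrOutputParser()
)

# Returns both the generated answer and the retrieved source documents.
rag_chain_with_source = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)

result = rag_chain_with_source.invoke("What is WeWork?")
print(result["answer"])
print([doc.metadata.get("page") for doc in result["context"]])  # pages of the sources
```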
Adding sources, especially for your private documents, is how you can always make sure the LLM is not making something up. And that's kind of it for a basic RAG application. So I can take questions now; I haven't seen the questions yet. Wayne, do you have some questions? Thanks for the demo.
This is a question from the audience: if I embed my vectors with OpenAI, then embed the query with another model like Hugging Face, will the response still be semantically similar? Essentially, can you use different embedding models and still get meaningful distances between results? You can use different embedding models, but it depends on what the idea behind it is: if you use a specific embedding model for your question, is there a reason you wouldn't use it for your data as well? It's possible, but you might get very different results; they might not be as good. There was an additional clarification: this is for legacy data that's embedded with OpenAI. Ah, for legacy data. Yeah.
Yeah, you could, but then I'm not going to guarantee the quality of the results, basically. Okay, if we did not answer that question fully, I see that there are a couple of extra comments.
Feel free to submit again. Moving on to another question: when using Ollama in the notebook, is it loading a quantized version of the model, or is docker compose already running an Ollama that the notebook connects to? No, it's running a quantized version on my laptop, and then Ollama starts a server, which the notebook connects to directly. So I basically have a local Ollama server running on my laptop, and everything is connected.
There has been a request: is it possible to get the code? Yes, it's actually on my GitHub; I'll share the link right after, it's already up. Perfect, and we'll include it in the follow-up email as well when we send out the recording. Another question: is there a smarter way to chunk if an embedding model is used? I thought you said it wasn't necessary if an embedding model is used. Yeah, so there's the semantic chunking, which is shown here; this one requires an embedding model to be used.
Because it has to figure out the meaning of the sentence. So yes, for this one you need an embedding model, and it's usually smarter to chunk that way. The other one is the naive chunking, which just splits everything after a fixed number of characters. It says thank you. Is there an example of the semantic chunking? Yes.
I don't have it here directly, but I have it somewhere else, yes. Okay, and we can send more in the follow-up. So here's a question: how do you usually evaluate your models? On my side, it depends.
I like to have answers that I know are true. So usually I talk to my model and ask very specific questions that I know the answers to, and then I check the answers. There are different evaluation models and different ways of evaluating a model, but that's usually the way I do it, with usually 10 or 20 questions where I know the true answer. For example, I work with the Berlin Parliament data, and they still use fax machines.
And I know the exact number of services that still have to use fax machines: it's 189. So depending on the embedding model I use and depending on the LLM, I don't always get the same answer, but I know the real one is 189, so that's what I check. Is there a way to extract semantic concepts from text as a vector representation using an LLM? Can you repeat the question, please? Yep: is there a way to extract semantic concepts from text as a vector representation using an LLM? I'd say so, yeah; that's what happens when you pass it through an embedding model. You take your text, put it through your embedding model, which tries to figure things out and make some sense of the data, and then you have the vectors. Here's another one.
How do we solve the problem where, for example, in the PDF there is related information spread across multiple pages? How do we get complete information when querying, and what chunk sizes and overlap should we use? Good question. That's when you have to play around with the different overlaps for your chunks. Or maybe you embed some metadata: maybe you keep your small chunking of 512 characters, for example, but you also embed metadata about your document, or embed super-chunks as well, so a small part of your document plus some super-chunks. What I usually like is the semantic chunkers, because they might be able to figure that out.
But that also means that when you split your data into different pages, you have to make sure they're handled as different pages, and maybe try to remove those page breaks so your text reads as one piece, or things like that. Unfortunately it's a lot of trial and error; for now there isn't really anything magical. And this is my own question: since you can specify the chunk size and the overlap, would you do something similar to what you described for testing your LLM, where you know certain expected answers? Are you using the same premise of knowing what you expect to see, and that's how you check your chunk and overlap size? Or is there a better way to confirm that you've split it correctly? Yeah, usually that's what I do as well.
I'll ask some questions about, say, the end of one page that continues on the next one, and then check: hey, did you figure that out by yourself? Can you actually figure that out? Usually the chunk overlap can be useful there, because then the text continues. But yeah, it's just a lot of trial and error usually. How are you running Llama locally? What size is it? It's four gigabytes for the smallest version of Llama 3, yes. So a lot of laptops can run it.
How well can LLMs with RAG help to review and answer law-related questions? Which questions? Sorry. Law-related. Oh, good question. I think it can be very good, but it's also just regular data; you just have to make sure it doesn't hallucinate at all.
The good part is making sure you always have a source and really double-checking it: with the source and the answer you have, check the source and ask, does it actually exist, is it really a thing? Those usually work. And what's nice with law is that it has a very specific format; legal documents are very strict about their format. So your chunking might actually be good, or way better than for other documents, because you can split documents following their own logic; there is a logic in the document. Those are things that can be really helpful.
But I would say, yes, have sources and check the sources your LLM gives you when it gives you an answer. And I assume the same would hold true in that you would want to be specific in what you choose: using embedding models that are trained on law, and being thoughtful about the LLM. Exactly. And if you have models that have been trained on that kind of data, it's way better.
And it's also the kind of thing where you need something in your language. If it's English, find one that has been trained on English legal data, and likewise for other languages. When do we need to use token-based chunking compared to chunking per characters? Honestly, it depends; I don't think there's a real right answer to that, unfortunately. I usually don't go for token-based.
It's my personal preference; I didn't find any performance improvements. So it only depends on your situation. To clarify on the question regarding switching embedding models: are you saying that we don't know if the old legacy data will still allow the model to get semantically similar responses, or that we would just have to test or re-embed all of the old legacy data with the new model we've chosen? Well, first, you have to be careful, because your first model will vectorize all your data at a certain dimension.
The other data might be a different dimension, so you might run into a problem there. And even if you solve that, to give a very silly example: if your old embedding model has been trained on Chinese and the other one is trained on English, then technically both sets are vectorized, but the vectors have very different meanings because you used a different embedding model. So you have to be careful that they can work together, if you see what I mean: you have different embedding models.
So make sure they have been trained on the same kind of data, and then you also have to deal with the different dimensions. Is the approach for a multimodal RAG application similar? Yes, it's very similar; it's the same. Your chain will just be a bit more complicated, but otherwise multimodal works the same way.
You have different embeddings and different models, and then depending on what you have, you define your chain: there's an image, there's text, and you go through the different paths. But yes, it's very similar. And we've had another request: we will share the GitHub info when we send out the replay, so you'll get that delivered straight to your email, and we'll send a link to the slides as well.
Is it possible, and I've heard this one before, to get the original data back from the embedding vectors? It's an approximation. I don't think it's possible to revert it directly, tell me if I'm wrong, but it's an approximation. So you can usually get an idea, but no, you can't really reverse it.
I think that was an early concern, particularly for big companies that were starting to do RAG on proprietary data. Okay, next we have: how do you refine the concepts, for example, for domain-specific concepts when extracting concepts from text? For example, if you have very domain-specific concepts you're trying to find. Okay, let me make sure I get the question right: you have some very specific concepts in your document, is that what it means? There's a follow-up question from the same author: how do you train an LLM for domain-specific knowledge? Ah, usually I would say you fine-tune the LLM with your own data. Say you work in the world of cars and you have a lot of car manuals; you might want to fine-tune it directly, which can also be quite expensive. Fine-tuning an LLM can be a very long task, not as long as training one, but it can be very long.
The other solution would usually be to have a RAG application for your specific data. Again with the car example: inject the car manuals, put them in your vector database, and then you have something that is very specific to your context. And if we didn't answer a part of that, feel free to resubmit, but I think we got that one. But yes, if we missed any component of that question, please resubmit. Here's another one.
How do you improve the performance of LLMs? Wow. I'm going to assume this is in the context of RAG in general. Two things, chunking and embeddings, are actually the key. If the question is about LLMs in general, that's a very deep question, but for RAG it's really: make sure your embedding model is the correct one and that it has been trained on the same kind of data.
If your goal is to detect cats in pictures, you have to make sure it has been trained on that. And then chunking also makes a big difference. I showed the example with a chunk size of 50 and an overlap of zero: clearly nothing makes sense, and you probably have no idea what's happening. Maybe if you've read quarterly reports in the past you can guess, but an LLM wouldn't be able to figure things out.
Whereas with a semantic chunker, you have full sentences and things kind of make sense. So basically those are the two things you have to be very careful about. Thank you. The questions just keep coming in; popular topic. There are a lot of advanced RAG techniques.
Which ones do you find very useful? On my side, I think it depends on the use case. The one I've seen with the best results so far: I tend to keep my RAG apps simple and just iterate on the embeddings. I find that gives better results than going really deep into complex RAG systems.
At least from my personal experience, trying different embeddings and really finding the one that is genuinely good usually gives better results. Mm-hmm. Just to tag onto that question: are you using any evaluation tools to evaluate your RAG? Is there anything top of mind that you like to use, or a go-to? Yeah, so there's the idea of using an LLM as a judge, which is quite nice. I've been playing around with Ragas recently, and I also want to try Phoenix; those are evaluation tools.
I tried Ragas and it was quite nice, but I want to try Phoenix to see if it can make my life easier as well, so that I don't have to check things myself directly. But yeah, using an LLM to also judge your LLM. Yep. And on that topic, we do have a past webinar from the Arize team about using Phoenix for evaluation, so I'll drop a link to that in the chat.
If you're interested in advanced RAG techniques, that's probably a good session to watch. I will share that with you in just a second. Can we do RAG for CSV or Excel data sets? If yes, how do we split these data sets? I haven't tried with Excel directly, but CSV, yes. LangChain has integrations with so many different document types that it's possible; one example is that they can do it for Notion pages and other sources. With a CSV you have your headers, and then it depends on what's in your CSV; you can put so many different kinds of content in a CSV that it depends, but otherwise it works the same way.
You have certain headers, which are your columns, and depending on those you have to figure things out. So depending on your data, if you have, say, ten columns and nine of them are numbers that aren't very relevant to you, and only one has the text that is relevant for your RAG app, then maybe you only use that one and chunk on that specific column. CSV and Excel would be the same; I assume they have an Excel connector, they probably have connectors for everything. So yeah, same but different, as we say.
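For CSV data, the loading step could look like this sketch with LangChain's CSVLoader; the file name is hypothetical.

```python
from langchain_community.document_loaders import CSVLoader

# Hypothetical CSV file; each row becomes one Document listing the row's columns.
loader = CSVLoader(file_path="products.csv")
docs = loader.load()
print(docs[0].page_content)  # inspect which columns are worth keeping before chunking/embedding
```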
Let's see, just checking the question feed one more time. I'm going to drop that link into the chat really quick. "Makes sense to do vectors on structured data, like in a large SQL database? Or would fine-tuning or a multi-agent system be a better choice?" Not quite sure I understand that one.
Okay, what's the beginning of the question? It says, I think: "Makes sense to do vectors on structured data, like in a large SQL database? Or would fine-tuning or a multi-agent system be a better choice?" If you can, you might want to resubmit with more detail; I think we're missing a piece of it. While we wait for them to clarify the question, I'm just going to drop in the link to the replay of the webinar from Arize on using Phoenix to do RAG evaluations, and Phoenix is open source as well.
Yes, there you go. Okay, let's see: are there smaller LLMs that can be less expensive to fine-tune and maybe run locally? Yes, there are a lot of LLMs available; it really depends on your use case. You could also fine-tune them locally if you have time. It's always possible: there are some very small ones, I don't know if you'd still call them LLMs, but there are smaller language models which can be decent.
It really depends on your use case. Also check out Hugging Face and similar sites; you can see people who are fine-tuning. Popular models like Llama 3 have already been fine-tuned for different things, like function calling and coding. So check out what the open source community is doing; maybe someone has already done the work you want to do.
So this is my own question: when you first started building your first RAG apps, where did you go wrong? If you could go back and give your past self advice on what to do or what not to do, I think that would be really helpful for our audience. Sure. I think the first thing for me was that I had no idea what chunking was, how it worked, or how I could actually make it work. I was chunking on, I think at first, every space, and then nothing makes sense.
Basically, when you build your first RAG, if you can't figure it out yourself when you look at the data and at what you're feeding to your LLM, it's usually very hard for your LLM to figure things out either. Obviously it's possible, but it's really hard. So make sure you have some understanding of what's happening. For the chunks, for example, if I pick a chunk randomly, it's very likely that I'll be able to understand the context of that chunk. That's very useful.
The other thing was that I was using OpenAI for everything, which is very good; it's a generalist model and it does the job. But then I realized, and that's why I keep talking about embedding models, that the embedding model makes a big difference. That's usually the second mistake you make: you just go for whatever embedding model OpenAI has. Sometimes it doesn't work, and then you try to do very advanced things with your RAG app, whereas you could usually just change the embedding model. So those are the usual two.
Thank you. This is, I think, an appropriate last question, so we're just going to do a last call. Could you summarize your presentation on RAG apps in two sentences? I missed the first five minutes, unfortunately, and it would be nice to have a nice closer. Yes.
You can use RAG locally, with an LLM running locally, in about 25 lines of code. That's what I did. Perfect. And for the attendees who joined a little bit late, we will send you the recording so you can rewatch it as many times as you need to.
Thank you everyone for joining us today and for all of the really wonderful questions. Oh, sorry, we've got a last-minute addition to the question about structured data, like a large SQL database: to chat with a SQL database, for example, would it make sense to use Milvus? So if you want to know what's in your SQL database, it's possible, but you have to embed what's in there somehow, I guess.
I know it's possible to talk to databases, I mean to chat with them. So as long as you can transform the content and store it in Milvus, then yes, but you have to transform the content and the context. Thank you, Stephen. Thank you everyone who joined us; we will catch you next time.
Keep an eye on your inbox for the link to the recording, the notebook that Stephen walked through, and the slides. And feel free to join us at a future training or webinar. Thanks everyone. Thank you.
Meet the Speaker
Join the session for live Q&A with the speaker
Stephen Batifol
Developer Advocate
Stephen Batifol is a Developer Advocate at Zilliz. He previously worked as a Machine Learning Engineer at Wolt, where he was working on the ML Platform and as a Data Scientist at Brevo. Stephen studied Computer Science and Artificial Intelligence. He enjoys dancing and surfing.