Smarter RAG Pipelines: Scaling Search with Milvus and Feast
Webinar
About this webinar
Learn how Milvus and Feast can be used together to scale vector search and easily declare retrieval views, all with open-source tools. We’ll demonstrate how to integrate Milvus with Feast to build a customized RAG pipeline.
Topics Covered
- Leverage Feast for dynamic metadata and document storage and retrieval, ensuring that the correct data is always available at inference time
- Learn how to integrate Feast with Milvus to support vector-based retrieval in RAG systems
- Use Milvus for fast, high-dimensional similarity search, enhancing the retrieval phase of your RAG model
WEBVTT
1 00:00:03.805 --> 00:00:05.825 So I'm pleased to introduce today's session, Smarter RAG
2 00:00:05.825 --> 00:00:07.585 Pipelines with Milvus
3 00:00:07.585 --> 00:00:10.145 and Feast with our guest speaker Francisco today.
4 00:00:10.895 --> 00:00:12.785 He's a senior principal engineer at Red Hat,
5 00:00:12.925 --> 00:00:15.865 having spent over a decade working in AI
6 00:00:15.925 --> 00:00:18.065 and ML, across software, fintech,
7 00:00:18.125 --> 00:00:21.985 and AI, at AIG, the Commonwealth Bank of Australia,
8 00:00:22.335 --> 00:00:25.865 Goldman Sachs, Fast, Affirm,
9 00:00:25.925 --> 00:00:28.825 and Red Hat, in roles spanning from software
10 00:00:29.125 --> 00:00:30.545 to data engineering, credit, fraud,
11 00:00:30.545 --> 00:00:31.945 data science, and machine learning.
12 00:00:33.005 --> 00:00:36.225 He holds graduate degrees in economics and statistics
13 00:00:36.285 --> 00:00:39.505 and in data science and machine learning from Columbia University
14 00:00:40.285 --> 00:00:43.945 in the City of New York, and also Clearstone University.
15 00:00:44.415 --> 00:00:47.465 He's a maintainer for Feast, the open-source feature store,
16 00:00:47.565 --> 00:00:49.825 and a steering committee member for Kubeflow,
17 00:00:50.315 --> 00:00:52.745 which is the open-source ecosystem of Kubernetes
18 00:00:52.745 --> 00:00:54.185 components for AI
19 00:00:54.185 --> 00:00:57.905 and ML. Francisco, the stage is yours. You may take over.
20 00:01:00.505 --> 00:01:03.405 Hi everybody. I'm Francisco. Pleasure to meet you.
21 00:01:03.465 --> 00:01:06.005 I'm gonna take off my hat,
22 00:01:06.065 --> 00:01:09.885 but just for consistency with my profile photo
23 00:01:09.905 --> 00:01:12.565 on the webinar, I figured I'd show up with it.
24 00:01:12.985 --> 00:01:15.725 So today we're gonna talk about
25 00:01:15.775 --> 00:01:20.045 Feast, RAG, and Milvus, and I am going to share my screen.
26 00:01:21.715 --> 00:01:25.535 Can folks see it? Let me know in the chat. Okay, perfect.
27 00:01:25.845 --> 00:01:29.015 Confirmed. Great, great.
28 00:01:29.275 --> 00:01:31.535 So let's get the party started.
29 00:01:31.955 --> 00:01:35.095 So, you know,
30 00:01:35.935 --> 00:01:37.525 Feast, RAG, Milvus: that's what we're gonna talk about.
31 00:01:38.585 --> 00:01:42.245 But I thought I'd tell folks a little bit about me.
32 00:01:42.525 --> 00:01:44.085 I think it was already covered,
33 00:01:44.345 --> 00:01:46.525 but I wanted to give a little bit of context.
34 00:01:47.205 --> 00:01:51.005 I've, you know, led data science, data engineering,
35 00:01:51.165 --> 00:01:52.285 and ML infra teams at different
36 00:01:52.485 --> 00:01:53.645 companies over the last 12-plus years,
37 00:01:53.825 --> 00:01:57.165 and somehow I stumbled into maintaining Feast.
38 00:01:57.225 --> 00:02:00.325 We shipped it
39 00:02:00.325 --> 00:02:04.045 at a previous company I worked at and scaled it in production
40 00:02:04.185 --> 00:02:08.165 for checkout payments, for credit risk
41 00:02:08.165 --> 00:02:11.925 and fraud models, where low-latency retrieval is
42 00:02:11.925 --> 00:02:13.565 a really, really important part,
43 00:02:13.625 --> 00:02:15.805 along with high resiliency and uptime.
44 00:02:15.905 --> 00:02:19.805 And so I've kind of spent my career building models
45 00:02:19.825 --> 00:02:21.165 and then shipping models,
46 00:02:21.225 --> 00:02:23.325 and that heavily relies on data.
47 00:02:23.465 --> 00:02:25.245 And so, again, that's kind of
48 00:02:25.245 --> 00:02:26.485 how I stumbled into Feast.
49 00:02:26.865 --> 00:02:29.245 You know, I did things the old way before,
50 00:02:29.305 --> 00:02:32.525 and then eventually we had a newer,
51 00:02:32.595 --> 00:02:34.845 more structured way of serving models.
52 00:02:35.505 --> 00:02:39.165 I joined Red Hat last year, almost to the day,
53 00:02:39.905 --> 00:02:41.325 to work on open-source AI.
54 00:02:41.325 --> 00:02:43.685 And I feel very privileged to get to work on Feast
55 00:02:43.705 --> 00:02:46.045 and, you know, Kubeflow
56 00:02:46.105 --> 00:02:49.205 and other communities, really helping to
57 00:02:49.715 --> 00:02:53.685 make sure that AI is open
58 00:02:53.685 --> 00:02:55.725 and uses the best of open source.
59 00:02:56.425 --> 00:02:59.245 I have a wife and two children,
60 00:02:59.265 --> 00:03:00.605 and I call New Jersey home.
61 00:03:00.805 --> 00:03:02.845 I took this photo when I was in South Dakota.
62 00:03:02.965 --> 00:03:05.085 I used to live out west. It was a great time. I love it.
63 00:03:05.425 --> 00:03:09.005 Out in Rapid City, near Wyoming. And that's me.
64 00:03:09.545 --> 00:03:11.625 So, let's see.
65 00:03:14.365 --> 00:03:17.225 So I wanted to start with some historical context, right?
66 00:03:18.185 --> 00:03:20.265 RAG is pretty popular, um,
67 00:03:20.805 --> 00:03:23.225 but oftentimes people haven't read the original paper
68 00:03:23.445 --> 00:03:25.025 or aren't aware of the original paper.
69 00:03:25.725 --> 00:03:28.025 And so I thought I'd give that brief history and context.
70 00:03:28.405 --> 00:03:31.385 And so RAG stands
71 00:03:31.385 --> 00:03:32.785 for Retrieval-Augmented Generation.
72 00:03:33.045 --> 00:03:36.705 The paper was published at NeurIPS in 2020
73 00:03:36.725 --> 00:03:38.345 by the Meta AI research team,
74 00:03:38.405 --> 00:03:41.345 or, as it was called back then, FAIR: Facebook AI Research.
75 00:03:42.005 --> 00:03:46.345 And it looks like I missed part
76 00:03:46.345 --> 00:03:48.185 of finishing this bullet point, the second one.
77 00:03:48.185 --> 00:03:51.205 Anyways,
78 00:03:51.265 --> 00:03:55.605 the architecture talked about, you know,
79 00:03:56.395 --> 00:03:57.965 some things that people kind of omit today.
80 00:03:58.115 --> 00:03:59.885 They actually had two models at play.
81 00:03:59.885 --> 00:04:02.005 They had what's called the retriever, in
82 00:04:02.005 --> 00:04:03.285 the diagram for that,
83 00:04:03.285 --> 00:04:05.885 which I took as a screenshot from the paper,
84 00:04:06.825 --> 00:04:08.605 and the generator,
85 00:04:08.945 --> 00:04:10.605 and there's a query encoder as a part
86 00:04:10.605 --> 00:04:12.525 of this retriever, which, you know,
87 00:04:12.815 --> 00:04:15.845 we're all pretty familiar with: encoders,
88 00:04:15.845 --> 00:04:17.845 which take a query or a sentence
89 00:04:17.845 --> 00:04:19.925 and then map it into a vector, right?
90 00:04:20.025 --> 00:04:21.365 A set of numbers
91 00:04:21.465 --> 00:04:26.245 of some fixed length, which is set
92 00:04:26.345 --> 00:04:27.885 by whatever model you choose.
93 00:04:28.025 --> 00:04:29.605 You know, people tend
94 00:04:29.605 --> 00:04:33.005 to pick some large number like 512 or something,
95 00:04:33.385 --> 00:04:34.805 usually a power of two.
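To make the encoder idea concrete: whatever the input length, the output vector has a fixed dimension chosen by the model. The sketch below is not a real encoder; it assumes a hashing bag-of-words stand-in (the names `toy_embed` and `DIM = 8` are hypothetical) purely to show that fixed-length property. A real pipeline would call a trained embedding model at this point.

```python
import hashlib
import math

DIM = 8  # hypothetical dimension; real embedding models fix this (e.g. 384, 768, 1024)


def toy_embed(text: str, dim: int = DIM) -> list[float]:
    """Map text of any length to a unit-length vector with exactly `dim` entries.

    Hypothetical hashing bag-of-words stand-in for a learned encoder; it only
    captures word overlap, not meaning.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        # md5 gives a stable bucket across runs (built-in hash() is salted per process).
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


short_vec = toy_embed("capital")
long_vec = toy_embed("what is the capital of the united states of america")
print(len(short_vec), len(long_vec))  # 8 8
```

Normalizing to unit length is a common convention that makes dot product and cosine similarity coincide.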
96 00:04:34.905 --> 00:04:37.685 But, you know, one of the things
97 00:04:37.685 --> 00:04:40.685 that is often missed in the dialogue about this is that
98 00:04:41.715 --> 00:04:43.525 when this seminal paper came out,
99 00:04:43.625 --> 00:04:46.285 it was about the end-to-end backpropagation
100 00:04:46.625 --> 00:04:48.085 of the retriever and the generator.
101 00:04:48.345 --> 00:04:50.925 And what does that mean? That means that they took some sort
102 00:04:50.925 --> 00:04:53.565 of model, some pre-trained weights,
103 00:04:53.625 --> 00:04:54.645 and they fine-tuned them.
104 00:04:55.185 --> 00:04:56.845 And I think that's a really important
105 00:04:57.065 --> 00:04:58.285 grounding layer,
106 00:04:58.525 --> 00:05:01.605 'cause that's really not how people think about
107 00:05:02.525 --> 00:05:04.365 RAG in practice today and how people use it.
108 00:05:04.365 --> 00:05:06.325 They mostly think about it from an inference perspective.
109 00:05:06.545 --> 00:05:07.725 And that makes sense. You know,
110 00:05:07.725 --> 00:05:09.005 it's not a criticism, it's just kind
111 00:05:09.005 --> 00:05:10.405 of a statement of fact.
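For reference, the RAG-Sequence formulation in the original paper (notation as recalled here; check the paper for the exact statement and for the RAG-Token variant) treats the retrieved passage $z$ as a latent variable and marginalizes over the top-$k$ passages retrieved for input $x$:

$$
p_{\text{RAG-Sequence}}(y \mid x) \approx \sum_{z \in \operatorname{top-}k\left(p_\eta(\cdot \mid x)\right)} p_\eta(z \mid x) \prod_{i=1}^{N} p_\theta\!\left(y_i \mid x, z, y_{1:i-1}\right),
\qquad
p_\eta(z \mid x) \propto \exp\!\left(\mathbf{d}(z)^\top \mathbf{q}(x)\right)
$$

Training minimizes $-\sum_j \log p(y_j \mid x_j)$ over input/output pairs. Because the retrieval score $p_\eta(z \mid x)$ sits inside the marginal, gradients reach the query encoder $\mathbf{q}$ as well as the generator $p_\theta$; that is the end-to-end backpropagation the speaker describes. In the paper, the document encoder $\mathbf{d}$ and its index were kept fixed.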
112 00:05:11.025 --> 00:05:14.485 And then, you know, 2020,
113 00:05:14.865 --> 00:05:16.245 that's a while ago.
114 00:05:16.505 --> 00:05:20.205 So why did it become so popular?
115 00:05:20.385 --> 00:05:23.685 If you look at some data, probably
116 00:05:23.685 --> 00:05:27.965 because of ChatGPT. ChatGPT, you know,
117 00:05:28.355 --> 00:05:32.925 disrupted the world in November 2022,
118 00:05:33.355 --> 00:05:35.365 and in their original documentation,
119 00:05:35.365 --> 00:05:36.725 they had suggested using RAG,
120 00:05:37.225 --> 00:05:41.405 and phrasing like "in-context learning" wasn't as common.
121 00:05:41.985 --> 00:05:43.005 And
122 00:05:43.905 --> 00:05:46.405 what people found was if you just dump stuff into the
123 00:05:46.405 --> 00:05:50.285 context of an LLM, it works pretty well with some prompt
124 00:05:50.465 --> 00:05:53.485 instruction formatting, right?
125 00:05:54.025 --> 00:05:57.285 And if you look at Google Trends,
126 00:05:57.655 --> 00:06:00.945 which I screenshotted here, both of them, you know, again,
127 00:06:01.345 --> 00:06:03.785 December 2020 is when it was published.
128 00:06:04.005 --> 00:06:07.505 And you see that it has pretty much no
129 00:06:08.595 --> 00:06:10.815 mention, but you see that
130 00:06:11.565 --> 00:06:14.895 when ChatGPT took off in late 2022,
131 00:06:15.075 --> 00:06:19.255 that's when you see RAG really start to be popular,
132 00:06:19.555 --> 00:06:20.975 at least according to Google Trends.
133 00:06:21.555 --> 00:06:22.895 And again, it's
134 00:06:22.895 --> 00:06:24.895 because they had surfaced it in their documentation,
135 00:06:24.915 --> 00:06:28.255 and, you know, I was working on some of this stuff
136 00:06:28.445 --> 00:06:31.455 back when that happened and used the OpenAI demo
137 00:06:31.455 --> 00:06:34.135 that showed you how to do it.
138 00:06:34.195 --> 00:06:37.415 And, you know,
139 00:06:37.595 --> 00:06:39.535 it was surprisingly powerful as is.
140 00:06:39.635 --> 00:06:42.295 And that's kind of taken the mind share of
141 00:06:42.645 --> 00:06:44.255 what we call AI engineering today.
142 00:06:45.415 --> 00:06:47.715 But again, there's a really critical step
143 00:06:47.715 --> 00:06:48.835 that I feel is
144 00:06:48.835 --> 00:06:52.235 missing from the conversation here, which is that
145 00:06:52.785 --> 00:06:54.675 most RAG applications are only using inference,
146 00:06:54.675 --> 00:06:57.075 which is great, you know, given that it works.
147 00:06:57.135 --> 00:06:59.475 But it's also important to note
148 00:06:59.475 --> 00:07:03.395 that there was a whole other story with RAG.
149 00:07:03.935 --> 00:07:05.715 And I think a lot of it is
150 00:07:05.715 --> 00:07:08.675 because it's very easy to dump
151 00:07:08.895 --> 00:07:12.755 and format data and documents into the context
152 00:07:13.055 --> 00:07:14.835 and do vector similarity search,
153 00:07:15.255 --> 00:07:17.235 but it's actually a lot harder to do fine-tuning.
154 00:07:17.495 --> 00:07:19.955 That's not,
155 00:07:20.915 --> 00:07:22.155 I don't think, a controversial statement.
156 00:07:24.535 --> 00:07:26.435 So how does RAG work, for those who are unfamiliar?
157 00:07:26.635 --> 00:07:28.635 I thought we'd go through a simple example.
158 00:07:28.855 --> 00:07:31.755 There are really four core steps.
159 00:07:31.845 --> 00:07:34.755 One, you embed data: maybe take a document,
160 00:07:35.145 --> 00:07:38.315 PDFs, or just some text, like a blog,
161 00:07:38.895 --> 00:07:41.035 and you embed it, right?
162 00:07:41.035 --> 00:07:44.235 Again, map it into some vector space. For all the sentences,
163 00:07:44.235 --> 00:07:45.555 you find a way to partition
164 00:07:45.735 --> 00:07:49.515 or chunk the text.
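A minimal sketch of the chunking step, assuming fixed-size word windows with overlap (a common choice, not one the talk prescribes); `chunk_text`, `chunk_size`, and `overlap` are illustrative names. The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.

```python
def chunk_text(text: str, chunk_size: int = 5, overlap: int = 2) -> list[str]:
    """Split text into windows of `chunk_size` words, each overlapping the previous by `overlap`.

    Illustrative word-based splitter; production pipelines often count tokens instead.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("a b c d e f g h"))  # ['a b c d e', 'd e f g h']
```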
165 00:07:49.975 --> 00:07:51.395 And then you take those partitions
166 00:07:51.395 --> 00:07:53.475 and you embed each partition individually,
167 00:07:53.705 --> 00:07:56.675 then you store those embeddings with some primary identifier
168 00:07:57.095 --> 00:07:58.755 in some database.
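The store step can be sketched with an in-memory SQLite table standing in for a vector database collection such as one in Milvus. The schema, the `doc#index` chunk-id convention, and the helper names are hypothetical; a real vector database would also build a similarity index over the vectors rather than storing them as JSON text.

```python
import json
import sqlite3


def create_store() -> sqlite3.Connection:
    """Hypothetical in-memory table standing in for a vector database collection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE chunks ("
        " chunk_id TEXT PRIMARY KEY,"  # the primary identifier for each chunk
        " doc_id TEXT,"
        " text TEXT,"
        " embedding TEXT)"  # vector serialized as JSON
    )
    return conn


def upsert_chunk(conn, chunk_id, doc_id, text, embedding):
    """Insert a chunk, or replace it if the chunk_id exists (idempotent re-ingestion)."""
    conn.execute(
        "INSERT OR REPLACE INTO chunks (chunk_id, doc_id, text, embedding) VALUES (?, ?, ?, ?)",
        (chunk_id, doc_id, text, json.dumps(embedding)),
    )


def get_chunk(conn, chunk_id):
    """Return (text, embedding) for a chunk_id, or None if absent."""
    row = conn.execute(
        "SELECT text, embedding FROM chunks WHERE chunk_id = ?", (chunk_id,)
    ).fetchone()
    return (row[0], json.loads(row[1])) if row else None


conn = create_store()
upsert_chunk(conn, "blog-1#0", "blog-1", "Feast is a feature store.", [0.1, 0.9])
upsert_chunk(conn, "blog-1#0", "blog-1", "Feast is an open-source feature store.", [0.2, 0.8])
print(get_chunk(conn, "blog-1#0"))
```

Keying on a stable primary identifier means re-embedding a document overwrites its old chunks instead of duplicating them.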
169 00:07:59.255 --> 00:08:02.155 And then, you know, in real time you
170 00:08:03.165 --> 00:08:05.195 embed what's called the user query, which is like,
171 00:08:05.335 --> 00:08:06.515 I'm gonna talk to the chatbot.
172 00:08:06.535 --> 00:08:09.475 I'm gonna say, what's, you know,
173 00:08:09.855 --> 00:08:11.315 the capital of the US, right?
174 00:08:11.735 --> 00:08:16.275 And so in real time, you'll also embed
175 00:08:16.275 --> 00:08:18.635 that query into a vector,
176 00:08:18.975 --> 00:08:20.075 and you'll use that vector
177 00:08:20.175 --> 00:08:21.955 to search everything in the database.
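The query-time search can be sketched as exact (brute-force) cosine similarity over every stored vector. This assumes the query and the chunks were embedded by the same model; the index contents are made up. At scale, Milvus instead builds approximate nearest-neighbor indexes (for example HNSW or IVF variants) so it does not have to score every vector.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Exact search: score every stored vector and keep the k best, best first."""
    scored = ((cosine(query_vec, vec), chunk_id) for chunk_id, vec in index.items())
    return [(chunk_id, score) for score, chunk_id in heapq.nlargest(k, scored)]


index = {  # hypothetical chunk ids and 2-d vectors
    "capital": [1.0, 0.0],
    "weather": [0.0, 1.0],
    "history": [0.7, 0.7],
}
query = [1.0, 0.1]  # assumed to come from the same encoder as the stored vectors
print([chunk_id for chunk_id, _ in top_k(query, index, k=2)])  # ['capital', 'history']
```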
178 00:08:22.175 --> 00:08:24.475 And so vector search, and Pinecone
179 00:08:24.615 --> 00:08:28.035 and Milvus, arose to really say, hey, look,
180 00:08:28.035 --> 00:08:30.355 we actually support vector similarity search.
181 00:08:30.375 --> 00:08:31.635 And these things became very,
182 00:08:31.985 --> 00:08:33.155 very prolific and popular.
183 00:08:33.735 --> 00:08:36.075 And, you know, Milvus had been around quite
184 00:08:36.075 --> 00:08:37.235 a bit longer, right?
185 00:08:37.295 --> 00:08:41.715 Since well before 2022, really.
186 00:08:41.895 --> 00:08:44.915 And I think it's important to note
187 00:08:44.915 --> 00:08:47.515 that vector similarity search as a construct has been
188 00:08:47.515 --> 00:08:48.595 around for a very long time.
189 00:08:48.855 --> 00:08:53.155 And information retrieval has been, you know,
190 00:08:53.155 --> 00:08:57.085 practiced for quite a while. Standard
191 00:08:57.325 --> 00:08:59.605 retrieval and recommender systems have been using this
192 00:08:59.605 --> 00:09:01.485 for quite a long time, which, my understanding is,
193 00:09:01.485 --> 00:09:03.685 is actually how Milvus originally started.
194 00:09:04.345 --> 00:09:07.965 And so, you know, at the end you get this query,
195 00:09:07.965 --> 00:09:09.765 you retrieve with vector similarity search,
196 00:09:09.825 --> 00:09:11.685 and then you inject the results into the context
197 00:09:11.825 --> 00:09:12.925 and go on your merry way
198 00:09:12.985 --> 00:09:15.725 and have your LLM generate some sort
199 00:09:15.725 --> 00:09:16.925 of response, and you hope it works.
200 00:09:17.305 --> 00:09:19.725 And in practice it worked pretty well.
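The final step, injecting retrieved chunks into the context, amounts to prompt assembly. A minimal sketch follows; the template wording and the character budget are hypothetical stand-ins (real context limits are measured in tokens), and the call to the LLM itself is left out.

```python
def build_prompt(question: str, chunks: list[str], max_chars: int = 2000) -> str:
    """Inject retrieved chunks into an instruction-style prompt under a rough size budget.

    Hypothetical template; max_chars approximates a context limit, and the
    separating newlines are not counted against it.
    """
    context_lines = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        line = f"[{i}] {chunk}"
        if used + len(line) > max_chars:
            break  # stop before the context budget is exceeded
        context_lines.append(line)
        used += len(line)
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the numbered context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    "What is the capital of the US?",
    ["Washington, D.C. is the capital of the United States."],
)
print(prompt)
# The prompt string would then be sent to whichever LLM the application uses.
```

Numbering the chunks lets the instructions ask the model to cite which passage supported its answer.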
201 00:09:19.985 --> 00:09:23.085 So,
202 00:09:25.275 --> 00:09:26.855 so how can Feast help with RAG?
203 00:09:27.455 --> 00:09:30.265 I think, you know,
204 00:09:32.425 --> 00:09:36.375 Feast was really grounded as a feature store,
205 00:09:36.395 --> 00:09:38.295 and a feature store was really aimed at helping
206 00:09:38.875 --> 00:09:41.655 reduce the complexity in taking models,
207 00:09:41.935 --> 00:09:44.575 particularly tabular ones, to production.
208 00:09:44.835 --> 00:09:47.215 And it turns out
209 00:09:47.215 --> 00:09:48.735 that the hardest part about shipping models
210 00:09:48.735 --> 00:09:52.095 to production in the tabular predictive ML world
211 00:09:52.595 --> 00:09:54.535 isn't really the model itself.
212 00:09:54.855 --> 00:09:57.255 Actually, when models are small, inference reduces
213 00:09:57.255 --> 00:09:59.095 to really just being a calculator in real time.
214 00:09:59.195 --> 00:10:00.775 And that's not hard.
215 00:10:00.805 --> 00:10:02.415 What's actually hard is orchestrating data.
216 00:10:02.435 --> 00:10:03.775 And that was, you know, kind of what
217 00:10:03.815 --> 00:10:05.735 I mentioned in the early part of my talk.
218 00:10:05.735 --> 00:10:07.335 What I found working with these enterprises is
219 00:10:07.335 --> 00:10:12.255 that you have so many disparate database systems
220 00:10:13.045 --> 00:10:16.175 that finding a way to centralize all of the data
221 00:10:16.175 --> 00:10:20.575 that you have, so that you can then featurize
222 00:10:20.725 --> 00:10:22.855 that data and then serve it
223 00:10:22.855 --> 00:10:25.295 to a model, is actually quite hard,
224 00:10:25.365 --> 00:10:27.015 both technically and organizationally.
225 00:10:27.115 --> 00:10:29.695 And so, you know, I've worked at places
226 00:10:29.695 --> 00:10:31.375 that implement their own crude forms
227 00:10:31.375 --> 00:10:33.375 of feature stores for most of my career
228 00:10:33.425 --> 00:10:35.015 until I ended up maintaining one.
229 00:10:35.475 --> 00:10:38.855 And there's a joke I like to tell about an
230 00:10:38.855 --> 00:10:40.455 experience I had when I was at the Commonwealth Bank
231 00:10:40.455 --> 00:10:42.135 of Australia, where I flew to Sydney twice
232 00:10:42.755 --> 00:10:43.775 to get data from somebody.
233 00:10:43.775 --> 00:10:45.175 And it's actually a true story,
234 00:10:45.175 --> 00:10:48.735 because it wasn't obvious
235 00:10:48.795 --> 00:10:50.295 or trivial to actually get that data.
236 00:10:50.635 --> 00:10:53.655 And so it's a real, real problem for enterprises,
237 00:10:54.115 --> 00:10:55.855 and Feast aims to solve that
238 00:10:55.955 --> 00:10:59.535 by providing a centralized platform that
239 00:10:59.895 --> 00:11:02.855 stitches together your existing infrastructure
240 00:11:02.995 --> 00:11:05.255 and enables you to be successful in shipping models
241 00:11:05.255 --> 00:11:08.855 to production with the right patterns, permissions,
242 00:11:08.855 --> 00:11:11.975 governance, serving, and everything else.
243 00:11:12.715 --> 00:11:16.415 And so the premise is Feast helps with RAG
244 00:11:16.515 --> 00:11:19.295 by empowering MLEs to do what they do best,
245 00:11:19.345 --> 00:11:21.215 which is harness the power of data.
246 00:11:21.655 --> 00:11:22.895 MLEs and data scientists:
247 00:11:22.895 --> 00:11:25.295 there's ambiguity about, you know, what the nuances
248 00:11:25.295 --> 00:11:26.215 between the two are, but
249 00:11:26.295 --> 00:11:27.455 I'll treat them as equivalent.
250 00:11:28.035 --> 00:11:31.375 And so, you know, with Feast, it's kind of easier
251 00:11:31.395 --> 00:11:32.735 to ship RAG to production.
252 00:11:33.195 --> 00:11:35.575 Feast is battle-tested and supports
253 00:11:35.805 --> 00:11:37.815 real-time, batch, and streaming data.
254 00:11:38.115 --> 00:11:41.055 As I mentioned before, at my last role, you know,
255 00:11:41.075 --> 00:11:44.255 we shipped streaming, we shipped real-time, we shipped batch data,
256 00:11:44.385 --> 00:11:47.095 batch data sets with insertion sizes
257 00:11:47.095 --> 00:11:48.535 like 360 million records.
258 00:11:48.915 --> 00:11:51.575 And, you know, Feast scaled. I mean, when you start
259 00:11:51.575 --> 00:11:55.045 to really scale this, what it reduces to
260 00:11:55.065 --> 00:11:57.365 is really the database that you're using.
261 00:11:57.785 --> 00:12:00.765 But, you know, we actually had great uptime
262 00:12:00.985 --> 00:12:02.165 using Feast.
263 00:12:02.185 --> 00:12:07.165 And so I think it works.
264 00:12:07.235 --> 00:12:09.485 Yeah. It works,
265 00:12:09.505 --> 00:12:13.005 and it's used by lots and lots of great
266 00:12:13.005 --> 00:12:14.245 and powerful enterprises.
267 00:12:14.245 --> 00:12:15.525 And so we're quite proud of that.
268 00:12:15.705 --> 00:12:16.765 And
269 00:12:16.765 --> 00:12:19.085 we were a little bit slow on getting RAG fully featured,
270 00:12:19.085 --> 00:12:20.725 but now we're in a good place with it.
271 00:12:21.385 --> 00:12:24.545 And so it's really built
272 00:12:24.545 --> 00:12:26.185 for distributed computing and ingestion.
273 00:12:26.185 --> 00:12:28.265 And, you know, I mentioned the insertion,
274 00:12:28.285 --> 00:12:32.505 but Spark support in Feast
275 00:12:33.085 --> 00:12:34.465 is a really powerful mechanism.
276 00:12:34.725 --> 00:12:37.905 It was donated by the Adyen folks,
277 00:12:37.905 --> 00:12:40.145 so I want to give a big shout-out to them,
278 00:12:40.325 --> 00:12:41.465 as the offline store.
279 00:12:41.645 --> 00:12:44.745 And, you know, the complication that comes up in
280 00:12:44.765 --> 00:12:48.025 generating training data for fine-tuning is often, well,
281 00:12:48.025 --> 00:12:49.585 how do you embed, like, a million
282 00:12:49.905 --> 00:12:51.145 documents for training data, right?
283 00:12:51.485 --> 00:12:53.425 And then you start to use, you know,
284 00:12:53.425 --> 00:12:55.145 distributed computing frameworks,
285 00:12:55.145 --> 00:12:56.625 particularly Spark or Ray.
286 00:12:57.125 --> 00:12:59.665 There are others like Dask. We also do use Dask,
287 00:12:59.665 --> 00:13:02.025 but we don't really use it that much for
288 00:13:02.645 --> 00:13:04.305 the offline store.
289 00:13:04.565 --> 00:13:07.185 But, you know, these frameworks exist,
290 00:13:07.185 --> 00:13:09.785 and ultimately they're, you know, integrated
291 00:13:09.925 --> 00:13:10.985 within Feast.
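The batch-and-parallelize pattern that Spark or Ray apply across a cluster can be sketched on one machine with a thread pool. This is an illustration under assumptions, not how Feast wires Spark or Ray internally: `embed_batch` is a hypothetical placeholder for a real embedding-model call, and threads only help when that call releases the GIL or waits on I/O (for example, a remote inference endpoint).

```python
from concurrent.futures import ThreadPoolExecutor


def batched(items: list[str], size: int):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_batch(batch: list[str]) -> list[list[float]]:
    """Hypothetical stand-in for one model call over a batch (returns toy 2-d vectors)."""
    return [[float(len(doc)), float(doc.count(" "))] for doc in batch]


def embed_corpus(docs: list[str], batch_size: int = 256, workers: int = 4) -> list[list[float]]:
    """Embed docs in parallel batches; the output order matches the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order even if batches finish out of order.
        return [vec for batch_vecs in pool.map(embed_batch, batched(docs, batch_size))
                for vec in batch_vecs]


docs = [f"doc {i}" for i in range(1000)]
vectors = embed_corpus(docs, batch_size=64)
print(len(vectors))  # 1000
```

Batching amortizes per-call overhead, and preserving input order keeps each vector aligned with its document id when the results are written back.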
292 00:13:11.965 --> 00:13:14.865 And so we treat fine-tuning as a first-class citizen,
293 00:13:15.205 --> 00:13:18.465 with point-in-time correctness when joining data, making sure
294 00:13:18.465 --> 00:13:21.305 that you don't have what's called, you know,
295 00:13:21.805 --> 00:13:22.825 data leakage,
296 00:13:22.825 --> 00:13:24.265 which is looking at data from the future.
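Point-in-time correctness can be illustrated with a small as-of join in plain Python: each training row gets the latest feature value at or before its own timestamp, never a later one. This is a sketch of the semantics that Feast's historical retrieval (`get_historical_features`) is meant to provide, not its implementation; details such as feature-view TTLs are omitted, and the data and the function name are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_join(entity_rows, feature_rows):
    """Attach to each (entity_id, event_ts) the latest feature value with ts <= event_ts.

    feature_rows holds (entity_id, feature_ts, value). If no value exists at or before
    event_ts, the result is None rather than a later value, which would leak the future.
    """
    history: dict[str, tuple[list[datetime], list[object]]] = {}
    for entity_id, ts, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        times, values = history.setdefault(entity_id, ([], []))
        times.append(ts)
        values.append(value)

    joined = []
    for entity_id, event_ts in entity_rows:
        times, values = history.get(entity_id, ([], []))
        i = bisect_right(times, event_ts)  # number of feature rows at or before event_ts
        joined.append((entity_id, event_ts, values[i - 1] if i else None))
    return joined


features = [("u1", datetime(2024, 1, 1), 10), ("u1", datetime(2024, 1, 5), 20)]
labels = [("u1", datetime(2024, 1, 3)), ("u1", datetime(2023, 12, 31)), ("u2", datetime(2024, 1, 3))]
print([value for *_, value in point_in_time_join(labels, features)])  # [10, None, None]
```

A naive "latest value" join would have attached 20 to the January 3 row, letting the model train on information it could not have had at that time.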
297 00:13:25.645 --> 00:13:28.345 And so these are all reasons
298 00:13:28.345 --> 00:13:30.265 why Feast is really helpful with RAG,
299 00:13:30.285 --> 00:13:31.905 and it's fully open source, right?
300 00:13:32.045 --> 00:13:33.385 You know, that's one
301 00:13:33.385 --> 00:13:34.985 of the really great benefits of it.
302 00:13:34.985 --> 00:13:36.945 That's why, you know, users tend
303 00:13:37.325 --> 00:13:38.385 to adopt Feast,
304 00:13:38.385 --> 00:13:41.145 because they don't wanna send their data, you know, outside,
305 00:13:41.325 --> 00:13:43.385 or they just want to control the service.
306 00:13:43.605 --> 00:13:48.535 And so that ends up being really helpful. Let's see.
307 00:13:48.535 --> 00:13:50.535 So, Feast in production: I wanted to go over what
308 00:13:50.535 --> 00:13:52.415 the architecture looks like
309 00:13:52.475 --> 00:13:56.495 and how things look
310 00:13:57.715 --> 00:14:00.935 in Feast, and how that kind of naturally follows with RAG.
311 00:14:01.355 --> 00:14:03.335 So in the Feast world, there are two things.
312 00:14:04.445 --> 00:14:06.015 There's online infrastructure,
313 00:14:06.305 --> 00:14:07.815 which we call an online store,
314 00:14:07.815 --> 00:14:09.775 which is basically just a database that you'd use in
315 00:14:09.775 --> 00:14:11.655 a consumer-facing application
316 00:14:11.725 --> 00:14:13.495 that has high resiliency and high uptime,
317 00:14:13.875 --> 00:14:17.295 and offline infrastructure, which is like a database
318 00:14:17.295 --> 00:14:21.575 that you use for model fine-tuning, you know,
319 00:14:21.575 --> 00:14:24.535 like an offline warehouse, where, you know,
320 00:14:24.535 --> 00:14:27.135 you're doing lots of reads, not tons of writes,
321 00:14:27.155 --> 00:14:31.255 and you're mostly querying stuff in BigQuery
322 00:14:31.795 --> 00:14:36.455 or Snowflake or Spark, you know, on data where,
323 00:14:36.475 --> 00:14:39.175 if it goes down,
324 00:14:39.245 --> 00:14:41.255 it's not gonna hurt your customers, basically.
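The relationship between the two stores can be sketched as a "materialization" step: the offline store keeps the full history, and only the newest value per entity is copied into a key-value online store for low-latency lookups. Feast exposes this idea through its `materialize` and `materialize-incremental` commands; the code below is a much simplified, hypothetical illustration of the idea, not Feast's implementation, and it uses integer timestamps and a plain dict for brevity.

```python
def materialize(offline_rows, online_store):
    """Load the newest value per entity from offline history into an online key-value store.

    offline_rows: iterable of (entity_id, ts, value), in any order.
    online_store: dict mapping entity_id -> (ts, value); only newer rows overwrite it.
    """
    for entity_id, ts, value in offline_rows:
        current = online_store.get(entity_id)
        if current is None or ts > current[0]:
            online_store[entity_id] = (ts, value)
    return online_store


def get_online_value(online_store, entity_id):
    """Low-latency lookup path: one key, latest value only (None if never materialized)."""
    entry = online_store.get(entity_id)
    return entry[1] if entry else None


offline_history = [("c1", 3, 250.0), ("c1", 1, 100.0), ("c2", 2, 40.0)]
online = materialize(offline_history, {})
print(get_online_value(online, "c1"))  # 250.0
```

Comparing timestamps before overwriting makes the step safe to re-run over overlapping time windows.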
325 00:14:42.135 --> 00:14:45.675 And so if you look at the diagram on the right, we have
326 00:14:45.745 --> 00:14:47.275 what we call a data producer.
327 00:14:47.655 --> 00:14:48.995 And so what a data producer is,
328 00:14:49.015 --> 00:14:50.475 is basically an application.
329 00:14:51.135 --> 00:14:52.365 Maybe you're a payments company
330 00:14:52.365 --> 00:14:54.365 and you have an authentication service, right?
331 00:14:54.365 --> 00:14:55.765 A customer logs in
332 00:14:55.765 --> 00:14:57.605 and you want to keep track of their session,
333 00:14:58.345 --> 00:15:00.725 how many times they've logged in;
334 00:15:01.095 --> 00:15:02.925 maybe it's about their payments,
335 00:15:02.925 --> 00:15:04.205 how many payments they've made in the past,
336 00:15:04.305 --> 00:15:05.725 or, if you're an e-commerce company,
337 00:15:05.985 --> 00:15:07.525 products that they've purchased.
338 00:15:08.145 --> 00:15:12.325 What you would want to do with that data, essentially,
339 00:15:13.105 --> 00:15:14.645 is emit events
340 00:15:14.985 --> 00:15:19.345 or write data to an offline store so
341 00:15:19.345 --> 00:15:21.225 that you could go back later and analyze it,
342 00:15:21.225 --> 00:15:23.865 and, like I said, without impacting any customer experiences.
343 00:15:23.965 --> 00:15:25.505 And this is pretty common, right?
344 00:15:25.505 --> 00:15:28.585 All business analytics is grounded in
345 00:15:28.895 --> 00:15:31.745 some sort of offline analytics store, like ClickHouse
346 00:15:31.745 --> 00:15:33.545 or something, where it's like, I'm gonna query this data
347 00:15:33.685 --> 00:15:34.785 and learn something from it.
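The producer side of this can be sketched as an append-only event log that is analyzed later, away from the customer-facing path. Everything here is hypothetical: the event schema, the helper names, and the use of an in-memory `io.StringIO` in place of an object-store file or a log topic.

```python
import io
import json
from collections import Counter


def emit_event(log, event_type: str, customer_id: str, ts: int) -> None:
    """Append one event as a JSON line; the producer never rewrites earlier lines."""
    log.write(json.dumps({"type": event_type, "customer_id": customer_id, "ts": ts}) + "\n")


def count_by_customer(log_text: str, event_type: str) -> Counter:
    """Offline analysis over the full log: count one event type per customer."""
    counts = Counter()
    for line in log_text.splitlines():
        event = json.loads(line)
        if event["type"] == event_type:
            counts[event["customer_id"]] += 1
    return counts


log = io.StringIO()  # stands in for an object-store file or a log topic
emit_event(log, "login", "c1", 1)
emit_event(log, "payment", "c1", 2)
emit_event(log, "login", "c1", 3)
emit_event(log, "login", "c2", 4)
print(count_by_customer(log.getvalue(), "login"))
```

Because the producer only appends, analysis jobs can read the log at any time without locking or slowing the application that writes it.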
348 00:15:35.205 --> 00:15:37.425 And then the next step is, well,
349 00:15:37.505 --> 00:15:39.945 I have an AI engineer, an ML engineer, who wants
350 00:15:39.945 --> 00:15:42.465 to build a model to predict something like, you know,
351 00:15:43.005 --> 00:15:46.825 what a customer's
352 00:15:47.585 --> 00:15:51.145 LTV is, or if they're willing to buy this thing,
353 00:15:51.165 --> 00:15:52.625 or build a recommendation engine.
354 00:15:53.885 --> 00:15:56.665 And that's kind of what the diagram on the very left shows,
355 00:15:56.665 --> 00:15:59.745 with the offline log, CDC/ELT: this is,
356 00:15:59.745 --> 00:16:02.665 this is about essentially emitting data to an offline store
357 00:16:02.665 --> 00:16:03.905 so that you can then analyze it,
358 00:16:04.585 --> 00:16:06.125 and you get into this offline store.
359 00:16:06.125 --> 00:16:08.605 And oftentimes people just emit that to, like, an S3 bucket.
360 00:16:09.105 --> 00:16:12.485 Some people will actually
361 00:16:12.805 --> 00:16:15.445 emit events to Kinesis or Kafka,
362 00:16:15.705 --> 00:16:16.925 and, you know, those can end up
363 00:16:16.925 --> 00:16:18.325 in an S3 bucket as well.
364 00:16:18.545 --> 00:16:21.725 CDC is change data capture.
365 00:16:21.825 --> 00:16:25.085 And so if you have, you know, ELT systems,
366 00:16:25.115 --> 00:16:26.725 this tends to be a pretty common thing.
367 00:16:26.725 --> 00:16:27.725 Fivetran is an example
368 00:16:27.745 --> 00:16:29.045 of a provider that does that pretty well.
369 00:16:29.195 --> 00:16:30.525 There's Airbyte,
370 00:16:30.525 --> 00:16:31.805 which I believe is the open-source version.
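Change data capture can be pictured as replaying an ordered stream of row-level changes onto a copy of the source table. The event shape below is hypothetical (it is not the wire format of Fivetran, Airbyte, or any particular CDC tool), and updates are treated as full-row images, whereas real CDC feeds may carry only the changed columns.

```python
def apply_cdc(replica: dict, changes: list[dict]) -> dict:
    """Replay ordered change-data-capture events onto a replica keyed by primary key.

    Hypothetical event shape: {"op": "insert" | "update" | "delete", "key": ..., "row": {...}}.
    Updates are treated as full-row images; real CDC feeds may send only changed columns.
    """
    for change in changes:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            replica[key] = dict(change["row"])
        elif op == "delete":
            replica.pop(key, None)
        else:
            raise ValueError(f"unknown CDC op: {op!r}")
    return replica


changes = [
    {"op": "insert", "key": 1, "row": {"customer": "c1", "status": "active"}},
    {"op": "insert", "key": 2, "row": {"customer": "c2", "status": "active"}},
    {"op": "update", "key": 1, "row": {"customer": "c1", "status": "closed"}},
    {"op": "delete", "key": 2},
]
print(apply_cdc({}, changes))  # {1: {'customer': 'c1', 'status': 'closed'}}
```

Order matters: applying the same events out of sequence could resurrect a deleted row or keep a stale status, which is why CDC pipelines preserve per-key ordering.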
371 00:16:32.505 --> 00:16:37.125 And so from that emitted log data, usually that's
372 00:16:37.125 --> 00:16:38.485 where you generate training data sets,
373 00:16:39.355 --> 00:16:43.575 and Feast can nicely couple with things like Spark
374 00:16:43.595 --> 00:16:46.255 or Snowflake or whatever your offline store is
375 00:16:46.515 --> 00:16:48.615 and help you create training data sets.
376 00:16:49.075 --> 00:16:51.015 That's really what its
377 00:16:51.635 --> 00:16:53.335 big value proposition is there, you know:
378 00:16:53.335 --> 00:16:55.135 it covers data preparation, model training,
379 00:16:55.155 --> 00:16:57.895 and backtesting that you might wanna do within
380 00:16:57.895 --> 00:17:00.215 that batch world, where it's, you know,
381 00:17:00.735 --> 00:17:04.695 large computations on lots and lots of data sets
382 00:17:04.695 --> 00:17:05.775 that will take a bit.
383 00:17:06.355 --> 00:17:08.735 And then there's this.
384 00:17:09.235 --> 00:17:12.215 So that's all gonna stay within this data warehouse,
385 00:17:12.215 --> 00:17:15.255 this offline store land: really, model exploration.
386 00:17:15.275 --> 00:17:19.015 And in the Kubeflow ecosystem,
387 00:17:19.035 --> 00:17:21.575 we talk about this as the model development lifecycle.
388 00:17:22.545 --> 00:17:24.965 And you'll generally stay in the offline store there.
389 00:17:25.225 --> 00:17:29.245 Um, now moving back up into the, the, the diagram
390 00:17:29.295 --> 00:17:33.325 where we talk about the, the, the event hitting this kind
391 00:17:33.325 --> 00:17:34.605 of streaming application, Flink
392 00:17:34.605 --> 00:17:37.485 or Spark for architectural reasons.
393 00:17:37.715 --> 00:17:41.085 Some, some applications might want to actually
394 00:17:41.785 --> 00:17:43.085 use a streaming architecture
395 00:17:43.085 --> 00:17:44.405 or event driven architecture
396 00:17:44.405 --> 00:17:46.365 where they're just gonna admit events to a Kafka topic,
397 00:17:46.825 --> 00:17:48.645 and then consumers will subscribe to,
398 00:17:49.025 --> 00:17:50.725 or applications would subscribe to that topic
399 00:17:50.825 --> 00:17:51.885 and consume those events
400 00:17:51.885 --> 00:17:54.605 and process them on whatever cadence they they want.
401 00:17:54.625 --> 00:17:56.045 And so, um, clink
402 00:17:56.065 --> 00:17:59.765 and spark streaming are, are naturally, uh, good options in,
403 00:17:59.765 --> 00:18:00.885 in, in that, that case
404 00:18:00.885 --> 00:18:03.405 where you can actually build transformations that,
405 00:18:03.405 --> 00:18:06.645 that manage the throughput, that, that allows you to batch,
406 00:18:06.865 --> 00:18:08.525 uh, requests together and,
407 00:18:08.585 --> 00:18:13.085 and feature computations so that you're not, um, writing
408 00:18:13.145 --> 00:18:16.565 to the online store at too high of a frequency.
409 00:18:16.725 --> 00:18:17.885 'cause then, then you start to have
410 00:18:17.885 --> 00:18:19.005 some resource contentions.
411 00:18:19.005 --> 00:18:21.325 And depending if you get lots of volume all at once,
412 00:18:21.465 --> 00:18:23.125 you can actually end up incurring some,
413 00:18:23.125 --> 00:18:24.445 some challenges in production.
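A rough sketch of the batching idea, assuming a hypothetical `write_batch` callable in place of a real online-store client. It flushes on a row-count threshold only; a stream processor such as Flink or Spark Streaming would typically also flush on time windows.

```python
class MicroBatchWriter:
    """Buffers feature rows and writes them in batches instead of one write per event."""

    def __init__(self, write_batch, max_rows=100):
        self.write_batch = write_batch  # hypothetical stand-in for an online-store write call
        self.max_rows = max_rows
        self.buffer = []

    def add(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.max_rows:
            self.flush()

    def flush(self):
        if self.buffer:
            self.write_batch(list(self.buffer))  # pass a copy, then reset the buffer
            self.buffer.clear()


writes = []
writer = MicroBatchWriter(writes.append, max_rows=3)
for i in range(7):
    writer.add({"entity_id": i, "value": i * i})
writer.flush()  # flush the remainder, e.g. on a timer or at shutdown
print([len(batch) for batch in writes])  # [3, 3, 1]
```

Seven events become three writes to the store instead of seven, which is the resource-contention trade-off described above.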
414 00:18:24.945 --> 00:18:29.285 Um, and so some data producers, they choose a, uh,
415 00:18:29.805 --> 00:18:32.085 a model where, you know, I have a sidecar,
416 00:18:32.705 --> 00:18:34.565 I'm writing events to an S3 bucket.
417 00:18:34.985 --> 00:18:36.965 Or maybe they wanna just, you know, instead
418 00:18:36.965 --> 00:18:38.365 of a sidecar, use Kafka, right?
419 00:18:38.745 --> 00:18:41.765 And then some people just do like batch dumps, right?
420 00:18:41.765 --> 00:18:43.845 Like every 24 hours, I'm gonna take a copy
421 00:18:43.845 --> 00:18:45.485 of the database and just dump it.
422 00:18:45.945 --> 00:18:49.645 Um, and then you might also want an architecture where
423 00:18:49.675 --> 00:18:52.045 that data producer, that, that, that application
424 00:18:52.585 --> 00:18:54.365 or service, I'm gonna write directly
425 00:18:54.365 --> 00:18:56.125 to the online store via API,
426 00:18:57.065 --> 00:18:59.085 and Feast supports all of these architectures.
427 00:18:59.085 --> 00:19:00.285 And there are different trade-offs
428 00:19:00.285 --> 00:19:03.525 with these different write patterns, particularly for, uh,
429 00:19:03.525 --> 00:19:04.725 mission critical services.
430 00:19:05.505 --> 00:19:09.925 Um, like in payments, like in in, in lending, um, where you
431 00:19:10.795 --> 00:19:13.765 have different guarantees about the staleness
432 00:19:14.025 --> 00:19:15.565 or consistency is the language
433 00:19:15.565 --> 00:19:16.845 that's often used of the data.
434 00:19:17.185 --> 00:19:22.045 Um, you know, you, you'll want different, um, write patterns
435 00:19:22.105 --> 00:19:24.325 to the online store for, for different data sources.
436 00:19:24.785 --> 00:19:26.005 Um, 'cause again, they'll,
437 00:19:26.005 --> 00:19:27.285 they'll have different consequences
438 00:19:27.345 --> 00:19:28.485 to your consumer experience.
439 00:19:28.665 --> 00:19:30.845 And the most concrete example is that you want strong
440 00:19:30.845 --> 00:19:32.725 consistency
441 00:19:32.725 --> 00:19:34.765 If you're doing like, um, lending
442 00:19:35.225 --> 00:19:36.445 and you want to check out, uh,
443 00:19:36.445 --> 00:19:37.485 or you want to create a feature
444 00:19:37.485 --> 00:19:40.125 to calculate someone's total exposure, i.e.,
445 00:19:40.125 --> 00:19:41.365 how much money you've lent them,
446 00:19:41.705 --> 00:19:43.125 you don't wanna get that wrong in real time.
447 00:19:43.145 --> 00:19:44.685 You know, you, you wanna make sure that like,
448 00:19:45.115 --> 00:19:47.725 that number is calculated with the most accurate data.
449 00:19:47.865 --> 00:19:49.525 Um, 'cause that can have, uh, pretty severe,
450 00:19:49.625 --> 00:19:50.805 uh, financial consequences.
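A worked example of why staleness matters for an exposure feature, using made-up ledger numbers: if the feature is computed only from last night's batch dump, a loan disbursed this morning is invisible to the real-time decision.

```python
# Made-up ledger entries: (borrower_id, amount); positive = disbursed, negative = repaid.
nightly_snapshot = [("b1", 5000), ("b1", -1000)]  # state captured by last night's batch dump
todays_events = [("b1", 3000)]                     # a new loan disbursed this morning


def total_exposure(ledger, borrower_id):
    """Outstanding principal for one borrower: disbursements minus repayments."""
    return sum(amount for who, amount in ledger if who == borrower_id)


stale = total_exposure(nightly_snapshot, "b1")                  # 4000
fresh = total_exposure(nightly_snapshot + todays_events, "b1")  # 7000
print(f"batch-only view understates exposure by {fresh - stale}")
```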
451 00:19:51.745 --> 00:19:55.125 And then once you kinda hydrate this online store
452 00:19:55.235 --> 00:19:56.845 with the data that you need from all
453 00:19:56.845 --> 00:19:58.845 of your different places, again, batched,
454 00:19:58.885 --> 00:20:03.005 maybe a streaming producer, uh, maybe an online application,
455 00:20:03.505 --> 00:20:04.685 um, you know,
456 00:20:04.705 --> 00:20:07.125 and centralize it into this online store for serving.
457 00:20:07.665 --> 00:20:11.445 Um, then in your AI application, you can,
458 00:20:11.505 --> 00:20:13.885 you can actually talk with your inference provider.
459 00:20:14.015 --> 00:20:16.285 Maybe it's a separate service. Sometimes models are so small,
460 00:20:16.285 --> 00:20:18.685 you can actually include them in your feature server.
461 00:20:19.105 --> 00:20:20.925 Um, again, for the tabular domain
462 00:20:20.925 --> 00:20:22.965 where models aren't super huge, that
463 00:20:22.965 --> 00:20:24.405 that actually isn't an uncommon pattern.
464 00:20:24.745 --> 00:20:28.085 Uh, but there's lots of utility in having explicit, uh,
465 00:20:28.405 --> 00:20:29.805 separate inference endpoint.
466 00:20:29.905 --> 00:20:32.285 Um, especially as models start to scale to really large,
467 00:20:32.425 --> 00:20:36.565 any LLM naturally needs, um, its own inference provider.
468 00:20:37.065 --> 00:20:41.085 Um, and you see the kind of client AI application here
469 00:20:41.085 --> 00:20:43.605 where it could be a user's browser front end,
470 00:20:43.865 --> 00:20:45.285 it could be another backend service.
471 00:20:45.865 --> 00:20:48.005 Um, all talking with these things. In practice,
472 00:20:48.005 --> 00:20:50.045 for large enterprises, this ends up being a pretty,
473 00:20:50.105 --> 00:20:51.485 pretty common pattern.
474 00:20:52.025 --> 00:20:56.405 Um, and so you'll look at this and it's kind of abstract,
475 00:20:56.425 --> 00:21:00.735 but you'll notice that, well, this naturally applies
476 00:21:00.755 --> 00:21:02.015 for RAG systems, right?
477 00:21:02.165 --> 00:21:05.735 Because, you know, if you're taking documents, for example,
478 00:21:05.735 --> 00:21:08.895 maybe it's content from the web, from your, your CMS, right?
479 00:21:08.925 --> 00:21:12.815 Like Contently, um, where they have API, they have web hooks
480 00:21:12.815 --> 00:21:14.975 where you can, you know, update, you know,
481 00:21:15.075 --> 00:21:17.175 and you have changes happening to content
482 00:21:17.355 --> 00:21:18.535 and you wanna embed and,
483 00:21:18.635 --> 00:21:21.695 and, um, reflect in,
484 00:21:21.715 --> 00:21:23.895 in your RAG system, these changes, right?
485 00:21:24.595 --> 00:21:27.815 Um, well, you kinda have to do it with an API you, if you do
486 00:21:27.815 --> 00:21:29.655 that in batch, you're gonna have a, a, you know,
487 00:21:30.155 --> 00:21:32.575 bad results in, in your retrieval
488 00:21:32.575 --> 00:21:34.575 because you'll have stale, unindexed data.
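A toy sketch of the webhook-driven update path, assuming a hypothetical CMS payload with `doc_id` and `body` fields. A real handler would re-chunk and re-embed the text before overwriting the stored vectors; here a dict stands in for the index so the freshness effect is easy to see.

```python
index = {}  # doc_id -> text that retrieval currently sees


def reindex(doc_id, text):
    # A real pipeline would re-chunk, re-embed, and overwrite the stored vectors here.
    index[doc_id] = text


def on_cms_webhook(payload):
    """Hypothetical webhook handler, called by the CMS whenever a document changes."""
    reindex(payload["doc_id"], payload["body"])


reindex("pricing", "The Pro plan costs $10/month.")  # initial batch load
on_cms_webhook({"doc_id": "pricing", "body": "The Pro plan costs $12/month."})
print(index["pricing"])  # The Pro plan costs $12/month.
```

With only the batch load, retrieval would keep returning the $10 text until the next scheduled run.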
489 00:21:35.115 --> 00:21:38.375 Um, and so everything kinda logically starts to, to
490 00:21:38.435 --> 00:21:42.055 to follow like, oh, actually the, these data patterns are,
491 00:21:42.055 --> 00:21:43.295 are well suited for RAG.
492 00:21:43.515 --> 00:21:45.375 Um, and it, it's not an accident;
493 00:21:45.375 --> 00:21:47.255 the only thing really different is just
494 00:21:47.255 --> 00:21:49.975 that it's text instead of, um, you know, tabular,
495 00:21:50.005 --> 00:21:51.655 and it's still numbers because it's vectors, right?
496 00:21:52.235 --> 00:21:54.615 But, um, you know, it, it is an important part.
497 00:21:54.755 --> 00:21:57.135 And the real, the real key value proposition here is
498 00:21:57.135 --> 00:22:00.375 that feast treats not only the online infrastructure, uh,
499 00:22:00.595 --> 00:22:02.575 as a core priority, but also the offline.
500 00:22:02.635 --> 00:22:05.135 Uh, again, fine-tuning is a first-class citizen.
501 00:22:05.165 --> 00:22:07.815 It's a, it's a, it's, it's really the reason
502 00:22:07.815 --> 00:22:11.895 that Feast was originally built was to, you know,
503 00:22:13.115 --> 00:22:15.255 reduce the kind of training/serving skew.
504 00:22:15.255 --> 00:22:17.495 There's, there's a, there's an old paper, uh,
505 00:22:18.145 --> 00:22:19.375 about MLOps.
506 00:22:19.475 --> 00:22:22.015 And, and you know, this, this paper talked about the,
507 00:22:22.555 --> 00:22:27.415 the core importance of really the operations side of ML.
508 00:22:27.435 --> 00:22:29.655 And, and it is true for AI engineering as well,
509 00:22:29.655 --> 00:22:31.735 and generative AI, because you have
510 00:22:31.735 --> 00:22:33.215 to still handle all the same problems.
511 00:22:33.275 --> 00:22:34.415 You don't have to train the model often.
512 00:22:34.415 --> 00:22:35.455 Maybe you wanna fine tune it,
513 00:22:35.455 --> 00:22:37.575 but all the other problems are still there.
514 00:22:37.915 --> 00:22:41.575 Um, you know, data lineage, permissions, governance,
515 00:22:41.875 --> 00:22:44.975 you know, um, reconciliation and all this other stuff.
516 00:22:45.155 --> 00:22:47.935 Um, and so, you know, one
517 00:22:47.935 --> 00:22:49.855 of the benefits you get from Feast is the registry,
518 00:22:49.855 --> 00:22:53.095 where it stores metadata about, um, the kind of data
519 00:22:53.095 --> 00:22:56.535 that you use during, you know, inference and training even.
520 00:22:57.075 --> 00:22:58.895 Um, and so we'll talk a little bit more about
521 00:22:58.895 --> 00:23:00.135 that in, in a second.
522 00:23:00.755 --> 00:23:02.255 Um, cool.
523 00:23:03.845 --> 00:23:06.025 So I wanna give a, a demo today
524 00:23:06.445 --> 00:23:10.265 and I'm gonna talk about, um, Feast, Milvus, and Docling.
525 00:23:10.285 --> 00:23:11.545 So Docling's really cool.
526 00:23:11.965 --> 00:23:15.705 Uh, it's, it's, um, you know, it essentially takes a bunch
527 00:23:15.705 --> 00:23:18.905 of different input types of text formats, whether PDF,
528 00:23:19.075 --> 00:23:22.985 PowerPoint, DOCX, HTML, and it transforms it
529 00:23:22.985 --> 00:23:24.785 and embeds it, um, into tokens.
530 00:23:24.845 --> 00:23:26.945 Uh, it embeds it into vectors, um,
531 00:23:27.925 --> 00:23:30.265 and it allows you to then upload it into Milvus.
532 00:23:30.805 --> 00:23:32.465 Uh, and so I created the simple diagram
533 00:23:32.555 --> 00:23:35.145 where you could imagine some admin, you know,
534 00:23:35.745 --> 00:23:38.305 probably not an end user chooses to write documents
535 00:23:38.305 --> 00:23:41.785 and ingest 'em into, into, um, a feature store, right?
536 00:23:41.925 --> 00:23:44.225 Um, and we'll just go with this really simply
537 00:23:44.245 --> 00:23:45.485 and do a batch exercise.
538 00:23:46.905 --> 00:23:48.245 You write them into the online store,
539 00:23:48.245 --> 00:23:49.805 and some user's gonna wanna retrieve them
540 00:23:49.825 --> 00:23:51.045 to talk to the docs.
541 00:23:51.385 --> 00:23:54.005 And that's basically it. Um, you know,
542 00:23:54.465 --> 00:23:56.525 that's the entire goal of this demo is
543 00:23:56.525 --> 00:23:57.685 to kind of highlight how that works.
544 00:23:58.735 --> 00:24:00.475 And I just finished it this week, so apologies if
545 00:24:00.475 --> 00:24:01.515 I encounter any bugs.
546 00:24:01.785 --> 00:24:03.635 Fair warning. Um,
547 00:24:04.695 --> 00:24:07.075 but this is kind of the core of what Feast does:
548 00:24:07.395 --> 00:24:09.115 it ingests data, transforms data, and stores it,
549 00:24:09.115 --> 00:24:10.995 and it makes it available for low-latency retrieval.
550 00:24:11.255 --> 00:24:14.475 Um, you know, anything else is really beyond the scope,
551 00:24:14.535 --> 00:24:16.395 but like, that's, that's what we aim to do.
552 00:24:17.695 --> 00:24:19.515 So let's talk a little bit about some
553 00:24:19.515 --> 00:24:21.155 of the Feast constructs that you're gonna get here.
554 00:24:21.855 --> 00:24:24.995 So I wanted to go over, like, Feast objects.
555 00:24:26.495 --> 00:24:28.155 So on the right here, you're gonna see two,
556 00:24:28.255 --> 00:24:29.315 two snippets of code.
557 00:24:29.735 --> 00:24:33.595 The first one is the entities; we call this one chunk_id here.
558 00:24:33.735 --> 00:24:34.795 And there's a value type.
559 00:24:34.865 --> 00:24:37.755 It's a string, you know, it is basically to say, um,
560 00:24:38.095 --> 00:24:42.055 you know, it's, uh, it, it's a Feast construct.
561 00:24:42.055 --> 00:24:44.295 And the entity pretty much maps to a primary key
562 00:24:44.295 --> 00:24:45.335 that you're gonna put in a table.
563 00:24:46.195 --> 00:24:49.775 Um, and the document ID is another primary key
564 00:24:49.775 --> 00:24:50.975 that you're gonna put into a table.
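As a rough illustration of the primary-key analogy (plain Python, not Feast's `Entity` class), the sketch below indexes rows by the tuple of their join-key values. It assumes, hypothetically, that chunk IDs restart at `chunk_1` for each document; in that case `chunk_id` alone collides, and a compound key of `document_id` plus `chunk_id`, like the multiple join keys mentioned next, keeps rows distinct.

```python
# Made-up rows from a feature view keyed by the entity's join key(s).
rows = [
    {"document_id": "doc_1", "chunk_id": "chunk_1", "text": "Intro ..."},
    {"document_id": "doc_1", "chunk_id": "chunk_2", "text": "Details ..."},
    {"document_id": "doc_2", "chunk_id": "chunk_1", "text": "Other doc ..."},
]


def build_table(rows, join_keys):
    """Index rows by the tuple of join-key values, the way a primary key identifies a row."""
    table = {}
    for row in rows:
        key = tuple(row[k] for k in join_keys)
        table[key] = row  # a later row with the same key overwrites the earlier one
    return table


by_chunk_only = build_table(rows, ["chunk_id"])                   # 2 keys: doc_2 overwrote doc_1's chunk_1
by_doc_and_chunk = build_table(rows, ["document_id", "chunk_id"])  # 3 keys: all rows kept
print(len(by_chunk_only), len(by_doc_and_chunk))  # 2 3
```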
565 00:24:51.275 --> 00:24:53.455 Um, and there's some, some stuff there
566 00:24:53.455 --> 00:24:55.655 that ends up being useful for Feast: the description,
567 00:24:56.155 --> 00:24:58.015 the value type, and the join keys.
568 00:24:58.235 --> 00:25:01.095 Um, 'cause you can have multiple join keys, uh, in, in one.
569 00:25:01.095 --> 00:25:02.815 And so that's why it's a list. Um,
570 00:25:02.815 --> 00:25:04.495 there are data sources that you declare.
571 00:25:04.585 --> 00:25:05.615 These are like files,
572 00:25:05.715 --> 00:25:08.855 which is like a CSV or a Parquet file, and request objects,
573 00:25:09.395 --> 00:25:11.735 which are like an API call. A request object allows you
574 00:25:11.735 --> 00:25:13.575 to send, like, arbitrary data to, to Feast,
575 00:25:13.575 --> 00:25:15.695 and it'll treat it as just like an API call
576 00:25:15.695 --> 00:25:17.735 and allow you to transform it and do stuff with it.
577 00:25:18.515 --> 00:25:21.575 Um, and you'll see the source right here, um,
578 00:25:22.835 --> 00:25:24.215 is this file source, and it's the
579 00:25:24.215 --> 00:25:25.495 Parquet format, like I said.
580 00:25:26.315 --> 00:25:27.975 Um, and then there's the request source,
581 00:25:27.975 --> 00:25:30.455 which in this case it's gonna be a PDF, you know, uh,
582 00:25:30.455 --> 00:25:31.895 and you'll see there's PDF bytes.
583 00:25:31.895 --> 00:25:33.975 And so what that means is that we're gonna cast a PDF into
584 00:25:33.985 --> 00:25:36.095 bytes to load it in Python and send that.
585 00:25:36.875 --> 00:25:39.925 Um, and then the file name, which is a string.
586 00:25:40.905 --> 00:25:44.165 And so I, I define all of that metadata so
587 00:25:44.165 --> 00:25:47.805 that I can then define what's like a logical table, uh,
588 00:25:47.805 --> 00:25:49.285 which is the feature view on the right.
589 00:25:49.385 --> 00:25:51.685 And so the feature view is called the Docling example
590 00:25:51.685 --> 00:25:55.125 feature view, very creative name, I know, uh, that variable,
591 00:25:55.425 --> 00:25:57.485 uh, with the name docling_feature_view.
592 00:25:57.665 --> 00:25:59.805 And it has the entities list there,
593 00:25:59.915 --> 00:26:01.165 it's just defined as chunk.
594 00:26:02.105 --> 00:26:06.785 And the, the field name is file_name.
595 00:26:06.925 --> 00:26:09.585 So there's a field definition; a field is equivalent
596 00:26:09.585 --> 00:26:12.025 to, like, a feature. Um, "feature" as
597 00:26:12.045 --> 00:26:14.105 language is very common among ML engineers.
598 00:26:14.335 --> 00:26:15.545 It's not as evident
599 00:26:15.645 --> 00:26:17.385 to all other people who don't build models.
600 00:26:17.645 --> 00:26:20.425 But, um, the, the common nomenclature there is features, um,
601 00:26:20.605 --> 00:26:22.985 we call it fields here, just to be explicit that it's a,
602 00:26:23.185 --> 00:26:26.395 a field, that this is a schema for a field
603 00:26:26.425 --> 00:26:28.795 that ultimately maps to a database table, um,
604 00:26:29.455 --> 00:26:30.835 and in Milvus, it's a collection.
605 00:26:31.575 --> 00:26:34.915 Um, and here you'll notice that there's this, um,
606 00:26:36.235 --> 00:26:37.455 raw chunk of markdown.
607 00:26:37.515 --> 00:26:41.135 So in Docling, you can, you can extract partitions of,
608 00:26:41.155 --> 00:26:44.125 of the text, of the, you know, of the chunks.
609 00:26:44.585 --> 00:26:46.485 And here I'm just extracting 'em as markdown,
610 00:26:46.485 --> 00:26:47.765 and you'll see that code in a second.
611 00:26:48.065 --> 00:26:52.945 Um, and you'll notice that under the vector field,
612 00:26:53.355 --> 00:26:56.425 there are two additional settings beyond the dtype,
613 00:26:56.425 --> 00:27:00.305 which is an array of Float64. Um, one is vector_index.
614 00:27:00.405 --> 00:27:01.625 And you see how it says True.
615 00:27:02.245 --> 00:27:04.265 That's how you configure vector similarity
616 00:27:04.265 --> 00:27:05.385 search in, in Feast.
617 00:27:05.565 --> 00:27:08.905 Um, and, and the vector_search_metric is cosine.
618 00:27:08.905 --> 00:27:10.985 So that's the distance metric that's used to calculate similarity.
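What a vector index with a cosine metric computes can be sketched as brute-force search, which is roughly what a FLAT index does (exhaustive comparison, no approximation). This is plain Python for illustration, not how Milvus implements it, and the three-dimensional vectors are made up rather than real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k):
    """Score every stored vector against the query and keep the k most similar keys."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]


vectors = {
    "chunk_1": [1.0, 0.0, 0.0],
    "chunk_2": [0.7, 0.7, 0.0],
    "chunk_3": [0.0, 0.0, 1.0],
}
print(top_k([1.0, 0.1, 0.0], vectors, k=2))  # ['chunk_1', 'chunk_2']
```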
619 00:27:11.205 --> 00:27:12.545 So that was a lot
620 00:27:12.545 --> 00:27:13.825 of hard work to make this thing happen.
621 00:27:14.005 --> 00:27:15.345 Uh, but I'm really excited about it
622 00:27:15.345 --> 00:27:16.105 because that means that ML
623 00:27:16.305 --> 00:27:17.625 engineers, they don't have to care.
624 00:27:17.815 --> 00:27:20.825 They, they can just, you know, declare this feature view
625 00:27:20.845 --> 00:27:24.065 and then, you know, tell their software
626 00:27:24.065 --> 00:27:25.265 engineers, like, hey, look, just use this.
627 00:27:25.265 --> 00:27:28.145 This is easy. Um, and then they can serve, you know,
628 00:27:28.595 --> 00:27:31.145 their ML models and, and their LLMs
629 00:27:31.145 --> 00:27:32.465 and customize it to their needs.
630 00:27:32.565 --> 00:27:34.105 Uh, and so that's kind of really the exciting
631 00:27:34.105 --> 00:27:35.905 and powerful part, and you'll see that,
632 00:27:35.905 --> 00:27:38.025 that you also source the data source there,
633 00:27:38.025 --> 00:27:40.145 and that ends up being important for materialization
634 00:27:40.165 --> 00:27:42.265 or actually data ingestion later on.
635 00:27:42.455 --> 00:27:45.385 There's a TTL, um, you know, um,
636 00:27:47.165 --> 00:27:48.385 and that's basically it.
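A minimal sketch of TTL-style freshness, assuming a value is usable only while it is younger than the TTL. Feast's exact TTL semantics (how far back point-in-time joins look offline, and whether expired rows are filtered online) depend on the version and the store, so treat this as the general idea rather than Feast's behavior.

```python
from datetime import datetime, timedelta


def is_fresh(event_timestamp, now, ttl):
    """Return True if a stored value is no older than the TTL at time `now`."""
    return now - event_timestamp <= ttl


now = datetime(2025, 1, 2, 12, 0)
ttl = timedelta(hours=2)
print(is_fresh(datetime(2025, 1, 2, 11, 0), now, ttl))  # True: one hour old
print(is_fresh(datetime(2025, 1, 2, 9, 0), now, ttl))   # False: three hours old
```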
637 00:27:49.425 --> 00:27:51.405 You know, we, we, we have our metadata.
638 00:27:51.945 --> 00:27:54.885 Now I want to talk about how we extend this, which is
639 00:27:54.885 --> 00:27:56.005 how do we do transformations?
640 00:27:56.005 --> 00:27:57.725 And, uh, Feast allows
641 00:27:57.785 --> 00:28:01.485 for feature transformation in batch compute engines like
642 00:28:01.485 --> 00:28:04.325 Spark, as I mentioned, streaming compute engines like Spark
643 00:28:04.325 --> 00:28:07.245 Streaming and Flink, and then the API servers, which is,
644 00:28:07.245 --> 00:28:09.485 you know, um, the Feast feature server.
645 00:28:09.745 --> 00:28:13.635 Um, and the way that that's done is through a decorator.
646 00:28:14.215 --> 00:28:16.875 And this defines basically the other stuff
647 00:28:16.875 --> 00:28:18.635 that you just saw in the feature view.
648 00:28:20.735 --> 00:28:23.875 And then within the function definition, that's
649 00:28:23.875 --> 00:28:26.675 where you actually define like the, the, the transformation.
650 00:28:26.935 --> 00:28:29.235 Now we're, we're, we're, we're actually gonna revisit and,
651 00:28:29.235 --> 00:28:31.075 and change this a little bit to make it a little bit easier
652 00:28:31.255 --> 00:28:32.795 for, uh, engineers,
653 00:28:32.795 --> 00:28:34.995 but it's, it'll be the same, same sort of structure.
654 00:28:35.005 --> 00:28:38.435 We're renaming the decorator from on-demand
655 00:28:38.435 --> 00:28:39.675 feature view to transform,
656 00:28:39.735 --> 00:28:40.915 and then, like, you know,
657 00:28:40.915 --> 00:28:42.395 everything else stays the same, basically.
658 00:28:42.815 --> 00:28:44.315 Um, and it'll be backwards compatible,
659 00:28:44.375 --> 00:28:47.315 but it, it is meant to provide some, uh, clarity to people.
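The decorator pattern itself can be sketched in a few lines. Everything here is hypothetical (the `transform` name, its `sources` and `schema` arguments, and the `TRANSFORMS` registry): it mirrors the shape described in the talk, with metadata in the decorator and logic in the function body, but it is not Feast's actual decorator signature.

```python
TRANSFORMS = {}  # hypothetical registry: name -> (metadata, function)


def transform(name, sources, schema):
    """Hypothetical decorator that attaches feature-view-like metadata to a function."""
    def register(func):
        TRANSFORMS[name] = ({"sources": sources, "schema": schema}, func)
        return func  # the function itself stays directly callable
    return register


@transform(name="upper_text", sources=["raw_text"], schema={"text_upper": "string"})
def upper_text(inputs):
    # Columnar in, columnar out: each key maps to a list of values.
    return {"text_upper": [t.upper() for t in inputs["raw_text"]]}


metadata, fn = TRANSFORMS["upper_text"]
print(fn({"raw_text": ["hello", "rag"]}))  # {'text_upper': ['HELLO', 'RAG']}
```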
660 00:28:48.455 --> 00:28:50.275 But here, this is the Docling transformation.
661 00:28:50.275 --> 00:28:53.355 So this is what's, so what happens here is if you send this
662 00:28:53.755 --> 00:28:58.195 function an arbitrary set of PDF bytes, it's going to, um,
663 00:28:59.685 --> 00:29:01.535 extract the text from it and embed it.
664 00:29:02.155 --> 00:29:06.655 And so you'll see that there's this, uh, list
665 00:29:06.655 --> 00:29:08.615 of objects that, you know, you initialize,
666 00:29:08.615 --> 00:29:10.495 and then, it just appends them.
667 00:29:10.675 --> 00:29:12.375 And, uh, it appends the document ID.
668 00:29:12.915 --> 00:29:17.375 The chunk ID, which we generate just linearly, uh: chunk
669 00:29:17.435 --> 00:29:19.375 1, 2, 3, 4 to n. Um,
670 00:29:19.955 --> 00:29:22.455 and then the embeddings, you know, each embedding is
671 00:29:22.555 --> 00:29:25.135 of length, like, I think 584 or something, I forget,
672 00:29:25.275 --> 00:29:26.375 or maybe it was 384.
673 00:29:26.555 --> 00:29:29.095 Um, again, I forget. Um, and,
674 00:29:29.555 --> 00:29:30.935 and then the actual chunk text.
675 00:29:31.355 --> 00:29:34.415 So all that is, is in there and it's all just declared here.
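The shape of that transformation's output can be sketched without Docling or a real model. Assumptions: the document has already been split into text chunks, and `toy_embed` is a hypothetical, deterministic stand-in for an embedding model (it hashes text, so it carries no semantic meaning). The columnar dict mirrors the columns described above: document ID, chunk ID, embedding, and chunk text.

```python
import hashlib


def toy_embed(text, dim=8):
    """Hypothetical stand-in for an embedding model: deterministic but NOT semantic."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


def chunk_and_embed(document_id, chunks):
    """Build one column per field: document ID, chunk ID, embedding, chunk text."""
    output = {"document_id": [], "chunk_id": [], "vector": [], "chunk_text": []}
    for i, text in enumerate(chunks, start=1):
        output["document_id"].append(document_id)
        output["chunk_id"].append(f"chunk_{i}")  # chunk_1, chunk_2, ... chunk_n
        output["vector"].append(toy_embed(text))
        output["chunk_text"].append(text)
    return output


out = chunk_and_embed("doc_1", ["# Title", "First paragraph.", "Second paragraph."])
print(out["chunk_id"])  # ['chunk_1', 'chunk_2', 'chunk_3']
```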
676 00:29:34.435 --> 00:29:35.615 And so this all, all
677 00:29:40.705 --> 00:29:44.755 this, you know, let's say 50 lines of code
678 00:29:44.755 --> 00:29:47.475 or so allows you to ship RAG with this.
679 00:29:47.545 --> 00:29:50.155 Obviously there's a lot of infrastructure code that has
680 00:29:50.155 --> 00:29:51.595 to be written to deploy this stuff, right?
681 00:29:51.655 --> 00:29:54.435 But once that's done, you empower your ML engineers to really start
682 00:29:54.435 --> 00:29:56.635 to ship RAG solutions left and right,
683 00:29:56.975 --> 00:29:59.595 and serve them in production systems, um,
684 00:29:59.975 --> 00:30:01.755 and even scale them, uh, and,
685 00:30:01.755 --> 00:30:03.995 and that, that there's more discussed there.
686 00:30:05.495 --> 00:30:07.515 And so the, the data ingestion, uh,
687 00:30:07.615 --> 00:30:09.795 or document ingestion, uh, it's simple.
688 00:30:09.795 --> 00:30:10.955 There's an API endpoint.
689 00:30:10.955 --> 00:30:12.715 There's push, write_to_online_store,
690 00:30:12.715 --> 00:30:15.515 and materialize. Materialize is meant for batch, um, you know;
691 00:30:15.945 --> 00:30:19.555 push and write_to_online_store are meant for APIs, where, um,
692 00:30:19.935 --> 00:30:22.595 you know, you, you actually want to hit in live services
693 00:30:22.615 --> 00:30:24.995 and, and materialize lets you take like batch data sets,
694 00:30:24.995 --> 00:30:26.235 like CSVs to do it.
695 00:30:26.815 --> 00:30:31.205 Um, and that's it. So you get that kind of for free.
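Batch materialization can be pictured as copying rows whose event time falls after the last run into a key-value online store. The sketch below is a toy, not Feast's implementation: `load_new_rows` is a hypothetical local function that only borrows the idea of incremental materialization, and the rows are made up.

```python
from datetime import datetime


def load_new_rows(offline_rows, online_store, last_run, now):
    """Copy rows with last_run < event_timestamp <= now into the online store.

    Rows are applied oldest-first, so each key ends up holding its newest value in the window.
    """
    for row in sorted(offline_rows, key=lambda r: r["event_timestamp"]):
        if last_run < row["event_timestamp"] <= now:
            online_store[row["chunk_id"]] = row
    return now  # becomes last_run for the next scheduled job


offline = [
    {"chunk_id": "c1", "text": "v1", "event_timestamp": datetime(2025, 1, 1, 8)},
    {"chunk_id": "c1", "text": "v2", "event_timestamp": datetime(2025, 1, 1, 20)},
    {"chunk_id": "c2", "text": "v1", "event_timestamp": datetime(2025, 1, 2, 9)},
]
online = {}
watermark = load_new_rows(offline, online, datetime(2025, 1, 1), datetime(2025, 1, 2))
print({key: row["text"] for key, row in online.items()})  # {'c1': 'v2'}
```

The next scheduled run would start from `watermark` and pick up `c2`; an API write path would instead update the store as each event arrives.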
696 00:30:31.745 --> 00:30:34.475 Um, yeah,
697 00:30:35.135 --> 00:30:37.995 and these are the API docs that come out of the feature server,
698 00:30:38.095 --> 00:30:40.395 you know, you, you get your OpenAPI docs, uh,
699 00:30:40.395 --> 00:30:41.595 available, which is nice.
700 00:30:41.655 --> 00:30:43.395 Uh, and you see the get online features,
701 00:30:43.395 --> 00:30:46.675 retrieve online documents, uh, write to online store.
702 00:30:46.675 --> 00:30:49.475 There's a health check, and we recently added this chat UI,
703 00:30:49.575 --> 00:30:52.035 uh, that allows you to kind of like, uh, you know,
704 00:30:52.355 --> 00:30:53.675 ML engineers to quickly get up
705 00:30:53.675 --> 00:30:56.355 and running, writing some RAG systems.
706 00:30:56.695 --> 00:31:00.995 Um, and so, uh, I'm gonna go into the demo now
707 00:31:00.995 --> 00:31:02.315 before I get into the roadmap.
708 00:31:02.455 --> 00:31:06.235 So apologies, I'm gonna stop sharing. So let's see.
709 00:31:06.725 --> 00:31:09.635 We're gonna do this live, see if it goes well.
710 00:31:16.235 --> 00:31:18.955 And if not, I'll just share the, the, um,
711 00:31:22.265 --> 00:31:22.925 the, uh,
712 00:31:26.695 --> 00:31:27.315 the notebook.
713 00:31:27.545 --> 00:31:32.255 Okay. So this is, uh, the command line. Can people see it?
714 00:31:32.315 --> 00:31:33.695 No. Or am I sharing?
715 00:31:34.195 --> 00:31:35.575 You can share your terminal now.
716 00:31:36.375 --> 00:31:38.585 Okay. People are seeing my terminal. Yes. Yeah.
717 00:31:38.975 --> 00:31:43.165 Okay, cool. So you'll see this is, um,
718 00:31:44.465 --> 00:31:45.485 the file structure.
719 00:31:46.315 --> 00:31:48.165 There's some pickle option. Oh, I see.
720 00:31:48.385 --> 00:31:50.605 Um, so there's this,
721 00:31:55.135 --> 00:31:57.625 this feature store YAML file, where here,
722 00:31:59.235 --> 00:32:02.445 it's just gonna have a project name, a provider here.
723 00:32:02.445 --> 00:32:04.325 It's gonna run locally using Milvus Lite, um,
724 00:32:04.425 --> 00:32:07.965 as the online store, and then the embedding dim is 384.
725 00:32:08.185 --> 00:32:09.445 And then the index type is FLAT.
726 00:32:10.065 --> 00:32:13.445 Um, this entity key serialization is an implementation detail.
727 00:32:13.445 --> 00:32:16.405 You don't have to worry about it. We, we support OIDC authentication.
728 00:32:17.025 --> 00:32:20.605 Um, and so there's, uh, some of
729 00:32:20.605 --> 00:32:22.125 that stuff you don't have to worry about, it's documented.
730 00:32:22.185 --> 00:32:24.165 So I invite you to, to look at the
731 00:32:24.685 --> 00:32:26.565 documentation if you care, but it's basically
732 00:32:26.565 --> 00:32:28.045 where you define your configurations, right?
733 00:32:29.115 --> 00:32:33.745 Um, and then we can look at this example repo
734 00:32:33.765 --> 00:32:36.585 to go through the stuff I just mentioned, which is all
735 00:32:36.585 --> 00:32:38.345 of the things I had mentioned here.
736 00:32:38.475 --> 00:32:40.005 We're defining the embedding model,
737 00:32:40.465 --> 00:32:41.885 the maximum number of tokens.
738 00:32:42.545 --> 00:32:46.805 Uh, this is, uh, the tokenizer, the embedding model,
739 00:32:46.805 --> 00:32:47.845 a sentence transformer.
740 00:32:47.915 --> 00:32:51.485 This chunker, uh, this is embedding the text.
741 00:32:51.925 --> 00:32:55.525 Actually this is generating some chunk id, uh,
742 00:32:56.005 --> 00:32:57.405 I already walked you through all that stuff.
743 00:32:57.405 --> 00:33:01.255 Again, this is, um, this, those transformations.
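Chunking to a maximum token count before embedding can be sketched as fixed-size windows with overlap. Whitespace splitting stands in for the model's tokenizer here (an assumption for illustration); with a real tokenizer, the same idea keeps each chunk under the embedding model's maximum input length.

```python
def chunk_by_tokens(tokens, max_tokens, overlap=0):
    """Split a token sequence into windows of at most max_tokens.

    Consecutive windows share `overlap` tokens so context isn't cut mid-thought.
    """
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be >= 0 and smaller than max_tokens")
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the sequence
    return chunks


tokens = "feast stores features and milvus indexes the embeddings".split()
print(chunk_by_tokens(tokens, max_tokens=4, overlap=1))
# [['feast', 'stores', 'features', 'and'], ['and', 'milvus', 'indexes', 'the'], ['the', 'embeddings']]
```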
744 00:33:01.255 --> 00:33:02.495 And so there are a couple commands
745 00:33:02.495 --> 00:33:03.615 you write in feast to do this.
746 00:33:03.635 --> 00:33:04.975 And you say, feast apply.
747 00:33:05.515 --> 00:33:07.215 Uh, this is gonna register the metadata.
748 00:33:07.215 --> 00:33:08.895 Let's see if it works. Ignore the
749 00:33:08.895 --> 00:33:10.255 warnings, let's pretend that they're fine.
750 00:33:10.955 --> 00:33:13.375 Uh, let's see. This is all Docling stuff.
751 00:33:13.375 --> 00:33:15.455 So: applying changes for project rag.
752 00:33:15.455 --> 00:33:18.455 See, this is actually what we should be used to seeing.
753 00:33:22.125 --> 00:33:24.545 And then: deploying infrastructure for the Docling feature view,
754 00:33:25.485 --> 00:33:26.745 uh, the one we talked about,
755 00:33:26.745 --> 00:33:28.385 this is essentially a batch feature view.
756 00:33:28.525 --> 00:33:32.105 And so, uh, let's see.
757 00:33:33.485 --> 00:33:36.025 So we're gonna look at this test workflow script.
758 00:33:36.715 --> 00:33:37.825 We're just gonna show the demo.
759 00:33:38.005 --> 00:33:41.715 And so what's gonna happen here is I'm going
760 00:33:41.715 --> 00:33:43.155 to read this document data.
761 00:33:43.535 --> 00:33:48.435 I'm gonna actually apply this transform, uh,
762 00:33:50.325 --> 00:33:51.695 feature view, the one
763 00:33:51.695 --> 00:33:53.335 that's actually gonna transform things on the
764 00:33:53.355 --> 00:33:54.895 fly, just as an example.
765 00:33:55.235 --> 00:33:56.975 Um, there's a bug there that we'll work through.
766 00:33:57.035 --> 00:33:58.535 But, um, that, that's fine.
767 00:33:58.715 --> 00:34:02.655 Um, and I'm gonna log the different types of embeddings
768 00:34:02.655 --> 00:34:04.055 that are materialized
769 00:34:04.075 --> 00:34:06.775 or uploaded to, to this database, to Milvus.
770 00:34:07.765 --> 00:34:10.665 And then in this one, we're doing the same thing,
771 00:34:10.665 --> 00:34:12.065 except now with the different feature view.
772 00:34:12.065 --> 00:34:16.025 This is the one that's, um, writing the raw text itself.
773 00:34:16.085 --> 00:34:17.185 And what's gonna happen is it's
774 00:34:17.265 --> 00:34:18.385 gonna transform it on the fly.
775 00:34:18.765 --> 00:34:20.425 Uh, so that'll be kind of neat to see.
776 00:34:20.885 --> 00:34:23.145 And then we're gonna ask, uh, a question
777 00:34:23.365 --> 00:34:27.945 and then retrieve online documents for RAG, top_k, uh,
778 00:34:28.445 --> 00:34:29.665 and then we're gonna print it out.
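Once the top-k chunks come back, a RAG application typically stitches them into the prompt sent to the LLM. This is a minimal sketch; the template wording and `build_rag_prompt` are assumptions for illustration, not what the demo script uses.

```python
def build_rag_prompt(question, retrieved_chunks, max_chunks=3):
    """Number the top retrieved chunks and wrap them in a grounded-answer prompt."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks[:max_chunks], start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_rag_prompt(
    "What does Docling do?",
    ["Docling converts PDFs into structured text.", "Feast serves features online."],
)
print(prompt)
```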
779 00:34:31.845 --> 00:34:34.465 And then there's entity-based retrieval, which is, you know,
780 00:34:34.695 --> 00:34:36.585 showing this part.
781 00:34:36.725 --> 00:34:39.425 Oh, sorry, this is, yeah, this is, uh,
782 00:34:39.855 --> 00:34:41.705 this is then retrieving the same from
783 00:34:42.445 --> 00:34:43.785 the transformed versions,
784 00:34:43.785 --> 00:34:45.745 where you're just gonna send in a query embedding.
785 00:34:45.805 --> 00:34:49.665 Um, yeah. So I'm, I'm doing retrieval of the batch one
786 00:34:50.045 --> 00:34:51.905 and the transform, and you get the same, right?
787 00:34:52.165 --> 00:34:53.625 It is just kind of showing that, that, that
788 00:34:53.625 --> 00:34:55.425 that's a, that they're equivalent.
789 00:34:56.045 --> 00:34:58.545 Um, but one, you get to kind of transform on the fly,
790 00:34:58.545 --> 00:34:59.665 like you would in an API.
791 00:34:59.865 --> 00:35:01.105 'cause that's exactly what you wanna do
792 00:35:01.105 --> 00:35:02.145 in live production settings.
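The equivalence shown here (batch-precomputed values matching values transformed on the fly) comes from using one transformation function on both paths. A minimal sketch with a hypothetical `normalize` transform standing in for the real one:

```python
def normalize(text):
    """The single transformation shared by the batch path and the request path."""
    return " ".join(text.lower().split())


def batch_ingest(docs):
    # Batch path: transform once up front and store the results.
    return {doc_id: normalize(text) for doc_id, text in docs.items()}


def on_request(text):
    # Request path: transform on the fly, as an API handler would.
    return normalize(text)


docs = {"d1": "  Feast  and   Milvus ", "d2": "RAG Pipelines"}
stored = batch_ingest(docs)
print(all(stored[d] == on_request(t) for d, t in docs.items()))  # True
```

If the two paths each carried their own copy of the logic, any drift between them would show up as exactly the training/serving skew mentioned earlier.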
793 00:35:02.965 --> 00:35:05.905 And this one, uh, is again, the entity retrieval.
794 00:35:06.285 --> 00:35:08.465 And so let's give it a try and see if it worked.
795 00:35:08.805 --> 00:35:10.785 Uh, test workflow.
796 00:35:14.545 --> 00:35:16.045 Please don't make me regret it,
797 00:35:18.775 --> 00:35:20.555 but I tested this like an hour ago and it worked.
798 00:35:20.655 --> 00:35:22.755 So let's, uh,
799 00:35:23.205 --> 00:35:26.695 let's hope there are no problems.
800 00:35:26.695 --> 00:35:27.695 It takes a minute. Um,
801 00:35:28.865 --> 00:35:33.085 because, uh, transforming the PDFs... Yeah,
802 00:35:33.085 --> 00:35:34.645 writing the precomputed values, okay.
803 00:35:34.705 --> 00:35:37.835 And then transforming PDFs, okay, it's doing something.
804 00:35:40.315 --> 00:35:42.235 I should have picked smaller data, is the conclusion.
805 00:35:51.345 --> 00:35:53.265 I probably should put a progress bar too.
806 00:35:53.725 --> 00:35:54.725 Now that would be useful.
807 00:35:57.515 --> 00:35:58.565 Yeah, welcome to.
808 00:35:59.035 --> 00:36:00.445 Well, it's awkward that you just have
809 00:36:00.445 --> 00:36:01.885 to wait, you know, until it's done.
810 00:36:02.925 --> 00:36:04.125 I didn't, you know, when I was doing this demo,
811 00:36:04.165 --> 00:36:05.485 I didn't think, I was like, ah, it's working.
812 00:36:05.595 --> 00:36:09.205 Like, you know, Hmm. A little stick figure walking across
813 00:36:09.225 --> 00:36:10.845 the terminal would've been good, I thought.
814 00:36:12.865 --> 00:36:15.165 Uh, but maybe I'll quickly hijack that.
815 00:36:15.225 --> 00:36:16.365 Do people have some questions
816 00:36:16.365 --> 00:36:17.565 that they want to ask at the end?
817 00:36:18.065 --> 00:36:20.205 Uh, feel free to ask them as we go, directly in the chat.
818 00:36:20.545 --> 00:36:21.545 Um, so we can,
819 00:36:24.775 --> 00:36:25.775 Yeah. If folks have questions,
820 00:36:25.775 --> 00:36:28.785 do feel free to, to, to ask, uh, happy
821 00:36:28.845 --> 00:36:31.815 to, to... Maybe the terminal is cropped?
822 00:36:31.995 --> 00:36:34.895 No, it's, it is, uh, it's, no, it's, it's there.
823 00:36:35.245 --> 00:36:37.735 It's just, it, it is, it is doing some calculations.
824 00:36:37.755 --> 00:36:39.895 And so, so what's happening is that actually these,
825 00:36:40.035 --> 00:36:42.215 the Docling, and what's really cool about Docling,
826 00:36:42.255 --> 00:36:44.975 I invite you to, to read more about it, is that, um,
827 00:36:45.245 --> 00:36:47.175 it's doing, it's running computer vision
828 00:36:47.315 --> 00:36:51.215 and, uh, small LLMs, um, uh,
829 00:36:52.355 --> 00:36:53.735 during transformation.
830 00:36:53.755 --> 00:36:56.455 So it's taking this PDF, um, and,
831 00:36:56.455 --> 00:36:58.095 and basically extracting the text, right?
832 00:36:58.095 --> 00:37:00.855 But how does it do that? It also extracts graphs and
833 00:37:00.855 --> 00:37:03.655 adds that as textual metadata.
834 00:37:04.035 --> 00:37:06.735 Um, and so it's doing all that.
835 00:37:06.735 --> 00:37:08.415 So it's actually quite computationally expensive,
836 00:37:08.415 --> 00:37:09.895 and it takes a couple minutes, in fact.
837 00:37:10.235 --> 00:37:13.615 Um, so I forgot that it usually takes me a while to do that.
838 00:37:13.675 --> 00:37:15.815 And, uh, um, you know,
839 00:37:15.825 --> 00:37:17.495 we're gonna end up sitting here for five minutes. Uh,
840 00:37:17.835 --> 00:37:21.205 By the way, are you happy with the error that you have,
841 00:37:21.225 --> 00:37:23.245 or is it...? Yes. Okay. Yeah,
842 00:37:23.585 --> 00:37:24.585 It is fine. Um,
843 00:37:24.585 --> 00:37:27.125 the "Token indices sequence length is longer
844 00:37:27.265 --> 00:37:28.405 than the specified maximum" warning.
845 00:37:28.405 --> 00:37:30.665 Yeah, that's not a big deal.
846 00:37:30.925 --> 00:37:32.905 Um, it
847 00:37:32.965 --> 00:37:35.385 takes nothing away from the example.
848 00:37:36.045 --> 00:37:38.125 Um, yeah.
849 00:37:38.785 --> 00:37:40.125 And Docling, just
850 00:37:40.125 --> 00:37:41.685 because I actually didn't know about it before.
851 00:37:41.925 --> 00:37:43.805 Is it, like, fully open source? Fully
852 00:37:43.805 --> 00:37:44.805 open sourced. So this
853 00:37:44.805 --> 00:37:47.685 was a project, uh, created by IBM
854 00:37:47.685 --> 00:37:48.965 Research, uh,
855 00:37:49.225 --> 00:37:51.885 and they recently donated it to the LF AI & Data Foundation.
856 00:37:52.195 --> 00:37:55.125 Okay. Um, so it's fully open source, open governance, uh,
857 00:37:55.545 --> 00:37:57.685 you know, and I think, uh, it's a really great tool.
858 00:37:57.685 --> 00:38:00.565 We added it to Feast, the feature store, specifically to have
859 00:38:00.565 --> 00:38:04.605 like, um, you know, an open source parsing tool so
860 00:38:04.605 --> 00:38:07.445 that data scientists and ML engineers can,
861 00:38:07.945 --> 00:38:10.525 be unblocked without having to really deal with, um,
862 00:38:12.185 --> 00:38:14.825 figuring out like, well, how can I take my PDFs, um,
863 00:38:15.525 --> 00:38:17.065 and extract them, right?
864 00:38:17.245 --> 00:38:19.105 Um, we know how to do that with regular text,
865 00:38:19.165 --> 00:38:20.785 and oftentimes that's how data comes.
866 00:38:20.845 --> 00:38:22.225 But that's, that's not the only thing.
867 00:38:23.045 --> 00:38:25.505 And does it support, like... so it supports PDF,
868 00:38:25.505 --> 00:38:27.705 but does it also support different formats and stuff?
869 00:38:27.845 --> 00:38:30.545 It does. It supports a lot of really rich formats.
870 00:38:30.545 --> 00:38:32.225 And again, it's, uh, yeah. So there you go.
871 00:38:32.285 --> 00:38:34.665 Um, so it worked. Um, let's
872 00:38:34.665 --> 00:38:36.445 Go ahead and take over. I'll let you take over.
873 00:38:37.115 --> 00:38:38.585 Yeah, thank goodness.
874 00:38:38.645 --> 00:38:40.705 Oh, there is an issue with the retriever
875 00:38:40.705 --> 00:38:42.025 in this part, but that's fine.
876 00:38:42.025 --> 00:38:43.105 That's just a bug.
877 00:38:43.245 --> 00:38:46.865 So here what you'll see is that, um, you know,
878 00:38:47.165 --> 00:38:51.335 the Docling features, you know, we passed in the first,
879 00:38:51.595 --> 00:38:54.695 uh, and actually let me, uh, go and look at the code, um,
880 00:38:56.705 --> 00:38:57.805 and I'll show you, right?
881 00:38:58.025 --> 00:39:02.705 Um, the query embedding
882 00:39:02.855 --> 00:39:04.105 that we showed
883 00:39:04.445 --> 00:39:09.445 was, here it
884 00:39:09.605 --> 00:39:10.785 What's the name of this paper?
885 00:39:12.395 --> 00:39:15.705 That's the question we asked, uh,
886 00:39:16.825 --> 00:39:17.885 and references here.
887 00:39:18.025 --> 00:39:20.245 And so they both showed, um, the same thing.
888 00:39:20.585 --> 00:39:22.495 Um, and
889 00:39:22.495 --> 00:39:24.935 because again, this one was batch transformed
890 00:39:24.995 --> 00:39:27.295 and just uploaded, and this one was the one
891 00:39:27.295 --> 00:39:28.935 that was transformed on the fly.
892 00:39:29.275 --> 00:39:30.815 And the reason it took so long is
893 00:39:30.815 --> 00:39:33.215 because it was iterating, I think it's,
894 00:39:33.255 --> 00:39:35.175 I think it's 10 PDFs that I was processing.
895 00:39:35.215 --> 00:39:37.055 I should have just done one for this example, so my bad.
896 00:39:37.435 --> 00:39:41.735 Um, but, uh, you know, it went and, uh, chunked them
897 00:39:41.915 --> 00:39:43.135 and embedded them.
898 00:39:43.915 --> 00:39:45.135 And this is the example.
899 00:39:45.395 --> 00:39:46.895 Um, you know, we asked the question
900 00:39:46.915 --> 00:39:48.535 and it gives us a whole bunch of references
901 00:39:48.915 --> 00:39:50.335 of like, articles that are named.
902 00:39:50.355 --> 00:39:51.735 And so it works, it does the thing.
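The flow described in the demo (parse the documents, chunk them, embed the chunks, then retrieve the closest chunks for a question) can be sketched in plain Python. This is a minimal, hypothetical sketch rather than Feast's, Docling's, or Milvus's API: `chunk_words`, `embed`, `cosine`, and `top_k` are illustrative names, and a sparse term-frequency vector stands in for a real sentence-transformer embedding.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `chunk_size` words that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase term frequencies (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question: str, chunks: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Score every chunk against the question and return the k best (score, chunk) pairs."""
    q = embed(question)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

In a Feast and Milvus setup like the one in the talk, the chunk embeddings would typically be computed ahead of time and stored in the vector database, rather than recomputed for every question as this toy version does.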
903 00:39:52.115 --> 00:39:54.495 Um, now the thing that we're really excited about, uh,
904 00:39:54.645 --> 00:39:58.375 with this, and, you know, things that we intend
905 00:39:58.375 --> 00:40:02.015 to enhance, is making this, again, just exceptionally
906 00:40:02.015 --> 00:40:03.975 easy for ML engineers and people
907 00:40:03.995 --> 00:40:06.855 to ship production RAG applications that can really scale.
908 00:40:07.355 --> 00:40:09.335 Um, and so I'm gonna get into that
909 00:40:09.355 --> 00:40:10.655 a little bit more in a second
910 00:40:10.655 --> 00:40:12.175 after I share my screen again.
911 00:40:12.675 --> 00:40:17.305 Um, yes, uh, share here.
912 00:40:21.385 --> 00:40:24.485 Yes. Um, we talked about ingestion
913 00:40:25.105 --> 00:40:26.805 and so, the roadmap of Feast.
914 00:40:26.805 --> 00:40:29.485 What are we doing next? More NLP, you know, again,
915 00:40:29.505 --> 00:40:32.085 we want Feast to be the go-to framework for AI users
916 00:40:32.225 --> 00:40:33.845 to customize their RAG solutions.
917 00:40:34.025 --> 00:40:36.965 And that means investing more in Milvus. Uh, Milvus, you know,
918 00:40:36.965 --> 00:40:38.965 again, is an extraordinary database.
919 00:40:39.265 --> 00:40:41.405 Um, it's, uh, you know,
920 00:40:41.545 --> 00:40:43.045 it has a great inline behavior
921 00:40:43.625 --> 00:40:47.885 or local behavior with PyMilvus, uh, or Milvus Lite.
922 00:40:47.945 --> 00:40:50.645 Uh, that just makes it really easy for end users
923 00:40:50.745 --> 00:40:52.005 to get up and running.
924 00:40:52.225 --> 00:40:54.405 Um, and I think that's really important.
925 00:40:54.605 --> 00:40:57.485 What I found when working, uh,
926 00:40:57.665 --> 00:41:00.365 or leading some of these teams, is that, um,
927 00:41:00.365 --> 00:41:03.165 if there's a lot of friction for data scientists
928 00:41:03.165 --> 00:41:04.485 or end users to get started,
929 00:41:04.715 --> 00:41:06.085 they just don't want to use it.
930 00:41:06.265 --> 00:41:08.125 Um, and so, uh,
931 00:41:08.385 --> 00:41:10.845 in Feast we've invested a lot into making that experience very good.
932 00:41:10.865 --> 00:41:12.925 And Milvus, uh, you know, is one
933 00:41:12.925 --> 00:41:14.405 of our favorite frameworks because of that.
934 00:41:14.865 --> 00:41:17.205 Um, because of Milvus Lite, you know, you don't have
935 00:41:17.205 --> 00:41:19.405 to really think too much about containers
936 00:41:19.585 --> 00:41:21.005 or how to deploy it.
937 00:41:21.005 --> 00:41:23.085 You can just kind of hit pip install and go.
938 00:41:23.545 --> 00:41:25.365 Um, and so, so that's one
939 00:41:25.365 --> 00:41:26.765 of the things that we really like about it.
940 00:41:26.765 --> 00:41:28.565 And again, we'll continue to invest a lot more in
941 00:41:28.605 --> 00:41:30.405 NLP. Um, image support.
942 00:41:30.425 --> 00:41:34.605 So, you know, images are interesting
943 00:41:34.605 --> 00:41:36.365 because they're actually pretty analogous
944 00:41:36.365 --> 00:41:40.245 or equivalent to, um, to language,
945 00:41:40.245 --> 00:41:44.245 in the sense that, um, you often want metadata,
946 00:41:44.245 --> 00:41:46.325 and this is one of the things that I think I understated in
947 00:41:46.325 --> 00:41:50.135 this talk, is that, um, Feast allows you
948 00:41:50.135 --> 00:41:53.295 to store additional information beyond just the sentences
949 00:41:53.435 --> 00:41:54.575 or the tokens, right?
950 00:41:55.075 --> 00:41:57.215 Um, and it turns out there's a lot of rich structure
951 00:41:57.215 --> 00:41:58.335 that can be optimized for that.
952 00:41:58.335 --> 00:42:01.335 And if you actually work in recommender systems,
953 00:42:01.355 --> 00:42:04.135 you'll know that basically you can reduce RAG
954 00:42:04.195 --> 00:42:06.975 to being a recommender or ranking and retrieval system.
955 00:42:07.875 --> 00:42:10.055 And there's a lot of really rich metadata
956 00:42:10.055 --> 00:42:13.175 that can be used to power, um,
957 00:42:14.505 --> 00:42:17.795 retrieval, in addition to just the text itself.
958 00:42:18.375 --> 00:42:19.475 And by that I mean, like,
959 00:42:21.615 --> 00:42:24.235 you can do what's basically like a hybrid ranking, right?
960 00:42:24.255 --> 00:42:26.315 And you can rank different, um, pieces
961 00:42:26.415 --> 00:42:28.195 of the text differently.
962 00:42:28.695 --> 00:42:32.595 Um, and so if you have like the title, if you have like the,
963 00:42:32.815 --> 00:42:37.135 um, uh, age of the document,
964 00:42:37.465 --> 00:42:39.775 these are all things that you can use to weight differently.
965 00:42:40.155 --> 00:42:44.015 And you can even build a model on top called a re-ranker, um,
966 00:42:44.115 --> 00:42:45.495 to fully optimize these things.
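The hybrid ranking idea can be illustrated with a hand-weighted score that blends vector similarity with metadata such as a title match and document age. This is a hypothetical sketch: the `Candidate` fields, the weights, and the 365-day half-life are illustrative assumptions, not Feast settings; a learned re-ranker would replace the hand-picked weights.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    similarity: float   # e.g. the cosine score returned by the vector search
    title_match: float  # 1.0 if the query terms appear in the title, else 0.0 (toy signal)
    age_days: float     # age of the document in days


def hybrid_score(c: Candidate, w_sim: float = 0.7, w_title: float = 0.2,
                 w_recency: float = 0.1, half_life_days: float = 365.0) -> float:
    """Weighted sum of semantic similarity, title match, and an exponential recency decay."""
    recency = 0.5 ** (c.age_days / half_life_days)  # 1.0 for a brand-new document
    return w_sim * c.similarity + w_title * c.title_match + w_recency * recency


def rerank(candidates: list[Candidate], **weights: float) -> list[Candidate]:
    """Order retrieved candidates by their hybrid score, best first."""
    return sorted(candidates, key=lambda c: hybrid_score(c, **weights), reverse=True)
```

With the default weights, a slightly less similar but recent document whose title matches the query can outrank an older, purely semantic match; setting `w_title=0.0` and `w_recency=0.0` falls back to plain similarity ranking. Fitting these weights, or a re-ranking model, on labeled relevance data is one way to do the fine-tuning described here.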
967 00:42:45.675 --> 00:42:47.775 Uh, and so that is an area that we do intend
968 00:42:47.775 --> 00:42:49.335 to allow customization for.
969 00:42:49.675 --> 00:42:51.655 Um, and again, so that it can be fine-tuned
970 00:42:51.765 --> 00:42:55.015 because you need to be able to fine-tune it so that in
971 00:42:55.035 --> 00:42:57.535 serving, you know what the parameters are
972 00:42:57.595 --> 00:42:58.975 and which pieces of content
973 00:42:59.275 --> 00:43:01.095 and how you wanna structure the data.
974 00:43:01.315 --> 00:43:03.295 And that becomes very highly codified.
975 00:43:03.325 --> 00:43:06.135 It's not the same RAG of
976 00:43:06.235 --> 00:43:09.575 "I'm just gonna throw it into, um, into the context
977 00:43:09.675 --> 00:43:10.735 and see what works."
978 00:43:10.835 --> 00:43:13.575 Um, it's actually a much more optimized system.
979 00:43:13.795 --> 00:43:17.455 Um, and so I think both have to exist
980 00:43:17.455 --> 00:43:19.295 and they both end up having to be very powerful.
981 00:43:19.355 --> 00:43:20.615 And as LLMs get better,
982 00:43:20.615 --> 00:43:21.855 certainly that will get a little bit better.
983 00:43:21.855 --> 00:43:26.655 But my long view is that, um, there's always going
984 00:43:26.655 --> 00:43:28.255 to be a need for fine tuning some of these systems,
985 00:43:28.255 --> 00:43:31.135 especially once you hit real scale where another one
986 00:43:31.135 --> 00:43:33.495 or 2% really makes a big financial impact.
987 00:43:34.115 --> 00:43:38.175 Um, and so, uh, scaling batch: we already support, uh,
988 00:43:38.465 --> 00:43:39.615 Spark as an offline store.
989 00:43:39.615 --> 00:43:41.455 I mentioned that. And for batch transformations,
990 00:43:41.675 --> 00:43:44.055 we intend on incorporating Ray, uh,
991 00:43:44.055 --> 00:43:46.175 at a point in the future when other maintainers are pretty
992 00:43:46.175 --> 00:43:47.255 excited about doing that work.
993 00:43:47.755 --> 00:43:49.335 Um, and then latency improvements.
994 00:43:49.495 --> 00:43:52.135 I spent some time optimizing a lot of the computation
995 00:43:52.795 --> 00:43:55.575 and retrieval latency within Feast, uh, in the past.
996 00:43:55.635 --> 00:43:57.775 And we continue to invest in that, um,
997 00:43:57.775 --> 00:44:00.135 because we want Feast to really be blazing fast.
998 00:44:00.395 --> 00:44:01.695 Um, you know, I used
999 00:44:01.695 --> 00:44:03.175 to work at a company called Fast for a reason.
1000 00:44:03.675 --> 00:44:07.575 Um, we like fast things. And so thank you.
1001 00:44:08.155 --> 00:44:10.055 Um, there's a Feast RAG blog post
1002 00:44:10.055 --> 00:44:11.335 that talks a little bit more about
1003 00:44:11.365 --> 00:44:13.935 what the value proposition of Feast supporting RAG is
1004 00:44:13.935 --> 00:44:15.535 and why I spent so much time on it.
1005 00:44:15.995 --> 00:44:17.335 Um, you know, I have a background in
1006 00:44:17.375 --> 00:44:18.815 NLP, uh, coincidentally.
1007 00:44:18.815 --> 00:44:20.495 And so, you know, when I became a Feast maintainer,
1008 00:44:20.495 --> 00:44:22.615 in fact, that was a thing I said a year
1009 00:44:22.615 --> 00:44:23.695 and a half ago that I would do,
1010 00:44:23.715 --> 00:44:25.495 and, you know, I've almost done it.
1011 00:44:25.555 --> 00:44:27.575 Uh, job's not finished, but we're getting close.
1012 00:44:28.115 --> 00:44:30.615 Um, there are some links to the Feast documentation,
1013 00:44:30.615 --> 00:44:33.045 the Feast website, the GitHub repo with the demo,
1014 00:44:33.385 --> 00:44:35.165 and the GitHub repo with the Docling demo.
1015 00:44:35.265 --> 00:44:38.765 So there's one that's just focused on basic RAG with Milvus,
1016 00:44:39.425 --> 00:44:41.165 and then there's another with, you know, uh,
1017 00:44:41.595 --> 00:44:43.645 this Docling demo, also with Milvus.
1018 00:44:43.665 --> 00:44:46.285 And so I wanna say a big thank you to
1019 00:44:46.385 --> 00:44:48.965 Stephanie and the folks here at Milvus for inviting me
1020 00:44:48.965 --> 00:44:51.325 to talk and share the gospel of Feast.
1021 00:44:51.465 --> 00:44:53.925 Um, and this is a generated image
1022 00:44:53.925 --> 00:44:55.925 of a robot eating a bunch of rags,
1023 00:44:55.925 --> 00:44:57.365 feasting on rags, if you will.
1024 00:44:57.665 --> 00:44:58.665 Uh, I thought that was funny.
1025 00:45:00.415 --> 00:45:03.165 Thank you very much. And I think it's funny, uh,
1026 00:45:03.695 --> 00:45:04.845 thank you very much for
1027 00:45:04.845 --> 00:45:06.245 a very detailed presentation, actually.
1028 00:45:06.545 --> 00:45:08.725 Uh, see, even people are laughing in the chat
1029 00:45:08.725 --> 00:45:10.845 and they wrote it in the chat, which means they really laughed.
1030 00:45:11.505 --> 00:45:12.505 So
1031 00:45:13.085 --> 00:45:17.525 I think... uh, do we have questions from people?
1032 00:45:18.065 --> 00:45:19.245 So just quickly,
1033 00:45:19.245 --> 00:45:21.045 otherwise, I have one on
1034 00:45:21.045 --> 00:45:22.125 my side, so I'll just start.
1035 00:45:22.745 --> 00:45:25.525 Uh, you mentioned a couple of times, uh, you know,
1036 00:45:25.525 --> 00:45:27.325 like, low latency for Feast.
1037 00:45:28.105 --> 00:45:30.445 Uh, what does it mean exactly?
1038 00:45:30.875 --> 00:45:33.085 Like, what is, um, the thing you're usually targeting,
1039 00:45:33.305 --> 00:45:34.325 would you say? Sorry.
1040 00:45:34.755 --> 00:45:36.125 Yeah, that's, that's a really great question.
1041 00:45:36.245 --> 00:45:37.845 I was imprecise and I should have been more precise,
1042 00:45:37.905 --> 00:45:39.725 but, um, you know, I think, so if
1043 00:45:39.725 --> 00:45:42.045 if you're serving stuff online, um,
1044 00:45:44.895 --> 00:45:47.825 usually, at really high scale, you're gonna say, well,
1045 00:45:47.825 --> 00:45:50.585 what's my percentile distribution of latency?
1046 00:45:50.585 --> 00:45:54.185 Right? Exactly. And so, um, typically people focus on P99,
1047 00:45:54.385 --> 00:45:56.105 'cause at P100, you're gonna have a bad time,
1048 00:45:56.205 --> 00:45:58.985 but, you know, P99, let's optimize for that.
1049 00:45:59.605 --> 00:46:01.025 And depending on the data store
1050 00:46:01.025 --> 00:46:04.425 and indexing strategy, it ends up with different trade-offs,
1051 00:46:04.425 --> 00:46:05.665 you know, and I think
1052 00:46:05.665 --> 00:46:08.625 for vector similarity search in particular, it's very hard,
1053 00:46:09.085 --> 00:46:12.585 um, because that scales proportionally to the number
1054 00:46:12.585 --> 00:46:14.625 of documents being embedded and retrieved.
1055 00:46:14.805 --> 00:46:15.985 Um, and so
1056 00:46:16.055 --> 00:46:18.585 there are some bottlenecks that you're ultimately gonna hit.
1057 00:46:18.585 --> 00:46:20.715 So if you want, like, a hundred docs, well,
1058 00:46:20.715 --> 00:46:23.915 that's gonna be a lot harder than, like... you're
1059 00:46:23.915 --> 00:46:24.955 gonna hit some bottlenecks.
1060 00:46:24.955 --> 00:46:27.515 But for basic entity retrieval, um,
1061 00:46:27.985 --> 00:46:29.195 that we know really well.
1062 00:46:29.375 --> 00:46:32.595 And so things like, you know, a Redis cache, or
1063 00:46:32.655 --> 00:46:34.555 or Redis ends up being quite performant
1064 00:46:34.555 --> 00:46:36.795 where you can get P99s of five milliseconds.
1065 00:46:37.135 --> 00:46:38.275 With a MySQL database, you can get
1066 00:46:38.275 --> 00:46:40.595 around 10 milliseconds, uh, P99.
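A P99 is the latency below which 99% of requests complete. The sketch below computes it from a list of measured latencies with the nearest-rank method; this is one common definition among several (others interpolate between samples), and the sample numbers are made up for illustration, not measurements of Redis or MySQL.

```python
import math


def percentile(samples_ms: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[rank - 1]


# 98 fast requests and 2 slow ones: the median hides the tail, the P99 does not.
latencies = [2.0] * 98 + [50.0] * 2
print(percentile(latencies, 50), percentile(latencies, 99))  # prints 2.0 50.0
```

This is why tail percentiles, rather than averages or medians, are the usual target for online serving: a handful of slow lookups barely moves the median but shows up directly in the P99.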
1067 00:46:41.175 --> 00:46:42.555 Um, and,
1068 00:46:42.695 --> 00:46:45.315 and you know, there are ways to continue to optimize that.
1069 00:46:45.335 --> 00:46:47.755 But really, within the code itself,
1070 00:46:47.985 --> 00:46:50.515 because with some of the things, you get bottlenecked
1071 00:46:50.515 --> 00:46:53.635 by just the database. Basically you end up, like, the
1072 00:46:53.635 --> 00:46:54.635 functional limit is
1073 00:46:54.635 --> 00:46:56.835 how fast you can retrieve from the database.
1074 00:46:56.835 --> 00:46:58.075 And then you do that.
1075 00:46:58.535 --> 00:47:00.675 And the strategy is: one,
1076 00:47:00.755 --> 00:47:04.915 having really efficient, optimized code that isn't doing
1077 00:47:04.935 --> 00:47:07.955 too much in the handling and serialization
1078 00:47:07.955 --> 00:47:10.555 and deserialization and computation of the data on the fly.
1079 00:47:11.215 --> 00:47:14.745 Um, and then two, pre-computing.
1080 00:47:14.845 --> 00:47:17.625 And so a lot of what Feast really aims
1081 00:47:17.625 --> 00:47:18.705 to do is pre-compute.
1082 00:47:18.965 --> 00:47:21.185 And what our documentation, uh,
1083 00:47:21.195 --> 00:47:24.545 talks about is that pre-computing is the gold standard.
1084 00:47:24.605 --> 00:47:27.265 If you want really good customer experiences, pre-compute
1085 00:47:27.265 --> 00:47:29.685 as much as you can, which is why we invest in the batch.
1086 00:47:30.105 --> 00:47:32.765 So a concrete example, you have a million documents
1087 00:47:32.765 --> 00:47:34.365 that you want to be able to search through.
1088 00:47:35.365 --> 00:47:37.705 You have to batch embed those, you have to embed them
1089 00:47:37.765 --> 00:47:40.865 and upload them online to the degree feasible.
1090 00:47:41.065 --> 00:47:44.105 'cause if you try to do that on the fly every time,
1091 00:47:44.765 --> 00:47:47.105 you'll see what we just experienced right in the terminal
1092 00:47:47.115 --> 00:47:48.745 where it's like, it's gonna take time.
1093 00:47:48.895 --> 00:47:51.145 You are bound
1094 00:47:51.445 --> 00:47:53.465 by the calculations that you have to execute.
1095 00:47:53.465 --> 00:47:55.465 Of course, we could have done them, you know, uh,
1096 00:47:55.505 --> 00:47:57.945 concurrently and made things slightly more efficient.
1097 00:47:58.205 --> 00:48:01.065 But the fact is, you have to pay that calculation tax.
1098 00:48:01.445 --> 00:48:06.145 So pay it before your, um, users come to your
1099 00:48:06.145 --> 00:48:07.705 application, if you can.
1100 00:48:07.725 --> 00:48:09.025 That's not always feasible, right?
1101 00:48:09.025 --> 00:48:11.265 If a user's uploading their own document
1102 00:48:11.285 --> 00:48:12.825 to you, well then you have to wait.
1103 00:48:12.945 --> 00:48:14.225 'cause you have to process it once.
1104 00:48:14.285 --> 00:48:17.785 But if you have old documents that you wanna upload and,
1105 00:48:17.965 --> 00:48:19.985 and make accessible to every user,
1106 00:48:20.135 --> 00:48:21.745 well then you absolutely should have done
1107 00:48:21.745 --> 00:48:23.265 that like way beforehand.
1108 00:48:23.725 --> 00:48:25.825 Uh, and then, and then uploaded it.
1109 00:48:25.965 --> 00:48:29.145 So, um, but again, at the Feast layer,
1110 00:48:29.245 --> 00:48:32.425 we are gonna continue to optimize our code base so that it's
1111 00:48:32.425 --> 00:48:34.105 as lightweight and efficient as possible.
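The precompute-first strategy amounts to paying the expensive parse-and-embed cost in a batch job before traffic arrives, serving cheap lookups afterwards, and computing on the fly only for content that could not have been prepared ahead of time, such as a user's own upload. In this hypothetical sketch a plain dict stands in for the online store and `CountingEmbedder` stands in for a slow embedding step; none of these names are Feast APIs.

```python
from typing import Callable, Dict, List, Optional


class CountingEmbedder:
    """Stand-in for a slow parse/embed step; counts how often it is invoked."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, text: str) -> List[float]:
        self.calls += 1
        return [float(len(text)), float(len(text.split()))]  # toy "embedding"


def batch_materialize(docs: Dict[str, str],
                      embed: Callable[[str], List[float]]) -> Dict[str, List[float]]:
    """Offline: pay the compute cost once for every known document."""
    return {doc_id: embed(text) for doc_id, text in docs.items()}


def serve(store: Dict[str, List[float]], doc_id: str,
          embed: Callable[[str], List[float]], text: Optional[str] = None) -> List[float]:
    """Online: a cheap lookup when precomputed; compute on the fly only when unavoidable."""
    if doc_id in store:
        return store[doc_id]
    if text is None:
        raise KeyError(f"{doc_id!r} was not precomputed and no text was supplied")
    vector = embed(text)    # e.g. a document the user just uploaded
    store[doc_id] = vector  # keep it so later requests are lookups
    return vector
```

Counting calls makes the trade-off visible: every document materialized in the batch step costs one embedding call up front and none at request time, while an uploaded document costs one call on its first request only.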
1112 00:48:35.765 --> 00:48:38.075 Thank you. And then I'll follow up with one, which is,
1113 00:48:38.075 --> 00:48:40.315 so you used, um...
1114 00:48:40.315 --> 00:48:42.115 how would you customize embedding models
1115 00:48:42.175 --> 00:48:43.315 and chunking there directly?
1116 00:48:43.315 --> 00:48:45.635 Do you do it at the Docling level,
1117 00:48:45.775 --> 00:48:46.955 or where does it work if you...
1118 00:48:47.135 --> 00:48:49.675 You can. Docling has a really, uh, wide breadth of, uh,
1119 00:48:49.815 --> 00:48:51.915 ability to actually customize.
1120 00:48:51.915 --> 00:48:54.555 And so, uh, you know, I don't know the full extent
1121 00:48:54.555 --> 00:48:56.235 of all those parameters in Docling,
1122 00:48:56.235 --> 00:48:58.275 but they are documented and available, pun intended.
1123 00:48:58.615 --> 00:49:02.595 Um, and, uh, what I'd say is that
1124 00:49:03.775 --> 00:49:06.475 if you're not falling within that subset, like, well,
1125 00:49:06.475 --> 00:49:08.635 my data doesn't fit Docling well, that's actually part
1126 00:49:08.635 --> 00:49:12.485 of the point of Feast: whatever text data you have,
1127 00:49:12.545 --> 00:49:15.325 if it's just text, this is what Feast is built for, is
1128 00:49:15.325 --> 00:49:17.725 that you can choose whatever sentence transformer you
1129 00:49:17.725 --> 00:49:20.365 want, whatever PyTorch code you want, um,
1130 00:49:21.165 --> 00:49:24.165 whatever tokenization strategy you want, all
1131 00:49:24.165 --> 00:49:26.605 that can be done because you have the toolkit to say,
1132 00:49:26.605 --> 00:49:27.925 well, look, you can serve whatever you want.
1133 00:49:27.945 --> 00:49:30.045 You can execute arbitrary functions.
1134 00:49:30.385 --> 00:49:32.845 Um, I don't recommend calling other APIs within
1135 00:49:32.845 --> 00:49:34.045 feature transformations.
1136 00:49:34.235 --> 00:49:36.005 Some people will do that and that's fine.
1137 00:49:36.425 --> 00:49:38.925 Um, but that's when you start
1138 00:49:39.545 --> 00:49:41.405 to introduce latency
1139 00:49:41.405 --> 00:49:43.125 and you actually get more complicated systems.
1140 00:49:43.265 --> 00:49:47.125 But, uh, ignoring that detail, um, the
1141 00:49:48.105 --> 00:49:50.805 flexibility of Feast is that, you know, ML engineers
1142 00:49:51.145 --> 00:49:54.125 who tend to be very rich domain experts in how to
1143 00:49:54.875 --> 00:49:58.445 outline these things, um, can really kind
1144 00:49:58.445 --> 00:49:59.765 of steer the wheel here.
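The point that the transformation is ordinary user code can be illustrated by passing the tokenizer and the embedding function in as parameters, so either can be swapped without touching the rest of the pipeline. This is a hypothetical sketch, not Feast's on-demand transformation API; the toy tokenizers and `length_embedder` are placeholders for, say, a real tokenizer and a sentence-transformer model.

```python
from typing import Callable, Dict, List, Sequence

Tokenizer = Callable[[str], List[str]]
Embedder = Callable[[Sequence[str]], List[float]]


def whitespace_tokenizer(text: str) -> List[str]:
    """Lowercase and split on whitespace."""
    return text.lower().split()


def char_trigram_tokenizer(text: str) -> List[str]:
    """Lowercase and emit overlapping 3-character substrings."""
    t = text.lower()
    return [t[i:i + 3] for i in range(len(t) - 2)]


def length_embedder(tokens: Sequence[str]) -> List[float]:
    """Toy embedding: token count and mean token length."""
    count = len(tokens)
    mean_len = sum(len(tok) for tok in tokens) / count if count else 0.0
    return [float(count), mean_len]


def make_transform(tokenize: Tokenizer, embed: Embedder) -> Callable[[List[Dict]], List[Dict]]:
    """Build a row-wise transformation that adds an 'embedding' field to copies of the rows."""
    def transform(rows: List[Dict]) -> List[Dict]:
        return [{**row, "embedding": embed(tokenize(row["text"]))} for row in rows]
    return transform
```

Swapping `whitespace_tokenizer` for `char_trigram_tokenizer`, or `length_embedder` for a real model, changes the representation without touching `make_transform`, which is the kind of flexibility described in the answer above.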
1145 00:50:00.395 --> 00:50:01.485 Okay. And yeah,
1146 00:50:01.765 --> 00:50:03.605 actually, a follow-up, just more on a
1147 00:50:03.605 --> 00:50:05.005 philosophical level, sorry.
1148 00:50:05.585 --> 00:50:08.445 Uh, so your main users are ML engineers.
1149 00:50:08.515 --> 00:50:10.965 Which, you know, was very popular a couple
1150 00:50:10.965 --> 00:50:13.245 of years ago when I was myself an ML engineer.
1151 00:50:14.065 --> 00:50:16.085 Do you see, like, how does it work now
1152 00:50:16.085 --> 00:50:18.485 with the new AI engineers, you know, like,
1153 00:50:18.485 --> 00:50:21.765 because they use a lot of, you know, API calls
1154 00:50:21.825 --> 00:50:23.325 and LLMs directly and stuff.
1155 00:50:23.325 --> 00:50:25.325 Like, how do you see the future with Feast
1156 00:50:25.325 --> 00:50:27.005 and AI engineers and ML engineers?
1157 00:50:27.555 --> 00:50:28.725 Yeah, that's a really good question.
1158 00:50:28.965 --> 00:50:31.525 I think we're definitely open to it. Like,
1159 00:50:31.645 --> 00:50:33.765 I'd love to cater to AI engineers.
1160 00:50:33.885 --> 00:50:35.325 I really would. I think, um,
1161 00:50:35.705 --> 00:50:37.685 and, like, I obviously have to, uh,
1162 00:50:37.695 --> 00:50:40.965 share Chip's book, AI Engineering, uh, and in it,
1163 00:50:40.965 --> 00:50:43.285 and I talk about this a lot, she mentions that
1164 00:50:44.665 --> 00:50:46.845 AI engineering emerged from ML engineering.
1165 00:50:46.995 --> 00:50:49.245 Yeah. The real thing is you don't have
1166 00:50:49.245 --> 00:50:50.685 to train a foundation model anymore.
1167 00:50:50.745 --> 00:50:52.125 You can just treat it as an endpoint.
1168 00:50:52.265 --> 00:50:54.125 But all of the, and she says this in her book,
1169 00:50:54.125 --> 00:50:56.445 and I've quoted it in some internal papers I've written,
1170 00:50:57.985 --> 00:50:59.245 all the other problems are still there.
1171 00:50:59.625 --> 00:51:02.605 And Feast isn't about inference. Feast is about the data.
1172 00:51:02.905 --> 00:51:04.605 So all of the Feast problems that
1173 00:51:04.645 --> 00:51:06.085 we encounter, they exist.
1174 00:51:06.385 --> 00:51:09.565 And other frameworks maybe don't have the 10 years
1175 00:51:09.585 --> 00:51:10.645 of knowledge
1176 00:51:10.645 --> 00:51:13.405 and scars, uh, that we've developed in
1177 00:51:13.405 --> 00:51:14.805 building these production systems.
1178 00:51:15.425 --> 00:51:18.005 Um, but it's, uh, a lot
1179 00:51:18.005 --> 00:51:19.365 of those things are out of the box in Feast.
1180 00:51:19.365 --> 00:51:22.205 But I think the challenge with Feast is that it's
1181 00:51:22.205 --> 00:51:24.965 so anchored towards ML engineering language
1182 00:51:24.965 --> 00:51:29.165 and jargon that it doesn't necessarily immediately catch
1183 00:51:29.355 --> 00:51:31.605 on with, like, uh, AI engineers.
1184 00:51:31.625 --> 00:51:33.965 And so, uh, there's the marketing angle like, Hey,
1185 00:51:33.965 --> 00:51:35.085 look, can we bridge that gap?
1186 00:51:35.565 --> 00:51:36.965 I don't know if we ever can,
1187 00:51:37.025 --> 00:51:39.805 but I do hope that we're able to continue to build
1188 00:51:39.945 --> 00:51:42.565 and enable those folks to succeed as well.
1189 00:51:43.145 --> 00:51:46.325 Um, yeah, I'm definitely welcoming of that. I think part
1190 00:51:46.325 --> 00:51:48.805 of the reason I built out this Docling, uh,
1191 00:51:49.425 --> 00:51:53.445 and the demos that we've done is to actually, you know, go
1192 00:51:53.445 --> 00:51:54.805 to AI engineers and say like, Hey, look,
1193 00:51:54.805 --> 00:51:56.565 if you wanna do really, really sophisticated stuff,
1194 00:51:57.145 --> 00:51:58.365 here's the path to it.
1195 00:51:58.785 --> 00:52:03.415 But my high-conviction bet is actually that the number
1196 00:52:03.415 --> 00:52:05.775 of ML engineers is going to continue to increase
1197 00:52:06.125 --> 00:52:07.735 because of more AI engineers.
1198 00:52:07.895 --> 00:52:10.135 'cause you start to find that like, oh, actually I do wanna,
1199 00:52:10.535 --> 00:52:12.255 I do wanna turn all these cranks
1200 00:52:12.275 --> 00:52:14.815 and I wanna put on all these bells and whistles
1201 00:52:14.815 --> 00:52:17.815 because that 2% now starts to be really valuable
1202 00:52:17.835 --> 00:52:19.975 and AI engineering really does unlock that.
1203 00:52:20.035 --> 00:52:23.725 And so, um, that's kind of where my view on it is.
1204 00:52:23.725 --> 00:52:25.205 But there's a lot more
1205 00:52:25.205 --> 00:52:26.885 of an upfront cost you have to pay
1206 00:52:26.885 --> 00:52:29.205 with these things versus just like, you know,
1207 00:52:30.315 --> 00:52:32.045 like calling an endpoint or something, right?
1208 00:52:32.045 --> 00:52:33.445 Or just like using Pandas
1209 00:52:33.445 --> 00:52:35.685 to dump it into something, right?
1210 00:52:35.705 --> 00:52:37.525 Or FAISS, right? Like all these things.
1211 00:52:37.865 --> 00:52:39.125 And so I acknowledge
1212 00:52:39.125 --> 00:52:41.045 that it is more of an engineering challenge.
1213 00:52:41.045 --> 00:52:42.565 That said, we do try
1214 00:52:42.565 --> 00:52:44.085 to make it pretty easy for people to get started.
1215 00:52:45.035 --> 00:52:46.445 Cool. Well thank you very much.
1216 00:52:46.785 --> 00:52:48.565 And just to wait, but yeah, hopefully
1217 00:52:48.565 --> 00:52:51.285 otherwise, uh, you still get to have like a lot
1218 00:52:51.285 --> 00:52:54.285 of AI engineers, you know, a lot of people coming, uh,
1219 00:52:54.305 --> 00:52:55.445 to use Feast.
1220 00:52:55.705 --> 00:52:58.165 Uh, I was using it myself actually a couple of years ago,
1221 00:52:58.165 --> 00:53:00.205 so it's nice to see that it's still here, uh,
1222 00:53:00.255 --> 00:53:01.405 fully alive and you know,
1223 00:53:02.355 --> 00:53:03.965 Yeah, we're working on it, man. It's, uh,
1224 00:53:03.965 --> 00:53:04.965 Yeah.
1225 00:53:05.825 --> 00:53:08.045 And yeah, again to everyone. So it was recorded.
1226 00:53:08.385 --> 00:53:10.845 Uh, we'll share it, uh, in a couple of days
1227 00:53:10.845 --> 00:53:12.085 after it's been edited and stuff.
1228 00:53:12.085 --> 00:53:13.725 So if you also want to send it to your friends,
1229 00:53:14.505 --> 00:53:15.925 uh, feel free to do so.
1230 00:53:15.925 --> 00:53:17.245 Otherwise, it was a wonderful presentation.
1231 00:53:17.265 --> 00:53:18.725 So thank you very much, Francisco,
1232 00:53:19.705 --> 00:53:22.645 and hopefully I will get to see you one day
1233 00:53:22.825 --> 00:53:25.330 and hopefully some people will get to use feast as well
1234 00:53:25.425 --> 00:53:26.525 and see how nice it is.
1235 00:53:27.735 --> 00:53:29.085 Thank you. Thank you very much.
1236 00:53:29.215 --> 00:53:31.925 Thank you everyone, and have a lovely morning, afternoon,
1237 00:53:31.945 --> 00:53:33.285 or evening, wherever you are in the world.
1238 00:53:33.665 --> 00:53:34.005 See you.
Meet the Speaker
Join the session for live Q&A with the speaker
Francisco Javier Arceo
Senior Principal Software Engineer
Francisco Javier Arceo has spent over a decade working in AI/ML, software, and fintech at AIG, the Commonwealth Bank of Australia, Goldman Sachs, Fast, Affirm, and Red Hat in roles spanning software, data engineering, credit, fraud, data science, and machine learning. He holds graduate degrees in Economics & Statistics and Data Science & Machine Learning from Columbia University in the City of New York and Clemson University. He is a maintainer for Feast, the open source feature store, and a Steering Committee member for Kubeflow, the open source ecosystem of Kubernetes components for AI/ML.