Smarter RAG Pipelines: Scaling Search with Milvus and Feast
WEBVTT
1 00:00:03.805 --> 00:00:05.825 So I'm pleased to introduce the session, Smarter RAG
2 00:00:05.825 --> 00:00:07.585 Pipelines with Milvus
3 00:00:07.585 --> 00:00:10.145 and Feast with our guest speaker Francisco today.
4 00:00:10.895 --> 00:00:12.785 He's a senior principal engineer at Red Hat,
5 00:00:12.925 --> 00:00:15.865 having spent over a decade working in AI
6 00:00:15.925 --> 00:00:18.065 and ML, also software, fintech
7 00:00:18.125 --> 00:00:21.985 and AI, at AIG, the Commonwealth Bank of Australia,
8 00:00:22.335 --> 00:00:25.865 Goldman Sachs, Fast, Affirm,
9 00:00:25.925 --> 00:00:28.825 and Red Hat, in roles spanning from software
10 00:00:29.125 --> 00:00:30.545 to data engineering, credit, fraud,
11 00:00:30.545 --> 00:00:31.945 data science, and machine learning.
12 00:00:33.005 --> 00:00:36.225 He holds graduate degrees in economics and statistics
13 00:00:36.285 --> 00:00:39.505 and data science and machine learning from Columbia University,
14 00:00:40.285 --> 00:00:43.945 uh, in the City of New York and also Clemson University.
15 00:00:44.415 --> 00:00:47.465 He's a maintainer for Feast, the open source feature store,
16 00:00:47.565 --> 00:00:49.825 and a steering committee member for Kubeflow,
17 00:00:50.315 --> 00:00:52.745 which is the open source ecosystem of Kubernetes
18 00:00:52.745 --> 00:00:54.185 components for AI
19 00:00:54.185 --> 00:00:57.905 and ML. Francisco, the stage is yours. You may take over.
20 00:01:00.505 --> 00:01:03.405 Hi everybody. Uh, I'm Francisco. Pleasure to meet you.
21 00:01:03.465 --> 00:01:06.005 See you. Um, I'm gonna take off my hat, um,
22 00:01:06.065 --> 00:01:09.885 but just for consistency with my profile photo on
23 00:01:09.905 --> 00:01:12.565 the webinar, I figured I'd show up with it.
24 00:01:12.985 --> 00:01:15.725 Um, so, uh, today we're gonna talk about, uh,
25 00:01:15.775 --> 00:01:20.045 Feast, RAG, and Milvus, and I am going to share my screen.
26 00:01:21.715 --> 00:01:25.535 Uh, can folks see it? Yes? In the chat, let me know. Okay, perfect.
27 00:01:25.845 --> 00:01:29.015 Confirmed. Um, great, great.
28 00:01:29.275 --> 00:01:31.535 So let's get the party started.
29 00:01:31.955 --> 00:01:35.095 Um, so, uh, you know, uh,
30 00:01:35.935 --> 00:01:37.525 Feast, RAG, Milvus, that's what we're gonna talk about.
31 00:01:38.585 --> 00:01:42.245 Um, because I thought I'd tell folks a little bit about me.
32 00:01:42.525 --> 00:01:44.085 I, I think it was already covered.
33 00:01:44.345 --> 00:01:46.525 Um, you know, but I wanted to give a little bit of context.
34 00:01:47.205 --> 00:01:51.005 I, I've, you know, led, um, data science, data engineering,
35 00:01:51.165 --> 00:01:52.285 ML infra teams at different
36 00:01:52.485 --> 00:01:53.645 companies over the last 12 plus years.
37 00:01:53.825 --> 00:01:57.165 Um, and somehow I stumbled into maintaining Feast.
38 00:01:57.225 --> 00:02:00.325 Uh, um, we shipped it at
39 00:02:00.325 --> 00:02:04.045 a previous company I worked at, um, scaled in production
40 00:02:04.185 --> 00:02:08.165 for checkout payments, uh, you know, for credit risk
41 00:02:08.165 --> 00:02:11.925 and fraud models, um, where low latency retrieval is,
42 00:02:11.925 --> 00:02:13.565 is a really, really important part.
43 00:02:13.625 --> 00:02:15.805 And, uh, high resiliency and uptime.
44 00:02:15.905 --> 00:02:19.805 And so, um, I kind of, uh, spent my career building models
45 00:02:19.825 --> 00:02:21.165 and then shipping models and,
46 00:02:21.225 --> 00:02:23.325 and that heavily relies on data.
47 00:02:23.465 --> 00:02:25.245 And so, again, that, that's kind of
48 00:02:25.245 --> 00:02:26.485 how I stumbled into Feast.
49 00:02:26.865 --> 00:02:29.245 Um, you know, I, I did things the old way before,
50 00:02:29.305 --> 00:02:32.525 and then eventually we have kind of newer, uh,
51 00:02:32.595 --> 00:02:34.845 more structured way of, of serving models.
52 00:02:35.505 --> 00:02:39.165 Um, I joined Red Hat last year, almost to the day, um,
53 00:02:39.905 --> 00:02:41.325 to work on open source ai.
54 00:02:41.325 --> 00:02:43.685 And I feel very privileged to get to work on Feast
55 00:02:43.705 --> 00:02:46.045 and, you know, um, Kubeflow
56 00:02:46.105 --> 00:02:49.205 and other communities, uh, really helping to, you know,
57 00:02:49.715 --> 00:02:53.685 work on making sure that, uh, AI is, is open and,
58 00:02:53.685 --> 00:02:55.725 and using the best, uh, of open source.
59 00:02:56.425 --> 00:02:59.245 Um, I have a wife and two children, and,
60 00:02:59.265 --> 00:03:00.605 and I call New Jersey home.
61 00:03:00.805 --> 00:03:02.845 I took this photo when I was in South Dakota.
62 00:03:02.965 --> 00:03:05.085 I used to live out west. Uh, it was a great time. I love it.
63 00:03:05.425 --> 00:03:09.005 Uh, out in Rapid City near Wyoming. Um, and that's me.
64 00:03:09.545 --> 00:03:11.625 So, let's see.
65 00:03:14.365 --> 00:03:17.225 So I wanted to start with some historical context, right?
66 00:03:18.185 --> 00:03:20.265 RAG is pretty popular, um,
67 00:03:20.805 --> 00:03:23.225 but oftentimes people haven't read the original paper
68 00:03:23.445 --> 00:03:25.025 or aren't aware of the original paper.
69 00:03:25.725 --> 00:03:28.025 And so I thought I'd give that brief history and context.
70 00:03:28.405 --> 00:03:31.385 And so, um, RAG stands
71 00:03:31.385 --> 00:03:32.785 for Retrieval Augmented Generation.
72 00:03:33.045 --> 00:03:36.705 So the paper was published at NeurIPS in 2020, um,
73 00:03:36.725 --> 00:03:38.345 by the Meta AI research team,
74 00:03:38.405 --> 00:03:41.345 or back then it was called FAIR, Facebook AI Research.
75 00:03:42.005 --> 00:03:46.345 Um, and, uh, it looks like I missed a part
76 00:03:46.345 --> 00:03:48.185 of finishing this bullet point in the second one.
77 00:03:48.185 --> 00:03:51.205 Anyways, um,
78 00:03:51.265 --> 00:03:55.605 the architecture talked about, um, you know,
79 00:03:56.395 --> 00:03:57.965 some things that people kind of omit today.
80 00:03:58.115 --> 00:03:59.885 They actually had two models at play.
81 00:03:59.885 --> 00:04:02.005 They had what's called the retriever, uh,
82 00:04:02.005 --> 00:04:03.285 in the diagram that
83 00:04:03.285 --> 00:04:05.885 I took a screenshot of from the paper, um,
84 00:04:06.825 --> 00:04:08.605 and the, um, generator,
85 00:04:08.945 --> 00:04:10.605 and there's a query encoder as a part
86 00:04:10.605 --> 00:04:12.525 of this retriever thing, which is, you know,
87 00:04:12.815 --> 00:04:15.845 we're all pretty familiar with like encoders,
88 00:04:15.845 --> 00:04:17.845 which take like a query or a sentence
89 00:04:17.845 --> 00:04:19.925 and then map it into a vector, right?
90 00:04:20.025 --> 00:04:21.365 Um, a set of numbers
91 00:04:21.465 --> 00:04:26.245 of, like, some varying length, um, which is set
92 00:04:26.345 --> 00:04:27.885 by whatever model you choose.
93 00:04:28.025 --> 00:04:29.605 Uh, you know, people tend
94 00:04:29.605 --> 00:04:33.005 to arbitrarily set some large number like 584 or something.
95 00:04:33.385 --> 00:04:34.805 Uh, usually a power of two.
96 00:04:34.905 --> 00:04:37.685 But, um, you know, one of the things
97 00:04:37.685 --> 00:04:40.685 that was often missed in the dialogue about this is that
98 00:04:41.715 --> 00:04:43.525 when this seminal paper came out,
99 00:04:43.625 --> 00:04:46.285 it was about the end-to-end backpropagation
100 00:04:46.625 --> 00:04:48.085 of the retriever and the generator.
101 00:04:48.345 --> 00:04:50.925 And what does that mean? That means that they took some sort
102 00:04:50.925 --> 00:04:53.565 of model, some, you know, pre-trained weights
103 00:04:53.625 --> 00:04:54.645 and they fine-tuned them.
104 00:04:55.185 --> 00:04:56.845 Um, and I think that's a really important,
105 00:04:57.065 --> 00:04:58.285 uh, grounding layer,
106 00:04:58.525 --> 00:05:01.605 'cause that's really not how people think about, um,
107 00:05:02.525 --> 00:05:04.365 RAG in practice today and how people use it.
108 00:05:04.365 --> 00:05:06.325 They mostly think about it from an inference perspective.
109 00:05:06.545 --> 00:05:07.725 And, and that makes sense. You know, it,
110 00:05:07.725 --> 00:05:09.005 it's not a criticism, it's just kind
111 00:05:09.005 --> 00:05:10.405 of a, a statement of fact.
112 00:05:11.025 --> 00:05:14.485 Um, and then, you know, so 2020,
113 00:05:14.865 --> 00:05:16.245 that's a while ago.
114 00:05:16.505 --> 00:05:20.205 Um, you know, why did it become so popular?
115 00:05:20.385 --> 00:05:23.685 And if you look at some data, um, probably
116 00:05:23.685 --> 00:05:27.965 because of ChatGPT. Uh, ChatGPT, you know,
117 00:05:28.355 --> 00:05:32.925 disrupted the world, um, in October 2022. Um,
118 00:05:33.355 --> 00:05:35.365 in their original documentation,
119 00:05:35.365 --> 00:05:36.725 they had suggested using RAG,
120 00:05:37.225 --> 00:05:41.405 and phrasing like in-context learning wasn't as common.
121 00:05:41.985 --> 00:05:43.005 Um, and,
122 00:05:43.905 --> 00:05:46.405 but what people found was if you just dump stuff into the
123 00:05:46.405 --> 00:05:50.285 context of an LLM, they work pretty well with some prompt,
124 00:05:50.465 --> 00:05:53.485 uh, instruction, uh, formatting, right?
125 00:05:54.025 --> 00:05:57.285 Um, and if you look at the Google Trends, um,
126 00:05:57.655 --> 00:06:00.945 which I screenshotted both here, you know, again,
127 00:06:01.345 --> 00:06:03.785 December, 2020 is when, when, uh, it was published.
128 00:06:04.005 --> 00:06:07.505 Um, and you see that, like, it has pretty much no
129 00:06:08.595 --> 00:06:10.815 mention, but you see that, uh,
130 00:06:11.565 --> 00:06:14.895 when ChatGPT took off in October of 2022, um,
131 00:06:15.075 --> 00:06:19.255 that's when you see RAG really start to be popular,
132 00:06:19.555 --> 00:06:20.975 at least according to Google Trends.
133 00:06:21.555 --> 00:06:22.895 Um, and again, it's
134 00:06:22.895 --> 00:06:24.895 because they had sourced it in their documentation and,
135 00:06:24.915 --> 00:06:28.255 and, you know, I was working on some of this stuff, uh,
136 00:06:28.445 --> 00:06:31.455 back when that happened and, you know, used the OpenAI demo
137 00:06:31.455 --> 00:06:34.135 that, you know, showed you how to do it.
138 00:06:34.195 --> 00:06:37.415 Um, and, uh, you know,
139 00:06:37.595 --> 00:06:39.535 it was surprisingly powerful as is.
140 00:06:39.635 --> 00:06:42.295 And that's kind of, like, taken the mind share of
141 00:06:42.645 --> 00:06:44.255 what we call AI engineering today.
142 00:06:45.415 --> 00:06:47.715 But again, there's a really critical step
143 00:06:47.715 --> 00:06:48.835 that I feel like
144 00:06:48.835 --> 00:06:52.235 is missing from the conversation here, which is, uh,
145 00:06:52.785 --> 00:06:54.675 most RAG applications are only using inference,
146 00:06:54.675 --> 00:06:57.075 which is great, you know, uh, given that it works.
147 00:06:57.135 --> 00:06:59.475 But it's also, uh, important to note
148 00:06:59.475 --> 00:07:03.395 that there was this whole other story with RAG.
149 00:07:03.935 --> 00:07:05.715 Um, and, and I think a lot of it is
150 00:07:05.715 --> 00:07:08.675 because it's very easy to dump
151 00:07:08.895 --> 00:07:12.755 and format, um, data and documents into the context
152 00:07:13.055 --> 00:07:14.835 and do vector similarity search,
153 00:07:15.255 --> 00:07:17.235 but it's actually a lot harder to do fine tuning.
154 00:07:17.495 --> 00:07:19.955 Um, that, that's just kind of a, uh,
155 00:07:20.915 --> 00:07:22.155 I don't think a controversial statement.
156 00:07:24.535 --> 00:07:26.435 So how does RAG work for those who are unfamiliar?
157 00:07:26.635 --> 00:07:28.635 I thought, you know, we'll go through a simple example,
158 00:07:28.855 --> 00:07:31.755 um, you know, there are really four core steps,
159 00:07:31.845 --> 00:07:34.755 which is, one, you embed data, like maybe take a document,
160 00:07:35.145 --> 00:07:38.315 PDFs, or just like some tokens, like a blog.
161 00:07:38.895 --> 00:07:41.035 Um, and you embed it, right?
162 00:07:41.035 --> 00:07:44.235 Again, map it into some vector space of all the sentences.
163 00:07:44.235 --> 00:07:45.555 So you find a way to partition
164 00:07:45.735 --> 00:07:49.515 or chunk the, um, the text.
165 00:07:49.975 --> 00:07:51.395 And then you take those partitions
166 00:07:51.395 --> 00:07:53.475 and each partition individually you embed,
167 00:07:53.705 --> 00:07:56.675 then you store those embeds with some primary identifier,
168 00:07:57.095 --> 00:07:58.755 um, into some database.
169 00:07:59.255 --> 00:08:02.155 And then, you know, in real time you
170 00:08:03.165 --> 00:08:05.195 embed what's called the user query, which is like,
171 00:08:05.335 --> 00:08:06.515 I'm gonna talk to the chat bot.
172 00:08:06.535 --> 00:08:09.475 I'm gonna say, what's, you know,
173 00:08:09.855 --> 00:08:11.315 the capital of the US, right?
174 00:08:11.735 --> 00:08:16.275 Um, and, like, in real time, you'll also embed
175 00:08:16.275 --> 00:08:18.635 that query into a vector,
176 00:08:18.975 --> 00:08:20.075 and you'll use that vector
177 00:08:20.175 --> 00:08:21.955 to search everything in the database.
178 00:08:22.175 --> 00:08:24.475 And so vector search, and Pinecone and
179 00:08:24.615 --> 00:08:28.035 Milvus, um, arose to really say, Hey, look,
180 00:08:28.035 --> 00:08:30.355 we actually support vector similarity search.
181 00:08:30.375 --> 00:08:31.635 And these things became very,
182 00:08:31.985 --> 00:08:33.155 very prolific and popular.
183 00:08:33.735 --> 00:08:36.075 Um, and, you know, Milvus had been around quite,
184 00:08:36.075 --> 00:08:37.235 quite a bit longer, right?
185 00:08:37.295 --> 00:08:41.715 Uh, and earlier than, uh, really 2022, right?
186 00:08:41.895 --> 00:08:44.915 Um, and I think it, it's important to note
187 00:08:44.915 --> 00:08:47.515 that vector similarity search as a construct has been
188 00:08:47.515 --> 00:08:48.595 around for a very long time.
189 00:08:48.855 --> 00:08:53.155 And information retrieval, um, has, has been, you know,
190 00:08:53.155 --> 00:08:57.085 practiced for quite a bit, you know, standard, um, you know,
191 00:08:57.325 --> 00:08:59.605 retrieval and recommender systems have been using this for,
192 00:08:59.605 --> 00:09:01.485 for quite a long time, which my understanding is
193 00:09:01.485 --> 00:09:03.685 that's actually how Milvus, uh, originally started.
194 00:09:04.345 --> 00:09:07.965 Um, and so, you know, at the end you get this query
195 00:09:07.965 --> 00:09:09.765 and you retrieve it with vector similarity search,
196 00:09:09.825 --> 00:09:11.685 and then you inject that into the context
197 00:09:11.825 --> 00:09:12.925 and go on your merry way
198 00:09:12.985 --> 00:09:15.725 and have your, your LLM generate some sort
199 00:09:15.725 --> 00:09:16.925 of response and you hope it works.
200 00:09:17.305 --> 00:09:19.725 Um, and in practice it worked pretty well.
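The four steps just described (chunk and embed, store, embed the query, retrieve and inject into the context) can be sketched in a few lines. This is only an illustrative sketch under loud assumptions: a normalized bag-of-words vector stands in for a real encoder model, a Python list stands in for a vector database such as Milvus, and every name here (`tokenize`, `chunk`, `embed`, `search`) is hypothetical and not part of Feast or Milvus.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and keep only alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def chunk(document):
    """Naive chunking: one chunk per sentence."""
    return [s for s in re.split(r"(?<=[.?!])\s+", document.strip()) if s]


def embed(text, vocab):
    """Toy stand-in for an encoder: unit-length bag-of-words counts."""
    vec = [0.0] * len(vocab)
    for word in tokenize(text):
        if word in vocab:
            vec[vocab[word]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def search(query_vec, store, top_k=1):
    """Brute-force similarity search; dot product of unit vectors is cosine."""
    def score(row):
        return sum(a * b for a, b in zip(query_vec, row["vector"]))
    return sorted(store, key=score, reverse=True)[:top_k]


document = (
    "Washington is the capital of the United States. "
    "Canberra is the capital of Australia. "
    "Feast is an open source feature store."
)

# Steps 1 and 2: chunk, embed, and store each chunk with a primary identifier.
chunks = chunk(document)
vocab = {w: i for i, w in enumerate(sorted({w for c in chunks for w in tokenize(c)}))}
store = [{"id": i, "text": c, "vector": embed(c, vocab)} for i, c in enumerate(chunks)]

# Step 3: embed the user query at request time.
query = "What is the capital of the United States?"
query_vec = embed(query, vocab)

# Step 4: retrieve the nearest chunk and inject it into the prompt context.
top = search(query_vec, store, top_k=1)
prompt = f"Context:\n{top[0]['text']}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```

In a real pipeline the `embed` function would call an encoder model and `search` would be a query against the vector database; the prompt string is what would then be sent to the LLM.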
201 00:09:19.985 --> 00:09:23.085 Um, so,
202 00:09:25.275 --> 00:09:26.855 so how can Feast help with RAG?
203 00:09:27.455 --> 00:09:30.265 I think, um, you know,
204 00:09:32.425 --> 00:09:36.375 Feast was really grounded as a feature store,
205 00:09:36.395 --> 00:09:38.295 and a feature store was really aimed at helping
206 00:09:38.875 --> 00:09:41.655 reduce the complexity in taking models,
207 00:09:41.935 --> 00:09:44.575 particularly tabular ones to production.
208 00:09:44.835 --> 00:09:47.215 Um, and it turns out
209 00:09:47.215 --> 00:09:48.735 that the hardest part about shipping models
210 00:09:48.735 --> 00:09:52.095 to production in the, in the tabular predictive ML world,
211 00:09:52.595 --> 00:09:54.535 um, isn't really the model itself.
212 00:09:54.855 --> 00:09:57.255 Actually, when models are small, inference reduces
213 00:09:57.255 --> 00:09:59.095 to really just being a calculator in real time.
214 00:09:59.195 --> 00:10:00.775 And, and that's not hard.
215 00:10:00.805 --> 00:10:02.415 What's actually hard is orchestrating data.
216 00:10:02.435 --> 00:10:03.775 And that was what, you know, kind of,
217 00:10:03.815 --> 00:10:05.735 I mentioned in the early part of my talk.
218 00:10:05.735 --> 00:10:07.335 What I found working with these enterprises is
219 00:10:07.335 --> 00:10:12.255 that you have so many disparate database systems, um,
220 00:10:13.045 --> 00:10:16.175 that finding a way to centralize all of the data
221 00:10:16.175 --> 00:10:20.575 that you have so that you can then, like featurize, um,
222 00:10:20.725 --> 00:10:22.855 that data and then serve it
223 00:10:22.855 --> 00:10:25.295 to a model is actually quite hard, um,
224 00:10:25.365 --> 00:10:27.015 both technically and organizationally.
225 00:10:27.115 --> 00:10:29.695 And so, um, you know, I've worked at places
226 00:10:29.695 --> 00:10:31.375 that implemented their own crude forms
227 00:10:31.375 --> 00:10:33.375 of feature stores for most of my career
228 00:10:33.425 --> 00:10:35.015 until I ended up maintaining one.
229 00:10:35.475 --> 00:10:38.855 Um, and, uh, there's a joke I like to tell about, uh,
230 00:10:38.855 --> 00:10:40.455 an experience I had when I was at the Commonwealth Bank
231 00:10:40.455 --> 00:10:42.135 of Australia, where I flew to Sydney twice
232 00:10:42.755 --> 00:10:43.775 to get data from somebody.
233 00:10:43.775 --> 00:10:45.175 And it's actually a true story
234 00:10:45.175 --> 00:10:48.735 because it wasn't as obvious
235 00:10:48.795 --> 00:10:50.295 or trivial to actually get that data.
236 00:10:50.635 --> 00:10:53.655 Um, and so it's a real problem for enterprises,
237 00:10:54.115 --> 00:10:55.855 and Feast aims to solve that, um,
238 00:10:55.955 --> 00:10:59.535 by providing a centralized, you know, platform that, um,
239 00:10:59.895 --> 00:11:02.855 stitches together your existing infrastructure, um,
240 00:11:02.995 --> 00:11:05.255 and enables you to be successful in shipping models
241 00:11:05.255 --> 00:11:08.855 to production with the right patterns, uh, and permissions
242 00:11:08.855 --> 00:11:11.975 and governance and, you know, serving and everything else.
243 00:11:12.715 --> 00:11:16.415 And so the premise is Feast helps with RAG
244 00:11:16.515 --> 00:11:19.295 by empowering MLEs to do what they do best,
245 00:11:19.345 --> 00:11:21.215 which is harness the power of data, um,
246 00:11:21.655 --> 00:11:22.895 MLEs and data scientists.
247 00:11:22.895 --> 00:11:25.295 There's ambiguity about, you know, what the nuances
248 00:11:25.295 --> 00:11:26.215 between the two are, but
249 00:11:26.295 --> 00:11:27.455 I'll, I'll treat them as equivalent.
250 00:11:28.035 --> 00:11:31.375 Um, and so, you know, with Feast, it's kind of easier
251 00:11:31.395 --> 00:11:32.735 to ship RAG to production.
252 00:11:33.195 --> 00:11:35.575 Uh, Feast is battle tested. It supports, uh,
253 00:11:35.805 --> 00:11:37.815 real-time, batch, and streaming data.
254 00:11:38.115 --> 00:11:41.055 As I mentioned before, um, at my last role, you know,
255 00:11:41.075 --> 00:11:44.255 we shipped streaming, we shipped real time, we shipped batch data,
256 00:11:44.385 --> 00:11:47.095 batch data sets with insertion sizes,
257 00:11:47.095 --> 00:11:48.535 like 360 million records.
258 00:11:48.915 --> 00:11:51.575 Um, and, you know, Feast scaled. I mean, you start
259 00:11:51.575 --> 00:11:55.045 to hit, um, you know, really scaling this with the database,
260 00:11:55.065 --> 00:11:57.365 uh, that you're using is really what it reduces to.
261 00:11:57.785 --> 00:12:00.765 Um, but you know, we, we actually had great uptime,
262 00:12:00.985 --> 00:12:02.165 um, using Feast.
263 00:12:02.185 --> 00:12:07.165 And so, um, I think it's, um, it works.
264 00:12:07.235 --> 00:12:09.485 Yeah. It works
265 00:12:09.505 --> 00:12:13.005 and it's used by lots and lots of, uh, great, uh,
266 00:12:13.005 --> 00:12:14.245 and powerful enterprises.
267 00:12:14.245 --> 00:12:15.525 And so we're quite proud of that.
268 00:12:15.705 --> 00:12:16.765 Um, and
269 00:12:16.765 --> 00:12:19.085 so we were a little bit slow on getting RAG fully featured,
270 00:12:19.085 --> 00:12:20.725 but now we're in a good place with it.
271 00:12:21.385 --> 00:12:24.545 Um, and so it's really built
272 00:12:24.545 --> 00:12:26.185 for distributed computing and ingestion.
273 00:12:26.185 --> 00:12:28.265 And I think, you know, I mentioned the insertion,
274 00:12:28.285 --> 00:12:32.505 but, you know, Spark support in Feast, um,
275 00:12:33.085 --> 00:12:34.465 is a really powerful mechanism.
276 00:12:34.725 --> 00:12:37.905 You know, uh, it was donated by the Adyen folks, uh,
277 00:12:37.905 --> 00:12:40.145 so I want to give a big shout out to them, um,
278 00:12:40.325 --> 00:12:41.465 as the offline store.
279 00:12:41.645 --> 00:12:44.745 And, you know, the complication that comes up in,
280 00:12:44.765 --> 00:12:48.025 in generating training data for fine tuning is often, well,
281 00:12:48.025 --> 00:12:49.585 how do you embed like a million
282 00:12:49.905 --> 00:12:51.145 documents for training data, right?
283 00:12:51.485 --> 00:12:53.425 And then you start to use, uh, you know,
284 00:12:53.425 --> 00:12:55.145 like distributed computing frameworks,
285 00:12:55.145 --> 00:12:56.625 particularly like Spark or Ray.
286 00:12:57.125 --> 00:12:59.665 Um, there are others like Dask. We also do use Dask.
287 00:12:59.665 --> 00:13:02.025 We don't really use it that much for, um,
288 00:13:02.645 --> 00:13:04.305 the offline store, uh, as much.
289 00:13:04.565 --> 00:13:07.185 Um, but, you know, these frameworks exist and,
290 00:13:07.185 --> 00:13:09.785 and ultimately they're, you know, built,
291 00:13:09.925 --> 00:13:10.985 uh, within Feast.
292 00:13:11.965 --> 00:13:14.865 And so we treat fine tuning as a first class citizen
293 00:13:15.205 --> 00:13:18.465 and point-in-time correctness, joining data, making sure
294 00:13:18.465 --> 00:13:21.305 that you're not, what's called, um, you know,
295 00:13:21.805 --> 00:13:22.825 you don't have data leakage,
296 00:13:22.825 --> 00:13:24.265 which is looking at data from the future.
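To make point-in-time correctness concrete, here is a small sketch of a point-in-time lookup: for each training label, take the latest feature value recorded at or before the label's timestamp, never a later one. The data, the `feature_log` layout, and the `point_in_time_value` helper are all hypothetical and written in plain Python for illustration; this is not how Feast implements its joins.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature log as it might sit in an offline store:
# (entity_id, event_timestamp, number_of_payments_so_far)
feature_log = [
    ("user_1", datetime(2024, 1, 1), 2),
    ("user_1", datetime(2024, 1, 10), 5),
    ("user_1", datetime(2024, 2, 1), 9),
]


def point_in_time_value(log, entity_id, as_of):
    """Return the latest feature value with timestamp <= as_of, else None."""
    rows = sorted((ts, value) for eid, ts, value in log if eid == entity_id)
    timestamps = [ts for ts, _ in rows]
    idx = bisect_right(timestamps, as_of)  # number of rows at or before as_of
    return rows[idx - 1][1] if idx else None


# Hypothetical labels: (entity_id, label_timestamp, label)
labels = [
    ("user_1", datetime(2024, 1, 15), 0),
    ("user_1", datetime(2023, 12, 25), 1),
]

training_rows = [
    (eid, ts, point_in_time_value(feature_log, eid, ts), label)
    for eid, ts, label in labels
]

for row in training_rows:
    print(row)
```

The label dated January 15 is joined to the value 5 (recorded January 10), not to the value 9 recorded in February, which would be leakage from the future. The label dated before any feature row gets `None` rather than a value that did not exist yet.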
297 00:13:25.645 --> 00:13:28.345 And so, um, these are all mechanisms
298 00:13:28.345 --> 00:13:30.265 of why Feast is really helpful with RAG
299 00:13:30.285 --> 00:13:31.905 and it's fully open source, right?
300 00:13:32.045 --> 00:13:33.385 Um, you know, that's, that's one
301 00:13:33.385 --> 00:13:34.985 of the really great benefits of it.
302 00:13:34.985 --> 00:13:36.945 That's why, you know, um, users tend to,
303 00:13:37.325 --> 00:13:38.385 to adopt Feast just
304 00:13:38.385 --> 00:13:41.145 because they don't wanna send their data, you know, outside,
305 00:13:41.325 --> 00:13:43.385 um, or they just want to control the service.
306 00:13:43.605 --> 00:13:48.535 And so that ends up being really helpful. Let's see.
307 00:13:48.535 --> 00:13:50.535 So Feast in production, I wanted to kind of go over
308 00:13:50.535 --> 00:13:52.415 what the architecture looks like
309 00:13:52.475 --> 00:13:56.495 and, um, how things look, uh,
310 00:13:57.715 --> 00:14:00.935 in Feast and how that kind of naturally follows with RAG.
311 00:14:01.355 --> 00:14:03.335 So in Feast world, there's two things.
312 00:14:04.445 --> 00:14:06.015 There's online infrastructure,
313 00:14:06.305 --> 00:14:07.815 which we call an online store,
314 00:14:07.815 --> 00:14:09.775 which is basically just a database that you'd use in,
315 00:14:09.775 --> 00:14:11.655 in like a consumer facing application
316 00:14:11.725 --> 00:14:13.495 that has high resiliency and high uptime,
317 00:14:13.875 --> 00:14:17.295 and an offline infrastructure, which is like a, a database
318 00:14:17.295 --> 00:14:21.575 that you, you use for, uh, model fine tuning, um, you know,
319 00:14:21.575 --> 00:14:24.535 like an offline warehouse, um, where you know,
320 00:14:24.535 --> 00:14:27.135 you're doing lots of reads, not tons of writes, and,
321 00:14:27.155 --> 00:14:31.255 and you're mostly, like, querying stuff like in BigQuery
322 00:14:31.795 --> 00:14:36.455 or Snowflake or Spark, um, you know, on data that's,
323 00:14:36.475 --> 00:14:39.175 you know, not, not going to be, like, if it goes down,
324 00:14:39.245 --> 00:14:41.255 it's not gonna hurt your customers, basically.
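The split between the two kinds of infrastructure can be sketched with two toy classes: an offline store that keeps the full history for analysis and training, and an online store that keeps only the latest value per entity for fast lookups by key. The class names and methods below are hypothetical, in-memory stand-ins chosen for illustration; they are not Feast classes, and a real deployment would back them with a warehouse and a low-latency database.

```python
from datetime import datetime


class OfflineStore:
    """Append-only history, read in bulk for training and analytics."""

    def __init__(self):
        self.rows = []

    def append(self, entity_id, ts, features):
        self.rows.append({"entity_id": entity_id, "ts": ts, **features})

    def history(self, entity_id):
        return [r for r in self.rows if r["entity_id"] == entity_id]


class OnlineStore:
    """Latest value per entity, read by key at serving time."""

    def __init__(self):
        self.latest = {}

    def write(self, entity_id, ts, features):
        current = self.latest.get(entity_id)
        # Ignore out-of-order writes that are older than what is stored.
        if current is None or ts >= current[0]:
            self.latest[entity_id] = (ts, features)

    def read(self, entity_id):
        entry = self.latest.get(entity_id)
        return entry[1] if entry else None


offline, online = OfflineStore(), OnlineStore()

# A hypothetical data producer emitting login counts for one customer.
events = [
    ("cust_42", datetime(2024, 3, 1), {"logins": 1}),
    ("cust_42", datetime(2024, 3, 2), {"logins": 2}),
    ("cust_42", datetime(2024, 3, 3), {"logins": 3}),
]
for entity_id, ts, features in events:
    offline.append(entity_id, ts, features)
    online.write(entity_id, ts, features)

print(len(offline.history("cust_42")))  # full history for training
print(online.read("cust_42"))           # only the freshest value for serving
```

The point of the sketch is the access pattern: the offline side answers "what did this look like over time", while the online side answers "what is the value right now" with a single key lookup.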
325 00:14:42.135 --> 00:14:45.675 And so if you look at the diagram on the right, we have
326 00:14:45.745 --> 00:14:47.275 what we call a data producer.
327 00:14:47.655 --> 00:14:48.995 And so what a data producer is,
328 00:14:49.015 --> 00:14:50.475 is basically like an application.
329 00:14:51.135 --> 00:14:52.365 Maybe you're a payments company
330 00:14:52.365 --> 00:14:54.365 and you have like an authentication service, right?
331 00:14:54.365 --> 00:14:55.765 Like a, a customer logs in
332 00:14:55.765 --> 00:14:57.605 and you want to keep track of their session,
333 00:14:58.345 --> 00:15:00.725 how many times they've, they've like logged in,
334 00:15:01.095 --> 00:15:02.925 maybe it's about, um, their payments,
335 00:15:02.925 --> 00:15:04.205 how many payments they've made in the past,
336 00:15:04.305 --> 00:15:05.725 if you're an e-commerce company
337 00:15:05.985 --> 00:15:07.525 or products that they've purchased.
338 00:15:08.145 --> 00:15:12.325 Um, what you would want to do is, essentially, with that data,
339 00:15:13.105 --> 00:15:14.645 you might want to emit events
340 00:15:14.985 --> 00:15:19.345 or write data to an offline store so
341 00:15:19.345 --> 00:15:21.225 that you could go back later and analyze it.
342 00:15:21.225 --> 00:15:23.865 And like I said, without impacting any customer experiences,
343 00:15:23.965 --> 00:15:25.505 um, and this is pretty common, right?
344 00:15:25.505 --> 00:15:28.585 And, like, all business analytics is grounded in, like,
345 00:15:28.895 --> 00:15:31.745 some sort of offline analytics store, like ClickHouse
346 00:15:31.745 --> 00:15:33.545 or something where it's like, I'm gonna query this data
347 00:15:33.685 --> 00:15:34.785 and learn something from it.
348 00:15:35.205 --> 00:15:37.425 And then you get to the next extent of, like, well,
349 00:15:37.505 --> 00:15:39.945 I have an AI engineer or an ML engineer who wants
350 00:15:39.945 --> 00:15:42.465 to build a model to predict something like, you know, what,
351 00:15:43.005 --> 00:15:46.825 um, what a customer's, you know, um,
352 00:15:47.585 --> 00:15:51.145 LTV is or if they're willing to buy this, this thing, uh,
353 00:15:51.165 --> 00:15:52.625 or build a recommendation engine.
354 00:15:53.885 --> 00:15:56.665 And that's kind of what the diagram shows on the very left,
355 00:15:56.665 --> 00:15:59.745 where the offline log, CDC, ELT, this is
356 00:15:59.745 --> 00:16:02.665 about essentially emitting data to an offline store
357 00:16:02.665 --> 00:16:03.905 so that you can then analyze it
358 00:16:04.585 --> 00:16:06.125 and you get into this offline store.
359 00:16:06.125 --> 00:16:08.605 And oftentimes people just emit that to like an S3 bucket.
360 00:16:09.105 --> 00:16:12.485 Um, you know, some people will actually, uh,
361 00:16:12.805 --> 00:16:15.445 emit events to Kinesis or Kafka, um,
362 00:16:15.705 --> 00:16:16.925 and, you know, those end up,
363 00:16:16.925 --> 00:16:18.325 you can use an S3 bucket as well.
364 00:16:18.545 --> 00:16:21.725 Um, CDC is change data capture.
365 00:16:21.825 --> 00:16:25.085 And so if you have like, um, you know, ELT systems,
366 00:16:25.115 --> 00:16:26.725 this tends to be a pretty, pretty common thing.
367 00:16:26.725 --> 00:16:27.725 Fivetran is an example
368 00:16:27.745 --> 00:16:29.045 of a provider that does that pretty well.
369 00:16:29.195 --> 00:16:30.525 There's Airbyte
370 00:16:30.525 --> 00:16:31.805 that I believe is the open source version.
371 00:16:32.505 --> 00:16:37.125 Um, and so from that, like emitted log data, usually that's
372 00:16:37.125 --> 00:16:38.485 where you generate training data sets
373 00:16:39.355 --> 00:16:43.575 and Feast can nicely couple with things like Spark
374 00:16:43.595 --> 00:16:46.255 or Snowflake or whatever your offline store is
375 00:16:46.515 --> 00:16:48.615 and help you create training data sets.
376 00:16:49.075 --> 00:16:51.015 Um, that's really what its kind of
377 00:16:51.635 --> 00:16:53.335 big value proposition is there. You know,
378 00:16:53.335 --> 00:16:55.135 it has data preparation, model training,
379 00:16:55.155 --> 00:16:57.895 and backtesting that you might wanna do within
380 00:16:57.895 --> 00:17:00.215 that, like, batch world where it's, you know,
381 00:17:00.735 --> 00:17:04.695 large computations on lots of, uh, data sets
382 00:17:04.695 --> 00:17:05.775 that will take a bit.
383 00:17:06.355 --> 00:17:08.735 Um, and then there's this.
384 00:17:09.235 --> 00:17:12.215 So, and that's all gonna stay within this data warehouse,
385 00:17:12.215 --> 00:17:15.255 this offline store land, uh, really model exploration.
386 00:17:15.275 --> 00:17:19.015 And in the, um, Kubeflow ecosystem,
387 00:17:19.035 --> 00:17:21.575 we talk about this as, like, the model development lifecycle.
388 00:17:22.545 --> 00:17:24.965 And you'll generally stay in the offline store there.
389 00:17:25.225 --> 00:17:29.245 Um, now moving back up into the diagram
390 00:17:29.295 --> 00:17:33.325 where we talk about the event hitting this kind
391 00:17:33.325 --> 00:17:34.605 of streaming application, Flink
392 00:17:34.605 --> 00:17:37.485 or Spark. For architectural reasons,
393 00:17:37.715 --> 00:17:41.085 some applications might want to actually
394 00:17:41.785 --> 00:17:43.085 use a streaming architecture
395 00:17:43.085 --> 00:17:44.405 or event driven architecture
396 00:17:44.405 --> 00:17:46.365 where they're just gonna emit events to a Kafka topic,
397 00:17:46.825 --> 00:17:48.645 and then consumers will subscribe to,
398 00:17:49.025 --> 00:17:50.725 or applications would subscribe to that topic
399 00:17:50.825 --> 00:17:51.885 and consume those events
400 00:17:51.885 --> 00:17:54.605 and process them on whatever cadence they want.
401 00:17:54.625 --> 00:17:56.045 And so, um, Flink
402 00:17:56.065 --> 00:17:59.765 and Spark Streaming are naturally, uh, good options in
403 00:17:59.765 --> 00:18:00.885 that case
404 00:18:00.885 --> 00:18:03.405 where you can actually build transformations
405 00:18:03.405 --> 00:18:06.645 that manage the throughput, that allow you to batch,
406 00:18:06.865 --> 00:18:08.525 uh, requests together and,
407 00:18:08.585 --> 00:18:13.085 and feature computations so that you're not, um, writing
408 00:18:13.145 --> 00:18:16.565 to the online store at too high of a frequency.
409 00:18:16.725 --> 00:18:17.885 'cause then you start to have
410 00:18:17.885 --> 00:18:19.005 some resource contentions.
411 00:18:19.005 --> 00:18:21.325 And depending if you get lots of volume all at once,
412 00:18:21.465 --> 00:18:23.125 you can actually end up incurring some,
413 00:18:23.125 --> 00:18:24.445 some challenges in production.
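The batching idea described here, collecting events and feature computations so the online store is not written to on every single event, can be sketched as a tiny micro-batcher. This is a plain-Python illustration only, not Flink or Spark Streaming code; the `MicroBatcher` class, its `max_events` threshold, and the `sink` callable are hypothetical names chosen for the example.

```python
from collections import defaultdict


class MicroBatcher:
    """Aggregate per-entity event counts and write them out in batches."""

    def __init__(self, sink, max_events=3):
        self.sink = sink              # callable that performs one batched write
        self.max_events = max_events  # flush after this many buffered events
        self.pending = defaultdict(int)
        self.count = 0

    def on_event(self, entity_id, amount=1):
        # Fold the event into the in-memory aggregate instead of writing it.
        self.pending[entity_id] += amount
        self.count += 1
        if self.count >= self.max_events:
            self.flush()

    def flush(self):
        # One write carries the aggregated increments for every entity seen.
        if self.pending:
            self.sink(dict(self.pending))
        self.pending.clear()
        self.count = 0


writes = []  # stands in for writes to the online store
batcher = MicroBatcher(sink=writes.append, max_events=3)

for user in ["u1", "u2", "u1", "u1", "u2"]:
    batcher.on_event(user)
batcher.flush()  # flush whatever is left at the end of the stream

print(writes)
```

Five events end up as two writes, which is the throughput trade-off being described: fewer, larger writes at the cost of slightly staler data in the online store. A real streaming job would typically also flush on a time window, which this sketch leaves out.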
414 00:18:24.945 --> 00:18:29.285 Um, and so some data producers, they choose a, uh,
415 00:18:29.805 --> 00:18:32.085 a model where, you know, I have a sidecar,
416 00:18:32.705 --> 00:18:34.565 I'm writing events to an S3 bucket.
417 00:18:34.985 --> 00:18:36.965 Or maybe they wanna just, you know, instead
418 00:18:36.965 --> 00:18:38.365 of a sidecar, use Kafka, right?
419 00:18:38.745 --> 00:18:41.765 And then some people just do like batch dumps, right?
420 00:18:41.765 --> 00:18:43.845 Like every 24 hours, I'm gonna take a copy
421 00:18:43.845 --> 00:18:45.485 of the database and just dump it.
422 00:18:45.945 --> 00:18:49.645 Um, and then you might also want an architecture where
423 00:18:49.675 --> 00:18:52.045 that data producer, that, that, that application
424 00:18:52.585 --> 00:18:54.365 or service, I'm gonna write directly
425 00:18:54.365 --> 00:18:56.125 to the online store via API,
426 00:18:57.065 --> 00:18:59.085 and Feast supports all of these architectures.
427 00:18:59.085 --> 00:19:00.285 And there are different trade-offs
428 00:19:00.285 --> 00:19:03.525 with these different write patterns, particularly for, uh,
429 00:19:03.525 --> 00:19:04.725 mission critical services.
430 00:19:05.505 --> 00:19:09.925 Um, like in payments, like in in, in lending, um, where you
431 00:19:10.795 --> 00:19:13.765 have different guarantees about the staleness
432 00:19:14.025 --> 00:19:15.565 or consistency is the language
433 00:19:15.565 --> 00:19:16.845 that's often used of the data.
434 00:19:17.185 --> 00:19:22.045 Um, you know, you, you'll want different, um, write patterns
435 00:19:22.105 --> 00:19:24.325 to the online store for, for different data sources.
436 00:19:24.785 --> 00:19:26.005 Um, 'cause again, they'll,
437 00:19:26.005 --> 00:19:27.285 they'll have different consequences
438 00:19:27.345 --> 00:19:28.485 to your consumer experience.
439 00:19:28.665 --> 00:19:30.845 And the most concrete example is that you want strong
440 00:19:30.845 --> 00:19:32.725 consistency.
441 00:19:32.725 --> 00:19:34.765 If you're doing like, um, lending
442 00:19:35.225 --> 00:19:36.445 and you want to check out, uh,
443 00:19:36.445 --> 00:19:37.485 or you want to create a feature
444 00:19:37.485 --> 00:19:40.125 to calculate someone's total exposure, i.e.,
445 00:19:40.125 --> 00:19:41.365 how much money you've lent them,
446 00:19:41.705 --> 00:19:43.125 you don't wanna get that wrong in real time.
447 00:19:43.145 --> 00:19:44.685 You know, you, you wanna make sure that like,
448 00:19:45.115 --> 00:19:47.725 that number is calculated with the most accurate data.
449 00:19:47.865 --> 00:19:49.525 Um, 'cause that can have, uh, pretty severe,
450 00:19:49.625 --> 00:19:50.805 uh, financial consequences.
451 00:19:51.745 --> 00:19:55.125 And then once you kinda hydrate this online store
452 00:19:55.235 --> 00:19:56.845 with the data that you need from all
453 00:19:56.845 --> 00:19:58.845 of your different places, again, batched,
454 00:19:58.885 --> 00:20:03.005 maybe a streaming producer, uh, maybe an online application,
455 00:20:03.505 --> 00:20:04.685 um, you know,
456 00:20:04.705 --> 00:20:07.125 and centralize it into this online store for serving.
457 00:20:07.665 --> 00:20:11.445 Um, then in your AI application, you can,
458 00:20:11.505 --> 00:20:13.885 you can actually talk with your inference provider.
459 00:20:14.015 --> 00:20:16.285 Maybe it's a separate service. Sometimes models are so small,
460 00:20:16.285 --> 00:20:18.685 you can actually include them in your feature server.
461 00:20:19.105 --> 00:20:20.925 Um, again, for the tabular domain
462 00:20:20.925 --> 00:20:22.965 where models aren't super huge, that
463 00:20:22.965 --> 00:20:24.405 that actually isn't an uncommon pattern.
464 00:20:24.745 --> 00:20:28.085 Uh, but there's lots of utility in having an explicit, uh,
465 00:20:28.405 --> 00:20:29.805 separate inference endpoint.
466 00:20:29.905 --> 00:20:32.285 Um, especially as models start to scale to really large,
467 00:20:32.425 --> 00:20:36.565 any LLM naturally needs, um, its own inference provider.
468 00:20:37.065 --> 00:20:41.085 Um, and you see the kind of client AI application here
469 00:20:41.085 --> 00:20:43.605 where it could be a user's front browser,
470 00:20:43.865 --> 00:20:45.285 it could be another backend service.
471 00:20:45.865 --> 00:20:48.005 Um, all talking with these things. In practice,
472 00:20:48.005 --> 00:20:50.045 for large enterprises, this ends up a pretty,
473 00:20:50.105 --> 00:20:51.485 pretty common, common pattern.
474 00:20:52.025 --> 00:20:56.405 Um, and so you'll look at this and it's kind of abstract,
475 00:20:56.425 --> 00:21:00.735 but you'll notice that, well, this naturally applies
476 00:21:00.755 --> 00:21:02.015 for RAG systems, right?
477 00:21:02.165 --> 00:21:05.735 Because, you know, if you're taking documents, for example,
478 00:21:05.735 --> 00:21:08.895 maybe it's content from the web, from your, your CMS, right?
479 00:21:08.925 --> 00:21:12.815 Like Contently, um, where they have APIs, they have webhooks
480 00:21:12.815 --> 00:21:14.975 where you can, you know, update, you know,
481 00:21:15.075 --> 00:21:17.175 and you have changes happening to content
482 00:21:17.355 --> 00:21:18.535 and you wanna embed and,
483 00:21:18.635 --> 00:21:21.695 and, um, reflect in,
484 00:21:21.715 --> 00:21:23.895 in your RAG system these changes, right?
485 00:21:24.595 --> 00:21:27.815 Um, well, you kinda have to do it with an API. If you do
486 00:21:27.815 --> 00:21:29.655 that in batch, you're gonna have a, a, you know,
487 00:21:30.155 --> 00:21:32.575 bad results in, in your retrieval
488 00:21:32.575 --> 00:21:34.575 because you'll have stale, unindexed data.
489 00:21:35.115 --> 00:21:38.375 Um, and so everything kinda logically starts to, to
490 00:21:38.435 --> 00:21:42.055 to follow like, oh, actually the, these data patterns are,
491 00:21:42.055 --> 00:21:43.295 are well suited for RAG.
492 00:21:43.515 --> 00:21:45.375 Um, and it, it's not an accident,
493 00:21:45.375 --> 00:21:47.255 it's just, the only thing really different is just
494 00:21:47.255 --> 00:21:49.975 that it's text instead of, um, you know, tabular,
495 00:21:50.005 --> 00:21:51.655 it's still numbers because it's vectors, right?
496 00:21:52.235 --> 00:21:54.615 But, um, you know, it, it is an important part.
497 00:21:54.755 --> 00:21:57.135 And the real, the real key value proposition here is
498 00:21:57.135 --> 00:22:00.375 that Feast treats not only the online infrastructure, uh,
499 00:22:00.595 --> 00:22:02.575 as a core priority, but also the offline.
500 00:22:02.635 --> 00:22:05.135 Uh, again, fine-tuning is a first-class citizen.
501 00:22:05.165 --> 00:22:07.815 It's a, it's a, it's, it's really the reason
502 00:22:07.815 --> 00:22:11.895 that Feast was originally built was to, you know,
503 00:22:13.115 --> 00:22:15.255 reduce the kind of training-serving skew.
504 00:22:15.255 --> 00:22:17.495 There's, there's a, there's an old paper called, uh,
505 00:22:18.145 --> 00:22:19.375 about MLOps.
506 00:22:19.475 --> 00:22:22.015 And, and you know, this, this paper talked about the,
507 00:22:22.555 --> 00:22:27.415 the core importance of really the operations side of ML.
508 00:22:27.435 --> 00:22:29.655 And, and it is true for AI engineering as well,
509 00:22:29.655 --> 00:22:31.735 and generative AI, because you have
510 00:22:31.735 --> 00:22:33.215 to still handle all the same problems.
511 00:22:33.275 --> 00:22:34.415 You don't have to train the model often.
512 00:22:34.415 --> 00:22:35.455 Maybe you wanna fine tune it,
513 00:22:35.455 --> 00:22:37.575 but all the other problems are still there.
514 00:22:37.915 --> 00:22:41.575 Um, you know, data lineage, permissions, governance,
515 00:22:41.875 --> 00:22:44.975 you know, um, reconciliation and all this other stuff.
516 00:22:45.155 --> 00:22:47.935 Um, and so, you know, one
517 00:22:47.935 --> 00:22:49.855 of the benefits you get from Feast is its registry
518 00:22:49.855 --> 00:22:53.095 where it stores metadata about, um, the kind of data
519 00:22:53.095 --> 00:22:56.535 that you use during, you know, inference and training even.
520 00:22:57.075 --> 00:22:58.895 Um, and so we'll talk a little bit more about
521 00:22:58.895 --> 00:23:00.135 that in, in a second.
522 00:23:00.755 --> 00:23:02.255 Um, cool.
523 00:23:03.845 --> 00:23:06.025 So I wanna give a, a demo today
524 00:23:06.445 --> 00:23:10.265 and I'm gonna talk about, um, Feast, Milvus, and Docling.
525 00:23:10.285 --> 00:23:11.545 So Docling's really cool.
526 00:23:11.965 --> 00:23:15.705 Uh, it's, it's, um, you know, it essentially takes a bunch
527 00:23:15.705 --> 00:23:18.905 of different input types of text formats, whether PDF,
528 00:23:19.075 --> 00:23:22.985 PowerPoint, DOCX, HTML, and it transforms it
529 00:23:22.985 --> 00:23:24.785 and embeds it, um, into tokens.
530 00:23:24.845 --> 00:23:26.945 Uh, it embeds it into vectors, um,
531 00:23:27.925 --> 00:23:30.265 and it allows you to then upload it into Milvus.
532 00:23:30.805 --> 00:23:32.465 Uh, and so I created this simple diagram
533 00:23:32.555 --> 00:23:35.145 where you could imagine some admin, you know,
534 00:23:35.745 --> 00:23:38.305 probably not an end user chooses to write documents
535 00:23:38.305 --> 00:23:41.785 and ingest 'em into, into, um, a feature store, right?
536 00:23:41.925 --> 00:23:44.225 Um, and we'll just go with this really simply
537 00:23:44.245 --> 00:23:45.485 and do a batch exercise.
538 00:23:46.905 --> 00:23:48.245 You write them into the online store,
539 00:23:48.245 --> 00:23:49.805 and some user's gonna wanna retrieve them
540 00:23:49.825 --> 00:23:51.045 to talk to the docs.
541 00:23:51.385 --> 00:23:54.005 And that's basically it. Um, you know,
542 00:23:54.465 --> 00:23:56.525 that's the entire goal of this demo is
543 00:23:56.525 --> 00:23:57.685 to kind of highlight how that works.
544 00:23:58.735 --> 00:24:00.475 And I just finished it this week, so apologies if
545 00:24:00.475 --> 00:24:01.515 I encounter any bugs.
546 00:24:01.785 --> 00:24:03.635 Fair warning. Um,
547 00:24:04.695 --> 00:24:07.075 but this is kind of core of what Feast does.
548 00:24:07.395 --> 00:24:09.115 It ingests data, transforms data, and stores it,
549 00:24:09.115 --> 00:24:10.995 and it makes it available for low-latency retrieval.
550 00:24:11.255 --> 00:24:14.475 Um, you know, anything else is really beyond the scope,
551 00:24:14.535 --> 00:24:16.395 but like, that's, that's what we aim to do.
552 00:24:17.695 --> 00:24:19.515 So let's talk a little bit about some
553 00:24:19.515 --> 00:24:21.155 of the Feast constructs that you're gonna get here.
554 00:24:21.855 --> 00:24:24.995 So I wanted to go over like Feast objects.
555 00:24:26.495 --> 00:24:28.155 So on the right here, you're gonna see two,
556 00:24:28.255 --> 00:24:29.315 two snippets of code.
557 00:24:29.735 --> 00:24:33.595 The first one are entities, so we call it Chunk ID here.
558 00:24:33.735 --> 00:24:34.795 And there's a value type.
559 00:24:34.865 --> 00:24:37.755 It's a string, you know, it is basically to say, um,
560 00:24:38.095 --> 00:24:42.055 you know, it's, uh, it, it's a Feast construct.
561 00:24:42.055 --> 00:24:44.295 And the entity pretty much maps to a primary key
562 00:24:44.295 --> 00:24:45.335 that you're gonna put in a table.
563 00:24:46.195 --> 00:24:49.775 Um, and this document is another primary key
564 00:24:49.775 --> 00:24:50.975 that you're gonna put into a table.
565 00:24:51.275 --> 00:24:53.455 Um, and there's some, some stuff there
566 00:24:53.455 --> 00:24:55.655 that ends up being useful for Feast, the description,
567 00:24:56.155 --> 00:24:58.015 the value type, and the join keys.
568 00:24:58.235 --> 00:25:01.095 Um, 'cause you can have multiple join keys, uh, in, in one.
569 00:25:01.095 --> 00:25:02.815 And so that's why I have a list. Um,
570 00:25:02.815 --> 00:25:04.495 there's data sources that you declare.
571 00:25:04.585 --> 00:25:05.615 These are like files
572 00:25:05.715 --> 00:25:08.855 and request objects, which is like a CSV or a Parquet file.
573 00:25:09.395 --> 00:25:11.735 And then an API call, a request object allows you
574 00:25:11.735 --> 00:25:13.575 to send like, arbitrary data to, to Feast,
575 00:25:13.575 --> 00:25:15.695 and it'll treat it as just like an API call
576 00:25:15.695 --> 00:25:17.735 and allow you to transform it and do stuff with it.
577 00:25:18.515 --> 00:25:21.575 Um, and you'll see the source right here, um,
578 00:25:22.835 --> 00:25:24.215 is this file source, and it's the
579 00:25:24.215 --> 00:25:25.495 Parquet format, like I said.
580 00:25:26.315 --> 00:25:27.975 Um, and then there's the request source,
581 00:25:27.975 --> 00:25:30.455 which in this case it's gonna be a PDF, you know, uh,
582 00:25:30.455 --> 00:25:31.895 and you'll see there's PDF bytes.
583 00:25:31.895 --> 00:25:33.975 And so what that means is that we're gonna cast a PDF into
584 00:25:33.985 --> 00:25:36.095 bytes to load it in Python and send that.
585 00:25:36.875 --> 00:25:39.925 Um, and then the file name, which is a string.
586 00:25:40.905 --> 00:25:44.165 And so I, I define all of that metadata so
587 00:25:44.165 --> 00:25:47.805 that I can then define what's like a logical table, uh,
588 00:25:47.805 --> 00:25:49.285 which is the feature view on the right.
589 00:25:49.385 --> 00:25:51.685 And so the feature view is called the Docling example
590 00:25:51.685 --> 00:25:55.125 feature view, very creative name, I know, uh, that variable,
591 00:25:55.425 --> 00:25:57.485 uh, with the name of docling feature view.
592 00:25:57.665 --> 00:25:59.805 And it has the entities list there,
593 00:25:59.915 --> 00:26:01.165 it's just defined as chunk.
594 00:26:02.105 --> 00:26:06.785 And the, the field name is file name.
595 00:26:06.925 --> 00:26:09.585 So there's a field definition. A field is equivalent
596 00:26:09.585 --> 00:26:12.025 to like a feature. Um, "feature" as,
597 00:26:12.045 --> 00:26:14.105 as language, is very common among ML engineers.
598 00:26:14.335 --> 00:26:15.545 It's not as evident
599 00:26:15.645 --> 00:26:17.385 to all other people who don't build models.
600 00:26:17.645 --> 00:26:20.425 But, um, the, the common nomenclature there is features, um,
601 00:26:20.605 --> 00:26:22.985 we call it fields here, just to be explicit that it's a,
602 00:26:23.185 --> 00:26:26.395 a field, that this is a schema for a field
603 00:26:26.425 --> 00:26:28.795 that ultimately maps to a database table, um,
604 00:26:29.455 --> 00:26:30.835 and in Milvus, it's a collection.
605 00:26:31.575 --> 00:26:34.915 Um, and here you'll notice that there's this, um,
606 00:26:36.235 --> 00:26:37.455 raw chunk of Markdown.
607 00:26:37.515 --> 00:26:41.135 So in Docling, you can, you can extract partitions of,
608 00:26:41.155 --> 00:26:44.125 of the text, of the, you know, of the chunks.
609 00:26:44.585 --> 00:26:46.485 And here I'm just extracting 'em as Markdown,
610 00:26:46.485 --> 00:26:47.765 and you'll see that code in a second.
611 00:26:48.065 --> 00:26:52.945 Um, and you'll notice that under the vector field,
612 00:26:53.355 --> 00:26:56.425 there are two additional Booleans beyond the dtype,
613 00:26:56.425 --> 00:27:00.305 which is an array of Float64, um, which is a vector index.
614 00:27:00.405 --> 00:27:01.625 And you see how it says true.
615 00:27:02.245 --> 00:27:04.265 That's how you configure vector similarity
616 00:27:04.265 --> 00:27:05.385 search in, in Feast.
617 00:27:05.565 --> 00:27:08.905 Um, and, and the vector search metric is cosine.
618 00:27:08.905 --> 00:27:10.985 So that's the distance metric that's used to calculate.
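As a rough illustration of what a cosine search metric computes, the sketch below ranks a few stored chunk vectors against a query vector in plain Python. It is a brute-force stand-in, not how Feast or Milvus implement vector search; `cosine_similarity`, `top_k`, and the two-dimensional toy vectors are assumptions made for the example.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], chunks: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy index: chunk ID -> embedding (real embeddings have hundreds of dimensions).
chunks = {
    "chunk_1": [1.0, 0.0],
    "chunk_2": [0.0, 1.0],
    "chunk_3": [1.0, 1.0],
}
results = top_k([1.0, 0.1], chunks, k=2)
```

Here the query points almost along the first axis, so `chunk_1` ranks first and `chunk_3` second. A vector database avoids scoring every row by using an index, but the ranking criterion is the same idea.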
619 00:27:11.205 --> 00:27:12.545 So what I'll say is it was a lot
620 00:27:12.545 --> 00:27:13.825 of hard work to make this thing happen.
621 00:27:14.005 --> 00:27:15.345 Uh, but I'm really excited about it
622 00:27:15.345 --> 00:27:16.105 because that means that ML
623 00:27:16.305 --> 00:27:17.625 engineers, they don't have to care.
624 00:27:17.815 --> 00:27:20.825 They, they can just, you know, declare this feature view
625 00:27:20.845 --> 00:27:24.065 and then, you know, tell their software
626 00:27:24.065 --> 00:27:25.265 engineers, like, Hey, look, just use this.
627 00:27:25.265 --> 00:27:28.145 This is easy. Um, and then they can serve, you know,
628 00:27:28.595 --> 00:27:31.145 their ML models and, and their LLMs
629 00:27:31.145 --> 00:27:32.465 and customize it to their needs.
630 00:27:32.565 --> 00:27:34.105 Uh, and so that's kind of really the exciting
631 00:27:34.105 --> 00:27:35.905 and powerful part, and you'll see that,
632 00:27:35.905 --> 00:27:38.025 that you also see the data source there,
633 00:27:38.025 --> 00:27:40.145 and that ends up being important for materialization
634 00:27:40.165 --> 00:27:42.265 or actually data ingestion later on.
635 00:27:42.455 --> 00:27:45.385 There's a TTL, um, you know, um,
636 00:27:47.165 --> 00:27:48.385 and that's basically it.
637 00:27:49.425 --> 00:27:51.405 You know, we, we, we have our metadata.
638 00:27:51.945 --> 00:27:54.885 Now I want to talk about how we extend this, which is
639 00:27:54.885 --> 00:27:56.005 how do we do transformations?
640 00:27:56.005 --> 00:27:57.725 And, uh, Feast allows
641 00:27:57.785 --> 00:28:01.485 for feature transformation in batch compute engines like
642 00:28:01.485 --> 00:28:04.325 Spark, as I mentioned, streaming compute engines like Spark
643 00:28:04.325 --> 00:28:07.245 Streaming and Flink, and then the API servers, which is,
644 00:28:07.245 --> 00:28:09.485 you know, um, the Feast feature server.
645 00:28:09.745 --> 00:28:13.635 Um, and the way that that's done is through a decorator.
646 00:28:14.215 --> 00:28:16.875 And this defines basically the other stuff
647 00:28:16.875 --> 00:28:18.635 that you just saw in the feature view.
648 00:28:20.735 --> 00:28:23.875 And then within the function definition, that's
649 00:28:23.875 --> 00:28:26.675 where you actually define like the, the, the transformation.
650 00:28:26.935 --> 00:28:29.235 Now we're, we're, we're, we're actually gonna revisit and,
651 00:28:29.235 --> 00:28:31.075 and change this a little bit to make it a little bit easier
652 00:28:31.255 --> 00:28:32.795 for, uh, engineers,
653 00:28:32.795 --> 00:28:34.995 but it's, it'll be the same, same sort of structure.
654 00:28:35.005 --> 00:28:38.435 We're renaming the decorator from on-demand
655 00:28:38.435 --> 00:28:39.675 feature view to transform,
656 00:28:39.735 --> 00:28:40.915 and then, like, you know,
657 00:28:40.915 --> 00:28:42.395 everything else stays the same, basically.
658 00:28:42.815 --> 00:28:44.315 Um, and it'll be backwards compatible,
659 00:28:44.375 --> 00:28:47.315 but it, it is meant to provide some, uh, clarity to people.
660 00:28:48.455 --> 00:28:50.275 But here, this is the Docling transformation.
661 00:28:50.275 --> 00:28:53.355 So this is what's, so what happens here is if you send this
662 00:28:53.755 --> 00:28:58.195 function an arbitrary set of PDF bytes, it's going to, um,
663 00:28:59.685 --> 00:29:01.535 extract the text from it and embed it.
664 00:29:02.155 --> 00:29:06.655 And so you'll see that there's this, uh, list
665 00:29:06.655 --> 00:29:08.615 of objects that, you know, you initialize,
666 00:29:08.615 --> 00:29:10.495 and then it, it just appends them.
667 00:29:10.675 --> 00:29:12.375 And, uh, it appends the document ID.
668 00:29:12.915 --> 00:29:17.375 The chunk ID, which we generate is just linearly, uh, chunk,
669 00:29:17.435 --> 00:29:19.375 1, 2, 3, 4 to n Um,
670 00:29:19.955 --> 00:29:22.455 and then the embeddings, you know, each embedding is
671 00:29:22.555 --> 00:29:25.135 of length, like, I think 584 or something, I forget,
672 00:29:25.275 --> 00:29:26.375 or maybe it was 384.
673 00:29:26.555 --> 00:29:29.095 Um, again, I forget. Um, and,
674 00:29:29.555 --> 00:29:30.935 and then the actual chunk text.
675 00:29:31.355 --> 00:29:34.415 So all that is, is in there and it's all just declared here.
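The shape of what that transformation returns can be sketched without Docling or an embedding model. In the plain-Python example below, `chunk_text` splits on a fixed word count and `fake_embed` derives a deterministic vector from a hash; both are hypothetical stand-ins for the real chunker and sentence-transformer, and the column names are assumed for illustration rather than copied from the demo.

```python
import hashlib
from typing import Dict, List

EMBEDDING_DIM = 8  # toy size; the talk mentions a few hundred dimensions


def fake_embed(text: str) -> List[float]:
    """Deterministic placeholder for an embedding model (NOT a real embedding)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:EMBEDDING_DIM]]


def chunk_text(text: str, max_words: int) -> List[str]:
    """Naive chunker: consecutive groups of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_rows(document_id: str, text: str, max_words: int = 4) -> Dict[str, list]:
    """Initialize one list per output column, then append a value per chunk."""
    rows: Dict[str, list] = {"document_id": [], "chunk_id": [], "vector": [], "chunk_text": []}
    for i, chunk in enumerate(chunk_text(text, max_words), start=1):
        rows["document_id"].append(document_id)
        rows["chunk_id"].append(f"chunk-{i}")  # linear IDs: chunk-1 .. chunk-n
        rows["vector"].append(fake_embed(chunk))
        rows["chunk_text"].append(chunk)
    return rows


rows = build_rows("doc-1", "Feast ingests data transforms it and stores it for retrieval")
```

Each column has one entry per chunk, which is the column-oriented layout you would hand to a store write. Swapping in a real parser and embedding model changes the two helper functions, not the overall shape.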
676 00:29:34.435 --> 00:29:35.615 And so this all, all
677 00:29:40.705 --> 00:29:44.755 this, you know, let's say 50 lines of code
678 00:29:44.755 --> 00:29:47.475 or so allows you to ship RAG with Feast.
679 00:29:47.545 --> 00:29:50.155 Obviously there's a lot of infrastructure code that has
680 00:29:50.155 --> 00:29:51.595 to be written to deploy this stuff, right?
681 00:29:51.655 --> 00:29:54.435 But with that, you empower your ML engineers to really start
682 00:29:54.435 --> 00:29:56.635 to ship RAG solutions left and right,
683 00:29:56.975 --> 00:29:59.595 and serve them in production systems, um,
684 00:29:59.975 --> 00:30:01.755 and even scale them, uh, and,
685 00:30:01.755 --> 00:30:03.995 and that, that, there's more to discuss there.
686 00:30:05.495 --> 00:30:07.515 And so the, the data ingestion, uh,
687 00:30:07.615 --> 00:30:09.795 or document ingestion, uh, it's simple.
688 00:30:09.795 --> 00:30:10.955 There's an API endpoint.
689 00:30:10.955 --> 00:30:12.715 There's push and write to online store
690 00:30:12.715 --> 00:30:15.515 and materialize. Materialize is meant for batch, um, you know,
691 00:30:15.945 --> 00:30:19.555 push and write to online store are meant for the API where, um,
692 00:30:19.935 --> 00:30:22.595 you know, you, you actually want to hit in live services
693 00:30:22.615 --> 00:30:24.995 and, and materialize lets you take like batch data sets,
694 00:30:24.995 --> 00:30:26.235 like CSVs to do it.
695 00:30:26.815 --> 00:30:31.205 Um, and that's it. So you get that kind of for free.
696 00:30:31.745 --> 00:30:34.475 Um, yeah,
697 00:30:35.135 --> 00:30:37.995 and this is the API docs that come out of the feature server,
698 00:30:38.095 --> 00:30:40.395 you know, you, you get your OpenAPI docs, uh,
699 00:30:40.395 --> 00:30:41.595 available, which is nice.
700 00:30:41.655 --> 00:30:43.395 Uh, and you see the get online features,
701 00:30:43.395 --> 00:30:46.675 retrieve online documents, uh, write to online store.
702 00:30:46.675 --> 00:30:49.475 There's a health check, and we recently added this chat UI,
703 00:30:49.575 --> 00:30:52.035 uh, that allows you to kind of like, uh, you know,
704 00:30:52.355 --> 00:30:53.675 ML engineers to quickly get up
705 00:30:53.675 --> 00:30:56.355 and running, writing some RAG systems.
706 00:30:56.695 --> 00:31:00.995 Um, and so, uh, I'm gonna go into the demo now
707 00:31:00.995 --> 00:31:02.315 before I get into the roadmap.
708 00:31:02.455 --> 00:31:06.235 So apologies, I'm gonna stop sharing. So let's see.
709 00:31:06.725 --> 00:31:09.635 We're gonna do this live, see if it goes well.
710 00:31:16.235 --> 00:31:18.955 And if not, I'll just share the, the, um,
711 00:31:22.265 --> 00:31:22.925 the, uh,
712 00:31:26.695 --> 00:31:27.315 the notebook.
713 00:31:27.545 --> 00:31:32.255 Okay. So this is, uh, the command line. Can people see it?
714 00:31:32.315 --> 00:31:33.695 No. Or am I sharing?
715 00:31:34.195 --> 00:31:35.575 You can share your terminal now.
716 00:31:36.375 --> 00:31:38.585 Okay. People are seeing my terminal. Yes. Yeah.
717 00:31:38.975 --> 00:31:43.165 Okay, cool. So you'll see this is, um,
718 00:31:44.465 --> 00:31:45.485 the Feast structure.
719 00:31:46.315 --> 00:31:48.165 There's some pickle option. Oh, I see.
720 00:31:48.385 --> 00:31:50.605 Um, so there's this,
721 00:31:55.135 --> 00:31:57.625 this feature store YAML file, where here,
722 00:31:59.235 --> 00:32:02.445 it's just gonna have a project name, a provider. Here,
723 00:32:02.445 --> 00:32:04.325 it's gonna run locally using Milvus Lite, um,
724 00:32:04.425 --> 00:32:07.965 and the online store, and the embedding dim is 384.
725 00:32:08.185 --> 00:32:09.445 And then the index type is FLAT.
726 00:32:10.065 --> 00:32:13.445 Um, this entity key serialization is an implementation detail.
727 00:32:13.445 --> 00:32:16.405 You don't have to worry. We, we support authentication, OIDC.
728 00:32:17.025 --> 00:32:20.605 Um, and so there's, uh, some of
729 00:32:20.605 --> 00:32:22.125 that stuff you don't have to worry about, it's documented.
730 00:32:22.185 --> 00:32:24.165 So I invite you to, to look at document if you
731 00:32:24.685 --> 00:32:26.565 documentation, if you care, but it's basically
732 00:32:26.565 --> 00:32:28.045 where you define your configurations, right?
733 00:32:29.115 --> 00:32:33.745 Um, and then we can look at this example repo
734 00:32:33.765 --> 00:32:36.585 to go through the stuff I just mentioned, which is all
735 00:32:36.585 --> 00:32:38.345 of the things I had mentioned here.
736 00:32:38.475 --> 00:32:40.005 We're defining the embedding model,
737 00:32:40.465 --> 00:32:41.885 the maximum number of tokens.
738 00:32:42.545 --> 00:32:46.805 Uh, this is, uh, the tokenizer, embedding model,
739 00:32:46.805 --> 00:32:47.845 sentence transformer.
740 00:32:47.915 --> 00:32:51.485 This is the chunker, uh, this is embedding the text.
741 00:32:51.925 --> 00:32:55.525 Actually this is generating some chunk id, uh,
742 00:32:56.005 --> 00:32:57.405 I already walked you through all that stuff.
743 00:32:57.405 --> 00:33:01.255 Again, this is, um, this, those transformations.
744 00:33:01.255 --> 00:33:02.495 And so there are a couple commands
745 00:33:02.495 --> 00:33:03.615 you write in Feast to do this.
746 00:33:03.635 --> 00:33:04.975 And you say, feast apply.
747 00:33:05.515 --> 00:33:07.215 Uh, this is gonna register the metadata.
748 00:33:07.215 --> 00:33:08.895 Let's see if it works. Ignore the
749 00:33:08.895 --> 00:33:10.255 warnings, let's pretend that they're fine.
750 00:33:10.955 --> 00:33:13.375 Uh, let's see. This is all Docling stuff.
751 00:33:13.375 --> 00:33:15.455 So applying changes for project rags.
752 00:33:15.455 --> 00:33:18.455 See, this is actually what we should be used to seeing.
753 00:33:22.125 --> 00:33:24.545 And then, deploying infrastructure for docling feature view,
754 00:33:25.485 --> 00:33:26.745 uh, the one we talked about,
755 00:33:26.745 --> 00:33:28.385 this is essentially a batch feature view.
756 00:33:28.525 --> 00:33:32.105 And so, uh, let's see.
757 00:33:33.485 --> 00:33:36.025 So we're gonna look at this test workflow script.
758 00:33:36.715 --> 00:33:37.825 We're just gonna show the demo.
759 00:33:38.005 --> 00:33:41.715 And so what's gonna happen here is I'm going
760 00:33:41.715 --> 00:33:43.155 to read this document data.
761 00:33:43.535 --> 00:33:48.435 I'm gonna actually apply this transform, uh,
762 00:33:50.325 --> 00:33:51.695 feature view, the one
763 00:33:51.695 --> 00:33:53.335 that's actually gonna transform things on the
764 00:33:53.355 --> 00:33:54.895 fly, just as an example.
765 00:33:55.235 --> 00:33:56.975 Um, there's a bug there that we, that we have to work through.
766 00:33:57.035 --> 00:33:58.535 But, um, that, that's fine.
767 00:33:58.715 --> 00:34:02.655 Um, and I'm gonna log the different types of embeddings
768 00:34:02.655 --> 00:34:04.055 that are materialized
769 00:34:04.075 --> 00:34:06.775 or uploaded to, to this database, to Milvus.
770 00:34:07.765 --> 00:34:10.665 And then in this one, we're doing the same thing,
771 00:34:10.665 --> 00:34:12.065 except now with a different feature view.
772 00:34:12.065 --> 00:34:16.025 This is the one that's, um, writing the raw text itself.
773 00:34:16.085 --> 00:34:17.185 And what's gonna happen is it's
774 00:34:17.265 --> 00:34:18.385 gonna transform it on the fly.
775 00:34:18.765 --> 00:34:20.425 Uh, so that'll be kind of neat to see.
776 00:34:20.885 --> 00:34:23.145 And then we're gonna ask, uh, a question
777 00:34:23.365 --> 00:34:27.945 and then retrieve online documents for RAG, top K, uh,
778 00:34:28.445 --> 00:34:29.665 and then we're gonna print it out.
779 00:34:31.845 --> 00:34:34.465 And then there's entity-based retrieval, which is, you know,
780 00:34:34.695 --> 00:34:36.585 showing this part.
781 00:34:36.725 --> 00:34:39.425 Oh, sorry, this is, yeah, this is, uh,
782 00:34:39.855 --> 00:34:41.705 this is then retrieving the same from
783 00:34:42.445 --> 00:34:43.785 the transformed versions,
784 00:34:43.785 --> 00:34:45.745 where you're just gonna send in a query embedding.
785 00:34:45.805 --> 00:34:49.665 Um, yeah. So I'm, I'm doing retrieval of the batch one
786 00:34:50.045 --> 00:34:51.905 and the transform, and you get the same, right?
787 00:34:52.165 --> 00:34:53.625 It is just kind of showing that, that, that
788 00:34:53.625 --> 00:34:55.425 that's a, that they're equivalent.
789 00:34:56.045 --> 00:34:58.545 Um, but one, you get to kind of transform on the fly,
790 00:34:58.545 --> 00:34:59.665 like you would via an API.
791 00:34:59.865 --> 00:35:01.105 'cause that's exactly what you wanna do
792 00:35:01.105 --> 00:35:02.145 in live production settings.
793 00:35:02.965 --> 00:35:05.905 And this one, uh, is again, the entity retrieval.
794 00:35:06.285 --> 00:35:08.465 And so let's give it a try and see if it worked.
795 00:35:08.805 --> 00:35:10.785 Uh, test workflow.
796 00:35:14.545 --> 00:35:16.045 Please don't make me regret it,
797 00:35:18.775 --> 00:35:20.555 but I tested this like an hour ago and it worked.
798 00:35:20.655 --> 00:35:22.755 So let's, uh,
799 00:35:23.205 --> 00:35:26.695 let's hope there are no problems.
800 00:35:26.695 --> 00:35:27.695 It takes a minute. Um,
801 00:35:28.865 --> 00:35:33.085 because, uh, transforming the PDFs, just, yeah,
802 00:35:33.085 --> 00:35:34.645 writing the pre-computed values, okay.
803 00:35:34.705 --> 00:35:37.835 And then transforming p okay, it's doing something.
804 00:35:40.315 --> 00:35:42.235 I should have picked a smaller dataset, is the conclusion.
805 00:35:51.345 --> 00:35:53.265 I probably should put a progress bar too.
806 00:35:53.725 --> 00:35:54.725 Now that would be useful.
807 00:35:57.515 --> 00:35:58.565 Yeah, welcome to.
808 00:35:59.035 --> 00:36:00.445 Well, it's awkward that you just have
809 00:36:00.445 --> 00:36:01.885 to wait, you know, until it's done.
810 00:36:02.925 --> 00:36:04.125 I didn't, you know, when I was doing this demo,
811 00:36:04.165 --> 00:36:05.485 I didn't think, I was like, ah, it's working.
812 00:36:05.595 --> 00:36:09.205 Like, you know, Hmm. A little stick figure walking across
813 00:36:09.225 --> 00:36:10.845 the terminal would've been good, I thought.
814 00:36:12.865 --> 00:36:15.165 Uh, but maybe I'll quickly hijack that.
815 00:36:15.225 --> 00:36:16.365 Do people have some question
816 00:36:16.365 --> 00:36:17.565 that they want to ask at the end?
817 00:36:18.065 --> 00:36:20.205 Uh, feel free to ask them directly in the chat.
818 00:36:20.545 --> 00:36:21.545 Um, so we can,
819 00:36:24.775 --> 00:36:25.775 Yeah. Folks have questions.
820 00:36:25.775 --> 00:36:28.785 Do feel free to, to, to ask, uh, happy
821 00:36:28.845 --> 00:36:31.815 to, to maybe the terminal is cropped.
822 00:36:31.995 --> 00:36:34.895 No, it's, it is, uh, it's, no, it's, it's there.
823 00:36:35.245 --> 00:36:37.735 It's just, it, it is, it is doing some calculations.
824 00:36:37.755 --> 00:36:39.895 And so, so what's happening is that actually these,
825 00:36:40.035 --> 00:36:42.215 the Docling, and what's really cool about Docling,
826 00:36:42.255 --> 00:36:44.975 I invite you to, to read more about it, is that, um,
827 00:36:45.245 --> 00:36:47.175 it's doing, it's running computer vision
828 00:36:47.315 --> 00:36:51.215 and, uh, small LLMs, um, uh,
829 00:36:52.355 --> 00:36:53.735 during transformation.
830 00:36:53.755 --> 00:36:56.455 So it's taking this PDF, um, and,
831 00:36:56.455 --> 00:36:58.095 and basically extracting the text, right?
832 00:36:58.095 --> 00:37:00.855 But how does it do that? It, it also extracts graphs and,
833 00:37:00.855 --> 00:37:03.655 and, and adds that as textual metadata.
834 00:37:04.035 --> 00:37:06.735 Um, and so, um, it, it's, it's doing all that.
835 00:37:06.735 --> 00:37:08.415 So it's actually quite computationally expensive,
836 00:37:08.415 --> 00:37:09.895 and it takes a couple minutes, in fact.
837 00:37:10.235 --> 00:37:13.615 Um, so I forgot that it usually takes me a while to do that.
838 00:37:13.675 --> 00:37:15.815 And, uh, um, you know,
839 00:37:15.825 --> 00:37:17.495 we're gonna end up sitting here for five minutes. Uh,
840 00:37:17.835 --> 00:37:21.205 By the way, are you happy with the error that you have
841 00:37:21.225 --> 00:37:23.245 or is it Yes. Okay. Yeah,
842 00:37:23.585 --> 00:37:24.585 It is fine. Um,
843 00:37:24.585 --> 00:37:27.125 token indices sequence length is
844 00:37:27.265 --> 00:37:28.405 longer than the specified.
845 00:37:28.405 --> 00:37:30.665 Yeah, that's, that's not a big deal.
846 00:37:30.925 --> 00:37:32.905 Um, it, it,
847 00:37:32.965 --> 00:37:35.385 it takes nothing away from the, the, the example.
848 00:37:36.045 --> 00:37:38.125 Um, yeah.
849 00:37:38.785 --> 00:37:40.125 And Docling is just
850 00:37:40.125 --> 00:37:41.685 because I actually didn't know about it before.
851 00:37:41.925 --> 00:37:43.805 It's like fully open source and then fully
852 00:37:43.805 --> 00:37:44.805 Open sourced. So this
853 00:37:44.805 --> 00:37:47.685 was a project, uh, created by IBM Okay.
854 00:37:47.685 --> 00:37:48.965 Uh, research, uh,
855 00:37:49.225 --> 00:37:51.885 and they recently donated it to the LF AI & Data Foundation.
856 00:37:52.195 --> 00:37:55.125 Okay. Um, so it's fully open source, open governance, uh,
857 00:37:55.545 --> 00:37:57.685 you know, and I think, uh, it's a really great tool.
858 00:37:57.685 --> 00:38:00.565 We added it to Feast, the feature server, specifically to have
859 00:38:00.565 --> 00:38:04.605 like, um, you know, an open source parsing tool so
860 00:38:04.605 --> 00:38:07.445 that data scientists and ML engineers can,
861 00:38:07.945 --> 00:38:10.525 can be unblocked without having to really deal with, um,
862 00:38:12.185 --> 00:38:14.825 figuring out like, well, how can I take my PDFs, um,
863 00:38:15.525 --> 00:38:17.065 and extract them, right?
864 00:38:17.245 --> 00:38:19.105 Um, we know how to do that with regular text,
865 00:38:19.165 --> 00:38:20.785 and oftentimes that's how data comes.
866 00:38:20.845 --> 00:38:22.225 But that's, that's not the only thing.
867 00:38:23.045 --> 00:38:25.505 And does it support like, so it supports PDF
868 00:38:25.505 --> 00:38:27.705 but also does it support like different formats and stuff?
869 00:38:27.845 --> 00:38:30.545 It does. It, it supports a lot of really rich formats.
870 00:38:30.545 --> 00:38:32.225 And again, it's, uh, yeah. So there you go.
871 00:38:32.285 --> 00:38:34.665 Um, so it worked. Um, let's
872 00:38:34.665 --> 00:38:36.445 Go and take you over. I let you take over.
873 00:38:37.115 --> 00:38:38.585 Yeah, thank goodness.
874 00:38:38.645 --> 00:38:40.705 Oh, there's, there is an issue with the retriever,
875 00:38:40.705 --> 00:38:42.025 but that this part, but that's fine.
876 00:38:42.025 --> 00:38:43.105 That's a, that's just a bug.
877 00:38:43.245 --> 00:38:46.865 So here what you'll see is that, um, you know,
878 00:38:47.165 --> 00:38:51.335 the Docling features, you know, we passed in the first,
879 00:38:51.595 --> 00:38:54.695 uh, and actually let me, uh, go and look at the code, um,
880 00:38:56.705 --> 00:38:57.805 and I'll show you, right?
881 00:38:58.025 --> 00:39:02.705 Um, the query embedding
882 00:39:02.855 --> 00:39:04.105 that we showed
883 00:39:04.445 --> 00:39:09.445 was, here it
884 00:39:09.605 --> 00:39:10.785 What's the name of this paper?
885 00:39:12.395 --> 00:39:15.705 That's the question we asked, uh,
886 00:39:16.825 --> 00:39:17.885 and references here.
887 00:39:18.025 --> 00:39:20.245 And so they both showed, um, the same thing.
888 00:39:20.585 --> 00:39:22.495 Um, and
889 00:39:22.495 --> 00:39:24.935 because again, this one was like this batch transformed
890 00:39:24.995 --> 00:39:27.295 and just uploaded, and this one was the one
891 00:39:27.295 --> 00:39:28.935 that was transformed on the fly.
892 00:39:29.275 --> 00:39:30.815 And the reason it took so long is
893 00:39:30.815 --> 00:39:33.215 because it was iterating, I think it's, it's,
894 00:39:33.255 --> 00:39:35.175 I think it's 10 PDFs that I was processing.
895 00:39:35.215 --> 00:39:37.055 I should have just done one for this example, so my bad.
896 00:39:37.435 --> 00:39:41.735 Um, but, uh, you know, it went and, uh, chunked them
897 00:39:41.915 --> 00:39:43.135 and, and embedded them.
898 00:39:43.915 --> 00:39:45.135 And this is the example.
899 00:39:45.395 --> 00:39:46.895 Um, you know, we asked the question
900 00:39:46.915 --> 00:39:48.535 and it gives us a whole bunch of references
901 00:39:48.915 --> 00:39:50.335 of like, articles that are named.
902 00:39:50.355 --> 00:39:51.735 And so it works, it does the thing.
903 00:39:52.115 --> 00:39:54.495 Um, now the thing that we're really excited about, uh,
904 00:39:54.645 --> 00:39:58.375 with this, and, and, you know, things that we intend
905 00:39:58.375 --> 00:40:02.015 to enhance is making this, again, just really exceptionally
906 00:40:02.015 --> 00:40:03.975 easy for ML engineers, people
907 00:40:03.995 --> 00:40:06.855 to ship production RAG applications that can really scale.
908 00:40:07.355 --> 00:40:09.335 Um, and so I'm, I'm gonna get into that in,
909 00:40:09.355 --> 00:40:10.655 in a little bit more in a second
910 00:40:10.655 --> 00:40:12.175 after I share my screen again.
911 00:40:12.675 --> 00:40:17.305 Um, yes, uh, share here.
912 00:40:21.385 --> 00:40:24.485 Yes. Um, we talked about ingestion
913 00:40:25.105 --> 00:40:26.805 and so the, the roadmap of Feast.
914 00:40:26.805 --> 00:40:29.485 What, what are we doing next? More NLP, you know, again,
915 00:40:29.505 --> 00:40:32.085 we want Feast to be the go-to framework for AI users
916 00:40:32.225 --> 00:40:33.845 to customize their RAG solutions.
917 00:40:34.025 --> 00:40:36.965 And that means investing more in Milvus, uh, Milvus, you know,
918 00:40:36.965 --> 00:40:38.965 again, is, is an extraordinary database.
919 00:40:39.265 --> 00:40:41.405 Um, it, it's, it's, uh, you know,
920 00:40:41.545 --> 00:40:43.045 it has a great inline behavior
921 00:40:43.625 --> 00:40:47.885 or local behavior with pymilvus Lite, uh, or Milvus Lite.
922 00:40:47.945 --> 00:40:50.645 Uh, that just makes it really easy for end users
923 00:40:50.745 --> 00:40:52.005 to get up and running.
924 00:40:52.225 --> 00:40:54.405 Um, and I think that's really important.
925 00:40:54.605 --> 00:40:57.485 A lot of, um, what I found when working, uh,
926 00:40:57.665 --> 00:41:00.365 or leading some of these teams is that, um, if it's,
927 00:41:00.365 --> 00:41:03.165 if there's a lot of friction for data scientists
928 00:41:03.165 --> 00:41:04.485 or end users to get started,
929 00:41:04.715 --> 00:41:06.085 they just, they just don't want to use it.
930 00:41:06.265 --> 00:41:08.125 Um, and so, uh,
931 00:41:08.385 --> 00:41:10.845 in Feast we've invested a lot into making that experience very good.
932 00:41:10.865 --> 00:41:12.925 And, and Milvus, uh, you know, is one
933 00:41:12.925 --> 00:41:14.405 of our favorite frameworks because of that.
934 00:41:14.865 --> 00:41:17.205 Um, because of Milvus Lite, you know, you don't have
935 00:41:17.205 --> 00:41:19.405 to really think too much about containers
936 00:41:19.585 --> 00:41:21.005 or how to deploy it.
937 00:41:21.005 --> 00:41:23.085 You can just kind of hit pip install and go.
938 00:41:23.545 --> 00:41:25.365 Um, and so, so that's one
939 00:41:25.365 --> 00:41:26.765 of the things that we really like about it.
940 00:41:26.765 --> 00:41:28.565 And again, we'll continue to invest a lot more in
941 00:41:28.605 --> 00:41:30.405 NLP, um, image support.
942 00:41:30.425 --> 00:41:34.605 So, you know, images are, are, are interesting
943 00:41:34.605 --> 00:41:36.365 because they're actually pretty analogous
944 00:41:36.365 --> 00:41:40.245 or equivalent to, um, to, to nl, to,
945 00:41:40.245 --> 00:41:44.245 to language in the sense that, um, you often want metadata,
946 00:41:44.245 --> 00:41:46.325 and this is one of the things that I think I understated in
947 00:41:46.325 --> 00:41:50.135 this talk, is that, um, Feast allows you
948 00:41:50.135 --> 00:41:53.295 to store additional information beyond just the sentences
949 00:41:53.435 --> 00:41:54.575 or the tokens, right?
950 00:41:55.075 --> 00:41:57.215 Um, and it turns out there's a lot of rich structure
951 00:41:57.215 --> 00:41:58.335 that can be optimized for that.
952 00:41:58.335 --> 00:42:01.335 And, and if you actually work in, in recommender systems,
953 00:42:01.355 --> 00:42:04.135 you'll know, and, and like basically you can reduce RAG
954 00:42:04.195 --> 00:42:06.975 to being a, a recommender or ranking retrieving system.
955 00:42:07.875 --> 00:42:10.055 The, and there's a lot of really rich metadata
956 00:42:10.055 --> 00:42:13.175 that can be used to power, um,
957 00:42:14.505 --> 00:42:17.795 this text in addition to just text itself.
958 00:42:18.375 --> 00:42:19.475 And by that I mean, like,
959 00:42:21.615 --> 00:42:24.235 you can do what's basically like a hybrid ranking, right?
960 00:42:24.255 --> 00:42:26.315 And you can rank different, um, pieces
961 00:42:26.415 --> 00:42:28.195 of the text differently.
962 00:42:28.695 --> 00:42:32.595 Um, and so if you have like the title, if you have like the,
963 00:42:32.815 --> 00:42:37.135 the, um, uh, age of the document,
964 00:42:37.465 --> 00:42:39.775 these are all things that you can use to weight differently.
965 00:42:40.155 --> 00:42:44.015 And you can even build a model on top called a re-ranker, um,
966 00:42:44.115 --> 00:42:45.495 to fully optimize these things.
967 00:42:45.675 --> 00:42:47.775 Uh, and so that is an area that we do intend
968 00:42:47.775 --> 00:42:49.335 to allow customization for.
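NOTE The hybrid ranking idea described above can be sketched in a few lines of plain Python. This is only an illustration, not Feast's or Milvus's API: the `Candidate` fields, the weights, and the freshness half-life are all hypothetical values chosen for the example. It blends the vector similarity with a title match signal and a document age signal, then re-sorts the retrieved candidates.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    similarity: float   # vector similarity from the retriever, assumed to be in [0, 1]
    title_match: float  # hypothetical signal: fraction of query terms found in the title
    age_days: int       # hypothetical metadata: age of the document in days


def hybrid_score(c, w_sim=0.6, w_title=0.3, w_fresh=0.1, half_life_days=365):
    """Weighted blend of similarity, title match, and freshness (illustrative weights)."""
    # Freshness decays by half every `half_life_days`.
    freshness = 0.5 ** (c.age_days / half_life_days)
    return w_sim * c.similarity + w_title * c.title_match + w_fresh * freshness


def rerank(candidates, **weights):
    """Return candidates sorted from highest to lowest hybrid score."""
    return sorted(candidates, key=lambda c: hybrid_score(c, **weights), reverse=True)


candidates = [
    Candidate("old_exact", similarity=0.90, title_match=0.0, age_days=1460),
    Candidate("fresh_titled", similarity=0.80, title_match=1.0, age_days=0),
    Candidate("middling", similarity=0.85, title_match=0.5, age_days=365),
]

ranked = [c.doc_id for c in rerank(candidates)]
similarity_only = [c.doc_id for c in rerank(candidates, w_sim=1.0, w_title=0.0, w_fresh=0.0)]
print(ranked)
print(similarity_only)
```

With these made-up weights the fresh, well-titled document moves ahead of the one with the highest raw similarity. A learned re-ranker would replace the hand-set weights with a model fitted to relevance data.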
969 00:42:49.675 --> 00:42:51.655 Um, and again, so that it can be fine tuned
970 00:42:51.765 --> 00:42:55.015 because you, you need to be able to fine tune it so that in,
971 00:42:55.035 --> 00:42:57.535 in serving, you know, what parameters are
972 00:42:57.595 --> 00:42:58.975 and which pieces of content
973 00:42:59.275 --> 00:43:01.095 and how you wanna structure the data.
974 00:43:01.315 --> 00:43:03.295 And, and that becomes very hard, hardly codified.
975 00:43:03.325 --> 00:43:06.135 It's, it's not the same RAG of,
976 00:43:06.235 --> 00:43:09.575 I'm just gonna throw it into, um, into the context and,
977 00:43:09.675 --> 00:43:10.735 and see what works.
978 00:43:10.835 --> 00:43:13.575 Um, it is actually much more optimized systems.
979 00:43:13.795 --> 00:43:17.455 Um, and so I, I think, I think both have to exist
980 00:43:17.455 --> 00:43:19.295 and they both end up having to be very powerful.
981 00:43:19.355 --> 00:43:20.615 And as LLMs get better,
982 00:43:20.615 --> 00:43:21.855 certainly that will get a little bit better.
983 00:43:21.855 --> 00:43:26.655 But my, my, my long view is that, um, there's always going
984 00:43:26.655 --> 00:43:28.255 to be a need for fine tuning some of these systems,
985 00:43:28.255 --> 00:43:31.135 especially once you hit real scale where another one
986 00:43:31.135 --> 00:43:33.495 or 2% really makes a big financial impact.
987 00:43:34.115 --> 00:43:38.175 Um, and so, uh, scaling batch, we, we already support, uh,
988 00:43:38.465 --> 00:43:39.615 Spark as an offline store.
989 00:43:39.615 --> 00:43:41.455 I mentioned that. And for batch transformations,
990 00:43:41.675 --> 00:43:44.055 we intend on incorporating Ray, uh,
991 00:43:44.055 --> 00:43:46.175 at a point in the future when other maintainers are pretty
992 00:43:46.175 --> 00:43:47.255 excited about doing that work.
993 00:43:47.755 --> 00:43:49.335 Um, and then latency improvements.
994 00:43:49.495 --> 00:43:52.135 I spent some time optimizing a lot of the computation
995 00:43:52.795 --> 00:43:55.575 and refu latency within Feast, uh, in the past.
996 00:43:55.635 --> 00:43:57.775 And we continue to, to invest in that, um,
997 00:43:57.775 --> 00:44:00.135 because we want Feast to really be blazing fast.
998 00:44:00.395 --> 00:44:01.695 Um, you know, I used
999 00:44:01.695 --> 00:44:03.175 to work at a company called Fast for a reason.
1000 00:44:03.675 --> 00:44:07.575 Um, we like fast things. And so thank you.
1001 00:44:08.155 --> 00:44:10.055 Um, there's a Feast RAG blog post
1002 00:44:10.055 --> 00:44:11.335 that talks a little bit more about
1003 00:44:11.365 --> 00:44:13.935 what the value proposition of Feast supporting RAG is
1004 00:44:13.935 --> 00:44:15.535 and why I spent so much time on it.
1005 00:44:15.995 --> 00:44:17.335 Um, you know, I have a background in
1006 00:44:17.375 --> 00:44:18.815 NLP, uh, coincidentally.
1007 00:44:18.815 --> 00:44:20.495 And so that, you know, when I became a Feast maintainer,
1008 00:44:20.495 --> 00:44:22.615 in fact, that was a thing I said a year
1009 00:44:22.615 --> 00:44:23.695 and a half ago that I would do,
1010 00:44:23.715 --> 00:44:25.495 and, you know, almost done it.
1011 00:44:25.555 --> 00:44:27.575 Uh, job's not finished, but we're getting close.
1012 00:44:28.115 --> 00:44:30.615 Um, there's some links to the Feast documentation,
1013 00:44:30.615 --> 00:44:33.045 the Feast website, GitHub repo with the demo
1014 00:44:33.385 --> 00:44:35.165 and the GitHub repo with the Docling demo.
1015 00:44:35.265 --> 00:44:38.765 So there's one that's just focused on basic RAG with Milvus,
1016 00:44:39.425 --> 00:44:41.165 and then there's another with, you know, uh,
1017 00:44:41.595 --> 00:44:43.645 this Docling demo as well with Milvus.
1018 00:44:43.665 --> 00:44:46.285 And so I, I wanna say a big thank you to the, to,
1019 00:44:46.385 --> 00:44:48.965 to Stephanie and the folks here at Milvus for inviting me
1020 00:44:48.965 --> 00:44:51.325 to talk and, and share the gospel of Feast.
1021 00:44:51.465 --> 00:44:53.925 Um, and this is a, a generated image
1022 00:44:53.925 --> 00:44:55.925 of a robot eating a bunch of rags,
1023 00:44:55.925 --> 00:44:57.365 feasting on rags, if you will.
1024 00:44:57.665 --> 00:44:58.665 Uh, I thought that was funny.
1025 00:45:00.415 --> 00:45:03.165 Thank you very much. And I think it's funny, uh,
1026 00:45:03.695 --> 00:45:04.845 thank you very much for
1027 00:45:04.845 --> 00:45:06.245 the very detailed presentation, actually.
1028 00:45:06.545 --> 00:45:08.725 Uh, see, even people are laughing in the chat
1029 00:45:08.725 --> 00:45:10.845 and it wrote in the chat, which means they really laughed.
1030 00:45:11.505 --> 00:45:12.505 So
1031 00:45:13.085 --> 00:45:17.525 I think, Uh, do we have questions from people?
1032 00:45:18.065 --> 00:45:19.245 So just quickly,
1033 00:45:19.245 --> 00:45:21.045 otherwise, you mentioned I have one on
1034 00:45:21.045 --> 00:45:22.125 my side, so I'll just start.
1035 00:45:22.745 --> 00:45:25.525 Uh, you mentioned a couple of times, uh, you know,
1036 00:45:25.525 --> 00:45:27.325 like low latency for Feast.
1037 00:45:28.105 --> 00:45:30.445 Uh, what is it, what does it mean exactly?
1038 00:45:30.875 --> 00:45:33.085 Like what is, um, the thing you're targeting usually
1039 00:45:33.305 --> 00:45:34.325 you you would say, sorry.
1040 00:45:34.755 --> 00:45:36.125 Yeah, that's, that's a really great question.
1041 00:45:36.245 --> 00:45:37.845 I was imprecise and I should have been more precise,
1042 00:45:37.905 --> 00:45:39.725 but, um, you know, like, I think, so if,
1043 00:45:39.725 --> 00:45:42.045 if you're serving stuff online, um,
1044 00:45:44.895 --> 00:45:47.825 usually at really high scale you're gonna say, well,
1045 00:45:47.825 --> 00:45:50.585 what's my percentile distribution of latency?
1046 00:45:50.585 --> 00:45:54.185 Right? Exactly. And so, um, typically people focus on P99,
1047 00:45:54.385 --> 00:45:56.105 'cause P100, you're gonna have a bad time,
1048 00:45:56.205 --> 00:45:58.985 but you know, like P99 there, let's optimize for that.
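NOTE To make the percentile language concrete, here is a minimal sketch of computing P50, P99, and P100 from a list of request latencies using the nearest-rank method. The latency numbers are invented for illustration and are not measurements from Feast or any database mentioned in the talk.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing so whole-number percentiles stay exact.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 98 fast requests, one slow, one very slow.
latencies_ms = [5.0] * 98 + [40.0, 900.0]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
p100 = percentile(latencies_ms, 100)
print(p50, p99, p100)
```

P100 is simply the single worst request, so one outlier dominates it. That is why P99 tends to be the more practical optimization target.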
1049 00:45:59.605 --> 00:46:01.025 And depending on the data store
1050 00:46:01.025 --> 00:46:04.425 and indexing strategy, it ends up with different trade offs,
1051 00:46:04.425 --> 00:46:05.665 you know, and, and I think
1052 00:46:05.665 --> 00:46:08.625 for vector similarity search in particular, it's very hard,
1053 00:46:09.085 --> 00:46:12.585 um, because that scales proportional to the number
1054 00:46:12.585 --> 00:46:14.625 of documents being embedded and retrieved.
1055 00:46:14.805 --> 00:46:15.985 Um, and so like
1056 00:46:16.055 --> 00:46:18.585 that there are some bottlenecks that you're gonna ultimately hit.
1057 00:46:18.585 --> 00:46:20.715 So if you want like a hundred docs, well
1058 00:46:20.715 --> 00:46:23.915 that's gonna be a lot harder than like, it, you're gonna,
1059 00:46:23.915 --> 00:46:24.955 you're gonna hit some bottlenecks.
1060 00:46:24.955 --> 00:46:27.515 But for basic entity retrieval, um,
1061 00:46:27.985 --> 00:46:29.195 that we know really well.
1062 00:46:29.375 --> 00:46:32.595 And so things like, you know, a Redis cache or, or,
1063 00:46:32.655 --> 00:46:34.555 or Redis ends up being quite performant
1064 00:46:34.555 --> 00:46:36.795 where you can get P99s of five milliseconds.
1065 00:46:37.135 --> 00:46:38.275 MySQL database, you can get
1066 00:46:38.275 --> 00:46:40.595 around 10 milliseconds, uh, P99.
1067 00:46:41.175 --> 00:46:42.555 Um, and,
1068 00:46:42.695 --> 00:46:45.315 and you know, there, there are ways to continue to optimize that.
1069 00:46:45.335 --> 00:46:47.755 But, but really within the code itself,
1070 00:46:47.985 --> 00:46:50.515 because like some of the things you get bottlenecked
1071 00:46:50.515 --> 00:46:53.635 by just the database, basically you end up like the, the,
1072 00:46:53.635 --> 00:46:54.635 the functional limit is
1073 00:46:54.635 --> 00:46:56.835 how fast can the database retrieve from the database.
1074 00:46:56.835 --> 00:46:58.075 And then you do that.
1075 00:46:58.535 --> 00:47:00.675 And, and the strategy is one,
1076 00:47:00.755 --> 00:47:04.915 having really efficient optimized code that isn't doing too,
1077 00:47:04.935 --> 00:47:07.955 too much in the handling and serialization
1078 00:47:07.955 --> 00:47:10.555 and deserialization and computation of the data on the fly.
1079 00:47:11.215 --> 00:47:14.745 Um, and then two, pre-computing.
1080 00:47:14.845 --> 00:47:17.625 And, and so like a lot of what Feast really aims
1081 00:47:17.625 --> 00:47:18.705 to do is pre-compute.
1082 00:47:18.965 --> 00:47:21.185 And what we, what our documentation, uh,
1083 00:47:21.195 --> 00:47:24.545 talks about is pre-computing is, is the gold standard.
1084 00:47:24.605 --> 00:47:27.265 If you want really good customer experiences, pre-compute
1085 00:47:27.265 --> 00:47:29.685 as much as you can, which is why we invest in the batch.
1086 00:47:30.105 --> 00:47:32.765 So a concrete example, you have a million documents
1087 00:47:32.765 --> 00:47:34.365 that you want to be able to search through.
1088 00:47:35.365 --> 00:47:37.705 You have to batch embed those, you have to embed them
1089 00:47:37.765 --> 00:47:40.865 and upload them online to the degree feasible.
1090 00:47:41.065 --> 00:47:44.105 'cause if you try to do that on the fly every time,
1091 00:47:44.765 --> 00:47:47.105 you'll see what we just experienced right in the terminal
1092 00:47:47.115 --> 00:47:48.745 where it's like, it's gonna take time.
1093 00:47:48.895 --> 00:47:51.145 There's like, you have to, you are bound
1094 00:47:51.445 --> 00:47:53.465 by the calculations that you have to execute.
1095 00:47:53.465 --> 00:47:55.465 Of course, we could have done them, you know, uh,
1096 00:47:55.505 --> 00:47:57.945 concurrently and made things slightly more efficient.
1097 00:47:58.205 --> 00:48:01.065 But the fact is, you have to pay that calculation tax.
1098 00:48:01.445 --> 00:48:06.145 So pay it before your, your, um, your users come to your,
1099 00:48:06.145 --> 00:48:07.705 your application if you can.
1100 00:48:07.725 --> 00:48:09.025 That's not always feasible, right?
1101 00:48:09.025 --> 00:48:11.265 If a user's uploading their own document
1102 00:48:11.285 --> 00:48:12.825 to you, well then you have to wait.
1103 00:48:12.945 --> 00:48:14.225 'cause you have to process it, you have once.
1104 00:48:14.285 --> 00:48:17.785 But if you have old documents that you wanna upload and,
1105 00:48:17.965 --> 00:48:19.985 and make accessible to every user,
1106 00:48:20.135 --> 00:48:21.745 well then you absolutely should have done
1107 00:48:21.745 --> 00:48:23.265 that like way beforehand.
1108 00:48:23.725 --> 00:48:25.825 Uh, and then, and then uploaded it.
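NOTE The "pay the calculation tax before users arrive" point can be sketched as follows. This is a toy illustration under stated assumptions: `toy_embed` is a stand-in bag-of-words function, not a real embedding model, and the in-memory dictionary stands in for an online store such as Milvus. The call counter shows that document embedding happens once, offline, while serving only embeds the query.

```python
import math
from collections import Counter

EMBED_CALLS = Counter()  # counts embedding calls per stage, to show where the cost is paid


def toy_embed(text, stage):
    """Stand-in for a real embedding model: a normalized bag-of-words vector."""
    EMBED_CALLS[stage] += 1
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {word: v / norm for word, v in counts.items()}


def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


def build_index(documents):
    """Offline batch step: embed every document once and keep the vectors."""
    return {doc_id: toy_embed(text, "offline") for doc_id, text in documents.items()}


def search(index, query, top_k=1):
    """Online step: embed only the query and compare against stored vectors."""
    q = toy_embed(query, "online")
    ranked = sorted(index, key=lambda doc_id: cosine(q, index[doc_id]), reverse=True)
    return ranked[:top_k]


documents = {
    "feast": "feast is an open source feature store",
    "milvus": "milvus is an open source vector database",
    "docling": "docling parses pdf documents into text",
}
index = build_index(documents)
results = [search(index, q)[0] for q in ("vector database", "feature store", "pdf parsing")]
print(results, dict(EMBED_CALLS))
```

The linear scan in `search` is only for readability. A vector database replaces it with an index, but the split between offline embedding and online lookup stays the same.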
1109 00:48:25.965 --> 00:48:29.145 So, um, but again, like at the Feast layer,
1110 00:48:29.245 --> 00:48:32.425 we are gonna continue to optimize our code base so that it's
1111 00:48:32.425 --> 00:48:34.105 as lightweight and efficient as possible.
1112 00:48:35.765 --> 00:48:38.075 Thank you. And then I'll follow up with one is like,
1113 00:48:38.075 --> 00:48:40.315 so you used, um, like
1114 00:48:40.315 --> 00:48:42.115 how would you customize like embedding models
1115 00:48:42.175 --> 00:48:43.315 and chunking there directly?
1116 00:48:43.315 --> 00:48:45.635 Do you do it on the like Docling level
1117 00:48:45.775 --> 00:48:46.955 or where does it work if you,
1118 00:48:47.135 --> 00:48:49.675 You can, Docling has a really, uh, wide breadth of, uh,
1119 00:48:49.815 --> 00:48:51.915 of, uh, ability to actually customize.
1120 00:48:51.915 --> 00:48:54.555 And so like, uh, you know, I don't know the full extent
1121 00:48:54.555 --> 00:48:56.235 of all those parameters in Docling,
1122 00:48:56.235 --> 00:48:58.275 but they are documented and available, pun intended.
1123 00:48:58.615 --> 00:49:02.595 Um, and uh, what I'd say is that if you,
1124 00:49:03.775 --> 00:49:06.475 if you're not falling within the subset of like, well,
1125 00:49:06.475 --> 00:49:08.635 my data doesn't fit Docling well, that's actually part
1126 00:49:08.635 --> 00:49:12.485 of the point of Feast is that whatever text data you have,
1127 00:49:12.545 --> 00:49:15.325 if it's just text, this is what Feast is built for, is
1128 00:49:15.325 --> 00:49:17.725 that you can choose to do whatever sentence transformer you
1129 00:49:17.725 --> 00:49:20.365 want, whatever PyTorch code you want, um,
1130 00:49:21.165 --> 00:49:24.165 whatever tokenization strategy you want, all
1131 00:49:24.165 --> 00:49:26.605 that can be done because you have the, the toolkit to say,
1132 00:49:26.605 --> 00:49:27.925 well, look, you can serve whatever you want.
1133 00:49:27.945 --> 00:49:30.045 You can execute arbitrary functions.
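NOTE As a rough sketch of what "execute arbitrary functions" can look like for chunking and embedding, the snippet below passes a chunker and an embedder into a generic transform step. Everything here is hypothetical plain Python: `chunk_words`, `transform`, and `length_embedder` are illustrative names, not Feast or Docling functions, and the embedder is a placeholder where a sentence transformer or custom PyTorch code would go.

```python
def chunk_words(text, max_words=8, overlap=2):
    """Split text into overlapping windows of at most `max_words` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def transform(documents, chunker, embedder):
    """Apply a user-supplied chunker and embedder, producing one row per chunk."""
    rows = []
    for doc_id, text in documents.items():
        for i, chunk in enumerate(chunker(text)):
            rows.append({"doc_id": doc_id, "chunk_id": i, "text": chunk, "vector": embedder(chunk)})
    return rows


def length_embedder(chunk):
    """Placeholder embedder: swap in a real model here."""
    return [float(len(chunk)), float(len(chunk.split()))]


docs = {"paper": "one two three four five six seven eight nine ten eleven twelve"}
rows = transform(docs, lambda t: chunk_words(t, max_words=5, overlap=1), length_embedder)
for row in rows:
    print(row["chunk_id"], row["text"])
```

Because the chunker and embedder are ordinary callables, swapping the tokenization strategy or the model is a one-line change at the call site.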
1134 00:49:30.385 --> 00:49:32.845 Um, I don't recommend calling other APIs within
1135 00:49:32.845 --> 00:49:34.045 feature transformations.
1136 00:49:34.235 --> 00:49:36.005 Some people will do that and that's fine.
1137 00:49:36.425 --> 00:49:38.925 Um, but like that, that, that's when you start to,
1138 00:49:39.545 --> 00:49:41.405 to introduce latency
1139 00:49:41.405 --> 00:49:43.125 and you actually get more complicated systems.
1140 00:49:43.265 --> 00:49:47.125 But, uh, ignoring that detail, um, the, the,
1141 00:49:48.105 --> 00:49:50.805 the flexibility of Feast is that, you know, ML engineers
1142 00:49:51.145 --> 00:49:54.125 who tend to be very rich domain experts in, in how to
1143 00:49:54.875 --> 00:49:58.445 outline these things, um, can, can really kind
1144 00:49:58.445 --> 00:49:59.765 of steer the wheel here.
1145 00:50:00.395 --> 00:50:01.485 Okay. And yeah,
1146 00:50:01.765 --> 00:50:03.605 actually follow up just more like on a
1147 00:50:03.605 --> 00:50:05.005 philosophical level, sorry.
1148 00:50:05.585 --> 00:50:08.445 Uh, so your main users are ML engineers.
1149 00:50:08.515 --> 00:50:10.965 What we, you know, it was very popular a couple
1150 00:50:10.965 --> 00:50:13.245 of years ago when I was myself an ML engineer.
1151 00:50:14.065 --> 00:50:16.085 Do you see, like, how does it work now
1152 00:50:16.085 --> 00:50:18.485 with the new AI engineers, you know, like,
1153 00:50:18.485 --> 00:50:21.765 because they use a lot of, you know, API calls
1154 00:50:21.825 --> 00:50:23.325 and LLMs directly and stuff.
1155 00:50:23.325 --> 00:50:25.325 Like how do you see the future with Feast
1156 00:50:25.325 --> 00:50:27.005 and AI engineers and human engineers?
1157 00:50:27.555 --> 00:50:28.725 Yeah, that's a really good question.
1158 00:50:28.965 --> 00:50:31.525 I think like we're open to, we're definitely, like, I,
1159 00:50:31.645 --> 00:50:33.765 I love to, to cater to AI engineers.
1160 00:50:33.885 --> 00:50:35.325 I really would. I think, um,
1161 00:50:35.705 --> 00:50:37.685 and like I obviously have to, uh,
1162 00:50:37.695 --> 00:50:40.965 share Chip's book, AI Engineering, uh, and in it, I, I,
1163 00:50:40.965 --> 00:50:43.285 and I talk about this a lot, she mentions that
1164 00:50:44.665 --> 00:50:46.845 AI engineering emerged from ML engineering.
1165 00:50:46.995 --> 00:50:49.245 Yeah. Real thing is you don't have
1166 00:50:49.245 --> 00:50:50.685 to train a foundation model anymore.
1167 00:50:50.745 --> 00:50:52.125 You can just treat it as an endpoint.
1168 00:50:52.265 --> 00:50:54.125 But all of the, and and she says this in her book,
1169 00:50:54.125 --> 00:50:56.445 and I, I've quoted it in some internal papers I've written,
1170 00:50:57.985 --> 00:50:59.245 all the other problems are still there.
1171 00:50:59.625 --> 00:51:02.605 And Feast isn't about inference. Feast is about the data.
1172 00:51:02.905 --> 00:51:04.605 So all of the Feast problems that,
1173 00:51:04.645 --> 00:51:06.085 that we encounter, they exist.
1174 00:51:06.385 --> 00:51:09.565 And other frameworks maybe don't have the 10 years
1175 00:51:09.585 --> 00:51:10.645 of knowledge that we've
1176 00:51:10.645 --> 00:51:13.405 and scars, uh, that we've developed in,
1177 00:51:13.405 --> 00:51:14.805 in building these production systems.
1178 00:51:15.425 --> 00:51:18.005 Um, but it's, uh, a lot
1179 00:51:18.005 --> 00:51:19.365 of those things are out of the box in Feast.
1180 00:51:19.365 --> 00:51:22.205 But I think the challenge with Feast is that it's
1181 00:51:22.205 --> 00:51:24.965 so anchored towards ML engineering language
1182 00:51:24.965 --> 00:51:29.165 and jargon that it doesn't necessarily immediately catch
1183 00:51:29.355 --> 00:51:31.605 with like, uh, AI engineers.
1184 00:51:31.625 --> 00:51:33.965 And so, uh, there's the marketing angle like, Hey,
1185 00:51:33.965 --> 00:51:35.085 look, can we bridge that gap?
1186 00:51:35.565 --> 00:51:36.965 I don't know if we ever can,
1187 00:51:37.025 --> 00:51:39.805 but I do hope that, that we're able to continue to build
1188 00:51:39.945 --> 00:51:42.565 and enable those, those folks succeed as well.
1189 00:51:43.145 --> 00:51:46.325 Um, yeah, I definitely, like am welcome to, I think part
1190 00:51:46.325 --> 00:51:48.805 of the, the reason I built out this Docling, uh,
1191 00:51:49.425 --> 00:51:53.445 and the demos that we've done is to actually, you know, go
1192 00:51:53.445 --> 00:51:54.805 to AI engineers and say like, Hey, look,
1193 00:51:54.805 --> 00:51:56.565 if you wanna do really, really sophisticated stuff,
1194 00:51:57.145 --> 00:51:58.365 here's the path to it.
1195 00:51:58.785 --> 00:52:03.415 But my, my high conviction bet is actually the amount
1196 00:52:03.415 --> 00:52:05.775 of ML engineers is going to continue to increase
1197 00:52:06.125 --> 00:52:07.735 because of more AI engineers.
1198 00:52:07.895 --> 00:52:10.135 'cause you start to find that like, oh, actually I do wanna,
1199 00:52:10.535 --> 00:52:12.255 I do wanna turn all these cranks
1200 00:52:12.275 --> 00:52:14.815 and I wanna put on all these bells and whistles
1201 00:52:14.815 --> 00:52:17.815 because that 2% now starts to be really valuable
1202 00:52:17.835 --> 00:52:19.975 and AI engineering really does unlock that.
1203 00:52:20.035 --> 00:52:23.725 And so, um, that, that's kind of where my, my view on it is.
1204 00:52:23.725 --> 00:52:25.205 But, but it's, it, there's a lot more
1205 00:52:25.205 --> 00:52:26.885 of a upfront cost you have to pay
1206 00:52:26.885 --> 00:52:29.205 with these things versus just like, you know,
1207 00:52:30.315 --> 00:52:32.045 like calling an endpoint or something, right?
1208 00:52:32.045 --> 00:52:33.445 Or just like using Pandas
1209 00:52:33.445 --> 00:52:35.685 to dump it into, into something right.
1210 00:52:35.705 --> 00:52:37.525 Or FAISS, right? Like all, all these things.
1211 00:52:37.865 --> 00:52:39.125 And, and so I acknowledge
1212 00:52:39.125 --> 00:52:41.045 that like it is more engineering challenges.
1213 00:52:41.045 --> 00:52:42.565 That said, we do try
1214 00:52:42.565 --> 00:52:44.085 to make it pretty easy for people to get started.
1215 00:52:45.035 --> 00:52:46.445 Cool. Well thank you very much.
1216 00:52:46.785 --> 00:52:48.565 And just to wait, but yeah, hopefully
1217 00:52:48.565 --> 00:52:51.285 otherwise, uh, you still get to have like a lot
1218 00:52:51.285 --> 00:52:54.285 of AI engineers, you know, a lot of people coming, uh,
1219 00:52:54.305 --> 00:52:55.445 to using Feast.
1220 00:52:55.705 --> 00:52:58.165 Uh, I was using it myself actually a couple of years ago,
1221 00:52:58.165 --> 00:53:00.205 so it's nice to see that it's still here, uh,
1222 00:53:00.255 --> 00:53:01.405 fully alive and you know,
1223 00:53:02.355 --> 00:53:03.965 Yeah, we're working on it, man. It's, uh,
1224 00:53:03.965 --> 00:53:04.965 Yeah.
1225 00:53:05.825 --> 00:53:08.045 And yeah, again to everyone. So it was recorded.
1226 00:53:08.385 --> 00:53:10.845 Uh, we'll share it, uh, in a couple of days
1227 00:53:10.845 --> 00:53:12.085 after it's been edited and stuff.
1228 00:53:12.085 --> 00:53:13.725 So if you also want to send it to your friends,
1229 00:53:14.505 --> 00:53:15.925 uh, feel free to do so.
1230 00:53:15.925 --> 00:53:17.245 Arthur, it was a wonderful presentation.
1231 00:53:17.265 --> 00:53:18.725 So thank you very much Francisco
1232 00:53:19.705 --> 00:53:22.645 and hopefully I will get to see you one day
1233 00:53:22.825 --> 00:53:25.330 and hopefully some people will get to use Feast as well
1234 00:53:25.425 --> 00:53:26.525 and see how nice it is.
1235 00:53:27.735 --> 00:53:29.085 Thank you. Thank you very much.
1236 00:53:29.215 --> 00:53:31.925 Thank you everyone, and have a lovely morning, afternoon,
1237 00:53:31.945 --> 00:53:33.285 or evening, wherever you are in the world.
1238 00:53:33.665 --> 00:53:34.005 See you.