What Makes "Deep Research"? A Dive into AI Agents
WEBVTT
1 00:00:03.565 --> 00:00:05.555 I'm pleased to introduce today's session,
2 00:00:05.825 --> 00:00:08.955 What Makes Deep Research: A Dive into AI Agents,
3 00:00:09.175 --> 00:00:10.875 and our guest speaker, Stefan Webb.
4 00:00:11.415 --> 00:00:15.275 Stefan is a developer advocate at Zilliz, where he advocates
5 00:00:15.275 --> 00:00:17.355 for the open source vector database, Milvus.
6 00:00:17.805 --> 00:00:20.475 Prior to this, he spent three years in industry
7 00:00:20.475 --> 00:00:23.275 as an applied ML researcher at Twitter
8 00:00:23.335 --> 00:00:25.715 and Meta, collaborating with product teams
9 00:00:25.775 --> 00:00:27.755 to tackle their most complex challenges.
10 00:00:28.305 --> 00:00:31.195 Stefan holds a PhD from the University of Oxford,
11 00:00:31.455 --> 00:00:33.035 and he has published papers
12 00:00:33.255 --> 00:00:36.835 in leading machine learning conferences such
13 00:00:36.835 --> 00:00:39.195 as NeurIPS, ICLR, and ICML.
14 00:00:39.655 --> 00:00:41.635 He is passionate about generative AI
15 00:00:41.815 --> 00:00:44.435 and is eager to leverage his deep technical expertise
16 00:00:44.495 --> 00:00:46.435 to contribute to the open source community.
17 00:00:46.775 --> 00:00:47.915 Welcome, Stefan.
18 00:00:48.455 --> 00:00:50.755 Thanks so much, Sachi. Thanks for the kind introduction.
19 00:00:51.135 --> 00:00:53.955 And you're right, I'm very passionate about generative AI
20 00:00:54.455 --> 00:00:56.835 and also about helping developers.
21 00:00:57.455 --> 00:01:02.155 So I really love doing webinars like this,
22 00:01:02.155 --> 00:01:03.395 meeting some of our users
23 00:01:03.655 --> 00:01:06.035 and people who are just interested in
24 00:01:06.035 --> 00:01:07.515 vector databases and AI.
25 00:01:08.615 --> 00:01:10.435 Just a tiny bit more
26 00:01:10.435 --> 00:01:11.635 about myself before I get started.
27 00:01:12.255 --> 00:01:17.075 I am a developer advocate for Zilliz,
28 00:01:17.575 --> 00:01:19.795 the company behind the leading open source
29 00:01:20.335 --> 00:01:22.355 vector database, Milvus.
30 00:01:23.055 --> 00:01:26.155 As a developer advocate, I serve as
31 00:01:26.155 --> 00:01:30.475 a bridge between the users
32 00:01:30.815 --> 00:01:33.915 of Milvus and the developers.
33 00:01:33.935 --> 00:01:38.515 That means providing technical support to users,
34 00:01:38.585 --> 00:01:43.515 helping connect users with engineers for
35 00:01:43.615 --> 00:01:44.955 deeper technical support,
36 00:01:45.705 --> 00:01:48.755 and also running a lot of events
37 00:01:49.385 --> 00:01:50.955 like these webinars.
38 00:01:51.095 --> 00:01:55.195 We do a monthly meetup in the Bay Area,
39 00:01:55.855 --> 00:01:58.435 and I produce
40 00:01:58.665 --> 00:02:00.075 some written content as well.
41 00:02:01.495 --> 00:02:03.915 I've put my LinkedIn there.
42 00:02:04.135 --> 00:02:07.155 I always love connecting
43 00:02:07.225 --> 00:02:08.515 with new folks.
44 00:02:09.315 --> 00:02:10.875 I love hearing what you're building
45 00:02:10.875 --> 00:02:13.795 with generative AI, what your
46 00:02:13.795 --> 00:02:14.875 challenges are,
47 00:02:15.015 --> 00:02:18.235 and what your visions are.
48 00:02:18.895 --> 00:02:20.235 That's my bread and butter.
49 00:02:20.455 --> 00:02:23.395 So please connect with me on LinkedIn.
50 00:02:23.475 --> 00:02:24.835 I would love to hear from you.
51 00:02:25.015 --> 00:02:27.355 And if
52 00:02:27.735 --> 00:02:28.955 you're building a RAG
53 00:02:29.055 --> 00:02:31.875 or agent system at your startup
54 00:02:31.935 --> 00:02:35.035 or your company, I think there's a really good opportunity
55 00:02:35.215 --> 00:02:39.515 for a developer advocate at a company like Zilliz
56 00:02:39.695 --> 00:02:40.755 to help
57 00:02:40.815 --> 00:02:45.395 and provide some consultation.
58 00:02:46.175 --> 00:02:50.195 So with that, let's get started with the webinar.
59 00:02:51.255 --> 00:02:55.555 The topic for today is What Makes Deep Research,
60 00:02:56.375 --> 00:02:59.075 and I've subtitled it A Dive into AI Agents.
61 00:02:59.745 --> 00:03:02.565 We're going to be talking about research agents
62 00:03:03.205 --> 00:03:04.725 specifically,
63 00:03:04.745 --> 00:03:06.845 but I think a lot of this also relates
64 00:03:07.465 --> 00:03:09.925 to generative AI agents in general.
65 00:03:12.065 --> 00:03:15.165 I will start off by giving a tiny bit
66 00:03:15.165 --> 00:03:19.845 of background on OpenAI's Deep Research release,
67 00:03:20.785 --> 00:03:24.485 and then I'm going to introduce a research agent
68 00:03:24.755 --> 00:03:29.365 inspired by it, produced by
69 00:03:30.085 --> 00:03:32.685 engineers at Zilliz and fully open sourced.
70 00:03:33.745 --> 00:03:37.285 I say demo, but it's more of
71 00:03:37.285 --> 00:03:38.405 a code walkthrough:
72 00:03:38.875 --> 00:03:40.965 I'll explain how it was put together.
73 00:03:42.475 --> 00:03:45.845 Then after that I'll talk a bit about some of the ideas
74 00:03:46.505 --> 00:03:50.765 behind agents in general, but also research agents:
75 00:03:51.305 --> 00:03:52.685 what's new,
76 00:03:52.865 --> 00:03:56.005 and why deep research
77 00:03:56.005 --> 00:03:58.485 has come on the scene so recently.
78 00:03:59.465 --> 00:04:02.325 With that discussion,
79 00:04:02.865 --> 00:04:05.125 it should become clear what some of the
80 00:04:05.125 --> 00:04:09.645 challenges
81 00:04:09.805 --> 00:04:11.765 and obstacles to
82 00:04:11.945 --> 00:04:13.165 wider adoption are.
83 00:04:13.345 --> 00:04:15.285 We'll talk about some of those
84 00:04:15.985 --> 00:04:17.645 and some potential solutions.
85 00:04:17.705 --> 00:04:21.565 That should give you a sense of
86 00:04:22.015 --> 00:04:25.125 where things are headed over the short term,
87 00:04:25.195 --> 00:04:27.085 the next six to twelve months.
88 00:04:27.945 --> 00:04:29.045 So with
89 00:04:29.045 --> 00:04:33.925 that, let's get started.
90 00:04:34.105 --> 00:04:38.005 By the way, feel free to ask questions
91 00:04:38.705 --> 00:04:41.085 in the chat as they occur to you.
92 00:04:42.065 --> 00:04:44.165 I'll stop
93 00:04:44.365 --> 00:04:46.325 whenever a question comes in,
94 00:04:46.325 --> 00:04:48.605 or try my best to do so, and take them as they come in.
95 00:04:49.625 --> 00:04:51.725 Okay.
96 00:04:52.385 --> 00:04:56.125 I'm sure everyone here has heard about
97 00:04:56.835 --> 00:05:00.925 OpenAI's new product release,
98 00:05:00.955 --> 00:05:05.325 Deep Research, which was released
99 00:05:05.665 --> 00:05:09.365 near the start of February, last month.
100 00:05:10.465 --> 00:05:13.085 This is a bit of a different product
101 00:05:13.465 --> 00:05:18.245 from their straight chatbot in that
102 00:05:18.865 --> 00:05:23.365 it is able to go off,
103 00:05:23.425 --> 00:05:26.925 search the web, and use other tools
104 00:05:27.465 --> 00:05:31.965 to build a really detailed report answering your question.
105 00:05:32.905 --> 00:05:33.965 So you can give it a question.
106 00:05:34.225 --> 00:05:36.125 I've just taken a screenshot here,
107 00:05:37.225 --> 00:05:39.205 and this is an example.
108 00:05:39.865 --> 00:05:44.645 The question in this case might have been:
109 00:05:44.705 --> 00:05:48.365 please research freestyle snowboards suitable
110 00:05:49.105 --> 00:05:50.685 for an intermediate rider,
111 00:05:51.225 --> 00:05:55.965 and then the user has given some details: their height,
112 00:05:55.965 --> 00:05:57.645 their weight, shoe size, et cetera.
113 00:05:58.705 --> 00:06:03.245 Then this agent goes off,
114 00:06:03.985 --> 00:06:07.565 searches the web,
115 00:06:08.625 --> 00:06:13.165 and is able to
116 00:06:13.165 --> 00:06:16.125 work out how to answer this question,
117 00:06:16.905 --> 00:06:21.725 iterating from one step to another.
118 00:06:22.665 --> 00:06:25.485 After some time, could be eight minutes,
119 00:06:25.485 --> 00:06:29.565 could be 30 minutes, it synthesizes a really detailed,
120 00:06:30.145 --> 00:06:32.645 coherent, informed report.
121 00:06:33.745 --> 00:06:36.525 This is much different from the sort
122 00:06:36.525 --> 00:06:40.725 of plain old ChatGPT that just
123 00:06:41.945 --> 00:06:44.765 returns you an answer more
124 00:06:44.765 --> 00:06:46.965 or less in real time, rather than going off
125 00:06:47.425 --> 00:06:50.325 and going through a lot of autonomous steps.
126 00:06:53.065 --> 00:06:57.045 I think
127 00:06:57.865 --> 00:07:01.045 the reason this exploded in
128 00:07:01.045 --> 00:07:02.885 the media was that
129 00:07:02.885 --> 00:07:05.565 people were really impressed by the results.
130 00:07:06.505 --> 00:07:11.365 It seemed to do a very good job of actually researching
131 00:07:11.785 --> 00:07:15.205 a topic that might require not just a plain answer,
132 00:07:15.265 --> 00:07:17.085 but might actually require going off,
133 00:07:17.355 --> 00:07:18.805 looking at multiple sources,
134 00:07:21.885 --> 00:07:23.105 and asking further questions.
135 00:07:24.045 --> 00:07:26.945 I think I read somewhere that one
136 00:07:26.975 --> 00:07:29.905 professor said this
137 00:07:29.905 --> 00:07:32.265 could replace
138 00:07:32.535 --> 00:07:35.865 an early-stage PhD student in terms of doing
139 00:07:35.895 --> 00:07:38.145 research, and
140 00:07:38.145 --> 00:07:39.225 other professionals were
141 00:07:39.225 --> 00:07:40.265 really impressed with the results.
142 00:07:40.725 --> 00:07:42.825 But what exactly was new about it?
143 00:07:42.895 --> 00:07:45.865 Well, it wasn't the first research agent
144 00:07:46.385 --> 00:07:50.105 released commercially: Google's Deep Research
145 00:07:50.685 --> 00:07:53.985 was released about a month earlier, in December.
146 00:07:56.045 --> 00:07:58.945 So what exactly was new about it?
147 00:07:58.945 --> 00:08:01.945 What was it about it
148 00:08:01.945 --> 00:08:05.745 that produced this much superior output,
149 00:08:06.005 --> 00:08:07.025 qualitatively?
150 00:08:08.485 --> 00:08:11.825 I think the answer is that
151 00:08:12.235 --> 00:08:14.745 we're not really sure, because it's closed source.
152 00:08:15.375 --> 00:08:17.625 The design is a tightly guarded secret.
153 00:08:18.365 --> 00:08:21.945 But from the sort
154 00:08:21.945 --> 00:08:23.065 of rumor mill,
155 00:08:23.385 --> 00:08:25.905 I suppose people speaking to insiders,
156 00:08:26.445 --> 00:08:29.065 plus the blog
157 00:08:29.065 --> 00:08:32.625 announcement that OpenAI released, it seems like a big
158 00:08:32.825 --> 00:08:36.305 element of it was
159 00:08:36.455 --> 00:08:40.425 that it focuses on
160 00:08:41.405 --> 00:08:43.625 end-to-end training
161 00:08:43.625 --> 00:08:47.545 with reinforcement learning on really high quality
162 00:08:47.575 --> 00:08:51.265 reasoning trace data, which we'll discuss more in a minute.
163 00:08:51.765 --> 00:08:53.025 But again,
164 00:08:53.305 --> 00:08:55.025 possibly there are other things in the design.
165 00:08:55.625 --> 00:08:57.285 We just don't know, because it's closed source,
166 00:08:58.065 --> 00:09:00.765 but we can guess what they are by trying
167 00:09:00.785 --> 00:09:03.285 to reproduce a system
168 00:09:03.395 --> 00:09:07.725 that can achieve similar results on
169 00:09:07.945 --> 00:09:09.405 the benchmarks being used.
170 00:09:10.265 --> 00:09:14.685 One such model is
171 00:09:14.995 --> 00:09:17.125 from DeepSeek:
172 00:09:17.505 --> 00:09:19.645 DeepSeek R1.
173 00:09:19.645 --> 00:09:21.645 We'll talk about that a bit later on.
174 00:09:25.405 --> 00:09:28.135 Okay. So what exactly is a research agent,
175 00:09:28.555 --> 00:09:33.375 and how does a research agent differ from
176 00:09:33.595 --> 00:09:36.655 an agent in the general sense?
177 00:09:37.595 --> 00:09:40.255 I think it's one of those things in generative AI:
178 00:09:40.835 --> 00:09:44.775 people disagree on the definitions so far.
179 00:09:45.395 --> 00:09:47.015 We're still coalescing
180 00:09:47.015 --> 00:09:49.055 around an exact definition.
181 00:09:50.515 --> 00:09:54.735 My definition, which I think overlaps with
182 00:09:54.735 --> 00:09:59.135 many people's, is that it's an agent
183 00:09:59.135 --> 00:10:02.375 whose goal is
184 00:10:02.595 --> 00:10:06.015 to do research, in the sense that it has to go off
185 00:10:06.555 --> 00:10:10.255 and discover many relevant sources.
186 00:10:10.635 --> 00:10:14.095 It is not just doing a single lookup to
187 00:10:14.295 --> 00:10:16.455 a vector database,
188 00:10:16.455 --> 00:10:19.495 and it's not just accessing a single Wikipedia page.
189 00:10:20.205 --> 00:10:23.775 It's
190 00:10:23.775 --> 00:10:28.695 making a decision about various sources to search,
191 00:10:30.275 --> 00:10:32.215 then
192 00:10:32.545 --> 00:10:35.015 breaking the question down into multiple steps,
193 00:10:35.995 --> 00:10:40.295 autonomously
194 00:10:40.395 --> 00:10:42.415 reasoning
195 00:10:42.415 --> 00:10:45.775 through answering the question, and then synthesizing a
196 00:10:46.135 --> 00:10:49.015 detailed report at the end.
197 00:10:49.955 --> 00:10:52.655 I've got some quotes here from
198 00:10:53.115 --> 00:10:55.575 the Deep Research release blog,
199 00:10:56.275 --> 00:10:58.615 and I saw three themes.
200 00:10:59.355 --> 00:11:03.735 We've got iteration, we've got search,
201 00:11:04.115 --> 00:11:07.295 or I guess more generally tool usage,
202 00:11:07.835 --> 00:11:10.655 and the third is reasoning.
203 00:11:11.835 --> 00:11:16.055 Under the topic of iteration, the
204 00:11:16.325 --> 00:11:20.335 Deep Research release blog post
205 00:11:20.355 --> 00:11:24.775 described things like learning to plan,
206 00:11:25.005 --> 00:11:29.015 executing a multi-step trajectory, backtracking,
207 00:11:29.115 --> 00:11:31.815 and reacting to real-time information.
208 00:11:33.115 --> 00:11:36.655 This is obviously describing
209 00:11:36.655 --> 00:11:40.095 an agent that is able to know what
210 00:11:40.095 --> 00:11:43.535 to do next autonomously, pivoting
211 00:11:43.555 --> 00:11:46.015 as needed in reaction to information it encounters.
212 00:11:47.635 --> 00:11:48.935 The second theme,
213 00:11:48.935 --> 00:11:51.175 and there's
214 00:11:51.245 --> 00:11:52.415 overlap between these three,
215 00:11:52.875 --> 00:11:56.565 is search. Under this topic, the
216 00:11:56.565 --> 00:12:00.285 blog post contained things like trained end-to-end with
217 00:12:00.885 --> 00:12:02.805 reinforcement learning on hard browsing
218 00:12:02.805 --> 00:12:05.485 and reasoning tasks across a range of domains.
219 00:12:06.025 --> 00:12:08.045 It's generally supposed
220 00:12:08.045 --> 00:12:10.005 that this is the main
221 00:12:10.005 --> 00:12:14.125 secret-sauce ingredient. It's also optimized
222 00:12:14.125 --> 00:12:16.285 for web browsing and data analysis.
223 00:12:17.985 --> 00:12:22.565 The third theme,
224 00:12:22.695 --> 00:12:25.605 which overlaps with iteration and search, is reasoning:
225 00:12:26.305 --> 00:12:28.685 fine-tuned on the upcoming
226 00:12:29.275 --> 00:12:31.005 OpenAI o3 reasoning model,
227 00:12:31.505 --> 00:12:33.845 it leverages reasoning to search, interpret,
228 00:12:33.905 --> 00:12:37.645 and analyze massive amounts of text.
229 00:12:39.025 --> 00:12:41.445 So we can piece
230 00:12:41.445 --> 00:12:44.565 together from this
231 00:12:44.915 --> 00:12:47.365 release blog post how it might work,
232 00:12:47.945 --> 00:12:49.805 relate that to the
233 00:12:49.805 --> 00:12:51.765 latest developments happening in generative AI,
234 00:12:52.305 --> 00:12:53.845 and try
235 00:12:53.845 --> 00:12:58.045 to reproduce the results
236 00:12:58.505 --> 00:12:59.605 by building our own system.
237 00:13:05.195 --> 00:13:06.575 And that's exactly what we did.
238 00:13:06.575 --> 00:13:10.295 We were very excited by the release,
239 00:13:10.955 --> 00:13:15.335 having seen the
240 00:13:16.115 --> 00:13:17.975 quality of the output.
241 00:13:18.755 --> 00:13:23.135 And we were really curious, being
242 00:13:23.175 --> 00:13:26.215 a vector database company, vector databases being one
243 00:13:26.375 --> 00:13:29.375 of the core components powering agents:
244 00:13:30.035 --> 00:13:31.335 we were really curious
245 00:13:31.335 --> 00:13:35.975 whether we could build our own open source version
246 00:13:35.975 --> 00:13:37.015 that works similarly.
247 00:13:37.675 --> 00:13:39.375 That's what we did about a month ago.
248 00:13:39.755 --> 00:13:42.575 Some engineers built an open source
249 00:13:43.335 --> 00:13:44.935 piece of software called DeepSearcher.
250 00:13:47.965 --> 00:13:50.705 Like Deep Research, you give it a query;
251 00:13:51.325 --> 00:13:55.265 it then goes off, searches through multiple sources,
252 00:13:55.295 --> 00:13:58.745 iterates, breaking down the
253 00:13:58.765 --> 00:14:01.705 question into steps it can iterate over,
254 00:14:02.175 --> 00:14:06.185 finally makes a decision about when to stop
255 00:14:06.525 --> 00:14:07.825 answering the question,
256 00:14:08.765 --> 00:14:11.305 and then synthesizes a detailed
257 00:14:11.525 --> 00:14:12.865 report from all that information.
258 00:14:16.535 --> 00:14:19.595 This research agent
259 00:14:20.105 --> 00:14:24.435 is built on top of the vector database Milvus,
260 00:14:24.485 --> 00:14:27.515 to which we at Zilliz are the main contributors.
261 00:14:28.095 --> 00:14:31.195 Milvus has been donated to the Linux Foundation for AI
262 00:14:31.295 --> 00:14:34.675 and Data
263 00:14:35.005 --> 00:14:36.395 since, I think, 2020.
264 00:14:37.135 --> 00:14:39.635 Let me just say a few words about Milvus
265 00:14:39.635 --> 00:14:43.595 before we go into the code of DeepSearcher.
266 00:14:44.735 --> 00:14:49.275 Milvus is a fully open source, Apache-licensed
267 00:14:49.535 --> 00:14:52.125 library, so it's suitable for commercial use,
268 00:14:53.105 --> 00:14:54.925 and it's very simple to use.
269 00:14:55.225 --> 00:14:58.605 You can pip install the lite version
270 00:14:58.625 --> 00:15:03.205 in your notebook, and a much more
271 00:15:03.445 --> 00:15:05.165 scalable version you can launch in a
272 00:15:05.165 --> 00:15:06.405 Docker image really easily.
273 00:15:07.625 --> 00:15:12.325 Then we have a third version,
274 00:15:12.415 --> 00:15:15.325 which is the fully distributed Milvus cluster
275 00:15:16.475 --> 00:15:19.605 that you launch on a cluster
276 00:15:19.605 --> 00:15:20.885 of machines via Kubernetes;
277 00:15:21.385 --> 00:15:25.125 it can scale to literally tens
278 00:15:25.125 --> 00:15:27.085 to hundreds of billions of vectors.
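To make the pip-install path concrete, here is a minimal sketch using Milvus Lite through the pymilvus client; the collection name, dimension, and toy data are arbitrary choices for illustration.

```python
# Minimal Milvus Lite sketch: pip install pymilvus
# (Milvus Lite runs embedded, backed by a local file.)
from pymilvus import MilvusClient

client = MilvusClient("milvus_demo.db")  # file-backed local instance
client.create_collection(collection_name="demo", dimension=8)

# Insert one toy vector, then run a similarity search against it.
client.insert(collection_name="demo",
              data=[{"id": 0, "vector": [0.1] * 8, "text": "hello"}])
hits = client.search(collection_name="demo", data=[[0.1] * 8], limit=3,
                     output_fields=["text"])
print(hits)
```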
279 00:15:31.985 --> 00:15:36.925 Milvus is easy to set up and has really good integration
280 00:15:37.715 --> 00:15:41.205 into the generative AI tooling ecosystem.
281 00:15:42.905 --> 00:15:46.725 Because it's open source, we have a lot of
282 00:15:47.045 --> 00:15:50.205 contributions from pretty much all of
283 00:15:50.205 --> 00:15:53.085 the big tools in generative AI,
284 00:15:53.905 --> 00:15:57.685 whether that's Hugging Face, OpenAI, LangChain, Jina,
285 00:15:58.185 --> 00:16:02.765 or Airbyte. There are, I would say,
286 00:16:03.225 --> 00:16:05.325 dozens and dozens of these integrations,
287 00:16:05.425 --> 00:16:09.485 so you'll most likely be able to use it within your
288 00:16:09.705 --> 00:16:10.965 existing tool set.
289 00:16:14.025 --> 00:16:17.085 I think a strong
290 00:16:17.085 --> 00:16:21.205 recommendation for its performance
291 00:16:21.425 --> 00:16:24.605 and reliability is the fact that it's used by a lot
292 00:16:24.625 --> 00:16:25.765 of really big companies:
293 00:16:26.345 --> 00:16:30.645 everyone from NVIDIA, Microsoft, and Salesforce
294 00:16:31.785 --> 00:16:33.125 to IKEA, and so on.
295 00:16:37.145 --> 00:16:41.485 Just to mention very briefly:
296 00:16:41.605 --> 00:16:45.805 a big use for vector databases is
297 00:16:45.985 --> 00:16:49.845 retrieval augmented generation, as well as
298 00:16:49.845 --> 00:16:51.805 what we are now calling agents, which are
299 00:16:51.805 --> 00:16:53.365 extensions of this framework.
300 00:16:54.065 --> 00:16:55.925 Just to make it clear
301 00:16:55.925 --> 00:16:57.045 where the vector database fits in,
302 00:16:57.935 --> 00:17:01.025 I've got a schematic here of a RAG pipeline.
303 00:17:02.045 --> 00:17:04.825 You start off with a knowledge base of things
304 00:17:04.825 --> 00:17:05.865 that you want to search over.
305 00:17:06.245 --> 00:17:09.545 By the way, our
306 00:17:09.785 --> 00:17:13.905 research agent pipeline will be an extension of
307 00:17:14.065 --> 00:17:15.385 this basic RAG pipeline.
308 00:17:16.605 --> 00:17:18.985 So we've got a knowledge base that we want to search over.
309 00:17:19.085 --> 00:17:21.425 This might be your internal company documents.
310 00:17:22.205 --> 00:17:25.865 It might be images from customers
311 00:17:25.965 --> 00:17:27.585 or videos that people uploaded.
312 00:17:29.005 --> 00:17:32.185 You then put that through your embedding deep neural network
313 00:17:32.765 --> 00:17:34.465 to produce vector embeddings,
314 00:17:34.925 --> 00:17:37.825 and then you store those in Milvus.
315 00:17:38.945 --> 00:17:43.005 Milvus then provides a really convenient
316 00:17:43.145 --> 00:17:47.085 and efficient interface for performing a similarity search,
317 00:17:47.625 --> 00:17:50.605 or essentially a semantic search.
318 00:17:51.265 --> 00:17:52.685 In a RAG chatbot,
319 00:17:52.705 --> 00:17:54.605 the user then comes along with their question.
320 00:17:56.145 --> 00:17:58.685 This question gets put
321 00:17:58.685 --> 00:18:00.405 through, typically, the same embedding model,
322 00:18:01.345 --> 00:18:04.885 and then we search for vectors similar to the query vector
323 00:18:05.625 --> 00:18:06.965 in our vector database
324 00:18:07.705 --> 00:18:11.685 and retrieve close vectors that correspond
325 00:18:11.685 --> 00:18:13.285 to items in our knowledge base.
326 00:18:14.265 --> 00:18:16.205 Because of the way these models work,
327 00:18:16.775 --> 00:18:19.845 those items will be semantically similar to our query.
328 00:18:20.545 --> 00:18:24.045 In other words, they'll contain information relevant
329 00:18:24.665 --> 00:18:26.805 to the query being answered.
330 00:18:27.825 --> 00:18:29.605 The idea of RAG is very simple:
331 00:18:30.265 --> 00:18:34.525 we just put those into the context of the prompt
332 00:18:34.555 --> 00:18:37.405 that we run the large language model on,
333 00:18:37.425 --> 00:18:38.885 or large vision-language model,
334 00:18:38.945 --> 00:18:40.725 or whatever foundation model you're using.
335 00:18:41.625 --> 00:18:45.565 We augment the user's question with
336 00:18:46.095 --> 00:18:49.325 these retrieved documents from the vector database
337 00:18:50.305 --> 00:18:51.725 and put them into the large language model.
338 00:18:52.425 --> 00:18:55.485 Because the large language model has that context,
339 00:18:56.355 --> 00:18:59.885 it's able to give a much more reliable,
340 00:19:00.345 --> 00:19:01.445 up-to-date answer.
341 00:19:02.105 --> 00:19:03.925 You can think about it as
342 00:19:04.275 --> 00:19:08.225 an external memory for
343 00:19:08.225 --> 00:19:09.745 your RAG system or your agent:
344 00:19:10.605 --> 00:19:13.545 an external memory that you can update
345 00:19:13.605 --> 00:19:15.905 as new facts and new data come in,
346 00:19:16.725 --> 00:19:19.585 without having to retrain your
347 00:19:20.015 --> 00:19:22.105 foundation model, your large language model.
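To make that flow concrete, here is a minimal sketch of the retrieve-augment-generate loop just described, assuming a Milvus collection named "docs" already populated with embedded text chunks; the model names and collection are illustrative choices, not prescribed by the talk.

```python
# Minimal RAG sketch: embed the question, retrieve similar chunks from
# Milvus, and stuff them into the LLM prompt as context.
from pymilvus import MilvusClient
from openai import OpenAI

milvus = MilvusClient("milvus_demo.db")
llm = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> list[float]:
    resp = llm.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

def rag_answer(question: str) -> str:
    # Semantic search: nearest neighbors of the query vector.
    hits = milvus.search(collection_name="docs", data=[embed(question)],
                         limit=5, output_fields=["text"])
    context = "\n".join(hit["entity"]["text"] for hit in hits[0])
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    chat = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}])
    return chat.choices[0].message.content
```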
348 00:19:24.935 --> 00:19:26.875 I've got two links here.
349 00:19:27.415 --> 00:19:30.595 I think these are some really good resources if you're
350 00:19:30.595 --> 00:19:32.275 getting started with Milvus
351 00:19:32.335 --> 00:19:36.795 or just building generative AI applications
352 00:19:36.795 --> 00:19:38.115 with vector databases in general.
353 00:19:39.055 --> 00:19:42.315 On the left, I've got the GitHub repository for Milvus,
354 00:19:42.335 --> 00:19:44.995 where you've got instructions to download it, a link
355 00:19:44.995 --> 00:19:49.115 to the docs, and a lot of really useful tutorials.
356 00:19:50.055 --> 00:19:51.795 Then on the right-hand side,
357 00:19:51.865 --> 00:19:55.075 I've got our generative AI learning portal,
358 00:19:55.405 --> 00:19:59.275 which has a lot of really useful notebooks
359 00:19:59.945 --> 00:20:02.395 taking you
360 00:20:02.395 --> 00:20:03.795 through the steps of building much more
361 00:20:04.265 --> 00:20:06.035 substantive applications.
362 00:20:06.735 --> 00:20:10.595 It's a really good resource for learning how to build
363 00:20:10.895 --> 00:20:14.195 RAG, agents, recommender systems,
364 00:20:14.395 --> 00:20:16.275 semantic search, and so on.
365 00:20:18.305 --> 00:20:21.205 But let's now turn to a code walkthrough
366 00:20:21.305 --> 00:20:25.445 of DeepSearcher to see
367 00:20:25.745 --> 00:20:29.885 how we actually constructed this research agent.
368 00:20:30.785 --> 00:20:32.525 I think it helps, before we actually go into the code,
369 00:20:32.915 --> 00:20:36.845 to have a mental model of
370 00:20:36.845 --> 00:20:39.565 what it's actually doing,
371 00:20:39.585 --> 00:20:42.285 so that we can
372 00:20:42.285 --> 00:20:43.685 keep that in mind as we're going
373 00:20:43.685 --> 00:20:45.085 through the code and it makes sense.
374 00:20:46.185 --> 00:20:49.885 Similarly to a RAG system,
375 00:20:50.475 --> 00:20:53.445 this research agent has two separate parts.
376 00:20:53.505 --> 00:20:57.445 The first is data ingestion, which happens,
377 00:20:58.465 --> 00:21:00.845 in our case, beforehand.
378 00:21:01.625 --> 00:21:05.165 You tell it what internal documents, crawled web pages,
379 00:21:05.485 --> 00:21:09.245 structured data, or, in theory, streaming data
380 00:21:09.245 --> 00:21:10.525 you want to search over,
381 00:21:11.145 --> 00:21:13.005 and that gets embedded
382 00:21:13.145 --> 00:21:15.485 and stored in Milvus, the vector database.
383 00:21:16.025 --> 00:21:18.565 I think in a future version,
384 00:21:18.665 --> 00:21:21.085 or it's a feature we're adding, there will be a
385 00:21:21.085 --> 00:21:25.125 more dynamic ability to search the web as needed.
386 00:21:26.805 --> 00:21:29.185 Then the other part,
387 00:21:29.285 --> 00:21:31.185 the main part, is online serving.
388 00:21:32.005 --> 00:21:34.865 The user will come in with a query;
389 00:21:36.255 --> 00:21:40.105 then we use a large language model, in our case
390 00:21:40.225 --> 00:21:43.305 a reasoning model, to break down the question
391 00:21:43.935 --> 00:21:46.585 into a number of sub-questions
392 00:21:46.605 --> 00:21:51.055 or subqueries. Then
393 00:21:51.615 --> 00:21:54.775 a router works out which
394 00:21:54.775 --> 00:21:58.695 data store to fetch relevant entries from,
395 00:21:58.695 --> 00:22:00.535 which we then do from the vector database.
396 00:22:01.955 --> 00:22:03.615 And I think this is
397 00:22:03.765 --> 00:22:05.695 what makes it something
398 00:22:05.695 --> 00:22:08.295 you can call an agent rather than just plain RAG:
399 00:22:09.035 --> 00:22:11.095 it has this reflection step
400 00:22:11.605 --> 00:22:14.135 that decides what to do next.
401 00:22:15.075 --> 00:22:17.055 The prompt asks the LLM
402 00:22:17.075 --> 00:22:19.415 to answer the question:
403 00:22:20.515 --> 00:22:24.655 are there any gaps in the questions
404 00:22:25.005 --> 00:22:27.055 that have been asked
405 00:22:27.075 --> 00:22:31.135 and answered so far, using the information from
406 00:22:31.365 --> 00:22:33.255 the data ingestion?
407 00:22:34.075 --> 00:22:38.695 If it says yes, there are still
408 00:22:39.035 --> 00:22:40.815 knowledge gaps to be answered,
409 00:22:41.565 --> 00:22:44.175 then it will generate new queries
410 00:22:45.365 --> 00:22:47.105 and just go through the same process,
411 00:22:47.325 --> 00:22:51.305 looping until it's satisfied. It stops when
412 00:22:51.335 --> 00:22:54.905 it has either exhausted a budget
413 00:22:54.965 --> 00:22:58.665 of iterations or tokens or, more likely,
414 00:22:58.805 --> 00:23:02.545 before then, exhausted all of the questions
415 00:23:02.895 --> 00:23:06.625 that it believes it needs to answer to
416 00:23:06.625 --> 00:23:09.265 cover the query without any knowledge gaps.
417 00:23:10.765 --> 00:23:14.825 So this is why we can call it an agent
418 00:23:15.365 --> 00:23:18.225 rather than just plain RAG:
419 00:23:18.925 --> 00:23:23.865 the LLM is being used to route
420 00:23:24.125 --> 00:23:25.145 the execution.
421 00:23:27.065 --> 00:23:30.605 And I guess we can also think of
422 00:23:30.635 --> 00:23:34.005 calling the vector database in response to
423 00:23:34.635 --> 00:23:36.085 dynamically generated queries
424 00:23:36.085 --> 00:23:38.685 as a form
425 00:23:38.685 --> 00:23:40.605 of tool usage as well.
426 00:23:41.785 --> 00:23:45.725 So we've got conditional execution
427 00:23:46.575 --> 00:23:49.365 and we've got tool usage: two
428 00:23:49.365 --> 00:23:52.245 defining characteristics of being an agent.
429 00:23:53.985 --> 00:23:57.885 And then after that, when it says, okay,
430 00:23:58.415 --> 00:24:02.115 there are no knowledge gaps, it'll move on
431 00:24:02.115 --> 00:24:04.195 to the final step, which is
432 00:24:04.395 --> 00:24:06.875 to use the large language model to
433 00:24:07.015 --> 00:24:10.275 join those answers from the sub-questions
434 00:24:10.905 --> 00:24:14.115 into a single coherent final report.
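In outline, that serving loop looks something like the sketch below. This is a mental-model rendering, not DeepSearcher's actual code; all four helpers are hypothetical stand-ins for the LLM prompts and vector-database calls just described.

```python
# A hedged outline of the serving loop described above. The helpers are
# placeholders standing in for the real LLM prompts and Milvus calls.
def break_down(query: str) -> list[str]:
    return [query]                    # stand-in: LLM splits query into sub-questions

def retrieve(sub_query: str) -> list[str]:
    return []                         # stand-in: vector-database similarity search

def reflect(query: str, sub_queries: list[str], chunks: list[str]) -> list[str]:
    return []                         # stand-in: LLM lists remaining knowledge gaps

def synthesize(query: str, chunks: list[str]) -> str:
    return "report"                   # stand-in: LLM writes the final report

def research(query: str, max_iter: int = 3) -> str:
    sub_queries = break_down(query)
    retrieved: list[str] = []
    for _ in range(max_iter):         # budget of reflection cycles
        for q in sub_queries:
            retrieved += retrieve(q)
        gap_queries = reflect(query, sub_queries, retrieved)
        if not gap_queries:           # no knowledge gaps left: stop looping
            break
        sub_queries = gap_queries     # iterate again on the new questions
    return synthesize(query, retrieved)
```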
435 00:24:15.775 --> 00:24:17.195 One thing
436 00:24:18.335 --> 00:24:23.285 I should mention is that we
437 00:24:23.285 --> 00:24:28.165 are typically using a
438 00:24:28.485 --> 00:24:32.605 reasoning LLM for these steps,
439 00:24:33.305 --> 00:24:35.485 which I'll explain in a bit more detail
440 00:24:36.065 --> 00:24:38.205 a few slides on.
441 00:24:38.665 --> 00:24:41.885 That's something that's really useful
442 00:24:41.885 --> 00:24:43.685 for improving the performance of
443 00:24:44.065 --> 00:24:45.245 this reflection step.
444 00:24:47.375 --> 00:24:50.065 Okay. So let's go across to the GitHub
445 00:24:50.165 --> 00:24:51.985 and actually take a look at some code.
446 00:24:55.025 --> 00:24:57.165 And by the way, don't be shy:
447 00:24:57.165 --> 00:24:59.965 if you have any questions, feel free
448 00:24:59.965 --> 00:25:01.925 to write them in the chat, and I'll stop
449 00:25:01.925 --> 00:25:03.205 and answer them as they come up.
450 00:25:04.545 --> 00:25:08.085 This is the GitHub repository for DeepSearcher.
451 00:25:09.465 --> 00:25:12.085 You can see the architecture diagram there.
452 00:25:14.205 --> 00:25:16.945 And this is a
453 00:25:16.945 --> 00:25:19.465 screenshot of the output.
454 00:25:19.685 --> 00:25:20.985 You can see it's printing
455 00:25:21.805 --> 00:25:23.465 the
456 00:25:23.465 --> 00:25:26.465 intermediate steps,
457 00:25:26.465 --> 00:25:29.025 like the iterations of breaking the question down into subqueries
458 00:25:29.025 --> 00:25:30.705 and answering those;
459 00:25:30.845 --> 00:25:32.465 you can see it says accelerated playback.
460 00:25:32.465 --> 00:25:35.265 That points to one of the
461 00:25:35.325 --> 00:25:39.585 challenges of research agents currently:
462 00:25:39.585 --> 00:25:43.145 they're very expensive in terms of
463 00:25:43.715 --> 00:25:45.105 foundation model inference.
464 00:25:46.285 --> 00:25:49.465 The example I'll be showing later on
465 00:25:49.985 --> 00:25:53.905 made, I think, something like 75 queries
466 00:25:53.905 --> 00:25:56.105 to a reasoning large language model.
467 00:25:56.885 --> 00:26:00.345 That's why it takes 10 minutes,
468 00:26:00.375 --> 00:26:04.225 half an hour, or potentially longer to run
469 00:26:04.225 --> 00:26:05.305 through all of the reasoning steps.
470 00:26:06.685 --> 00:26:08.545 I actually hit the rate limit
471 00:26:08.545 --> 00:26:11.185 when I was doing this with an online service,
472 00:26:11.365 --> 00:26:15.985 so I had to do my 10 queries, wait a minute,
473 00:26:16.605 --> 00:26:17.665 then do another 10 queries.
474 00:26:18.285 --> 00:26:22.945 I think a key point is that
475 00:26:23.015 --> 00:26:25.145 inference is really a key bottleneck.
476 00:26:26.855 --> 00:26:31.345 Okay, we've got a question from Anna Ruda, which is:
477 00:26:31.645 --> 00:26:36.345 how does the LLM know when it has sufficient
478 00:26:36.345 --> 00:26:38.625 knowledge such that there are no knowledge gaps?
479 00:26:39.615 --> 00:26:40.785 Yeah, it's a great question.
480 00:26:41.445 --> 00:26:43.790 I think this just comes down
481 00:26:43.790 --> 00:26:46.045 to the
482 00:26:46.045 --> 00:26:48.165 magic emergent properties of LLMs.
483 00:26:48.545 --> 00:26:52.965 These models have been trained specifically on
484 00:26:52.985 --> 00:26:55.485 multi-step reasoning tasks.
485 00:26:56.385 --> 00:27:01.325 So
486 00:27:01.605 --> 00:27:05.945 it is just one of the
487 00:27:06.325 --> 00:27:08.265 things the LLM can do:
488 00:27:08.855 --> 00:27:10.465 it's been trained on so much data
489 00:27:10.645 --> 00:27:14.105 and so many related tasks in the post-training
490 00:27:14.615 --> 00:27:16.665 that it seems to be able to perform this task as well.
491 00:27:19.245 --> 00:27:21.585 So does it need to be a reasoning model?
492 00:27:21.805 --> 00:27:24.505 No, it doesn't need to be a reasoning model.
493 00:27:25.105 --> 00:27:28.545 I think it would actually be advantageous to use
494 00:27:28.545 --> 00:27:31.185 cheaper models for some of the other steps.
495 00:27:31.605 --> 00:27:34.065 As you mentioned, for breaking down the subtasks,
496 00:27:34.225 --> 00:27:37.105 breaking down the query into subqueries, I think
497 00:27:37.105 --> 00:27:41.505 that's something where you could have a much smaller,
498 00:27:41.535 --> 00:27:44.465 or even just a more
499 00:27:44.465 --> 00:27:46.305 general chatbot LLM,
500 00:27:46.325 --> 00:27:51.185 but ideally a much smaller language model that has
501 00:27:51.565 --> 00:27:53.745 been fine-tuned for that purpose.
502 00:27:55.005 --> 00:27:58.225 I think one of the
503 00:27:58.895 --> 00:28:01.425 easy solutions for speeding up the inference
504 00:28:01.425 --> 00:28:04.225 of these systems is using
505 00:28:04.225 --> 00:28:07.025 smaller specialized models for each of the steps.
506 00:28:08.965 --> 00:28:13.545 So, great question. Okay,
507 00:28:14.765 --> 00:28:16.625 let's go through the code.
508 00:28:16.645 --> 00:28:20.425 If you want to try this at home,
509 00:28:20.525 --> 00:28:23.945 you can git clone this repository, install the
510 00:28:23.945 --> 00:28:27.705 dependencies, and then copy and paste.
511 00:28:27.805 --> 00:28:32.545 Here's an example of
512 00:28:32.685 --> 00:28:35.385 how you actually initiate a call to
513 00:28:35.545 --> 00:28:36.545 the deep research agent.
514 00:28:37.565 --> 00:28:41.145 You can see here we create a default configuration,
515 00:28:42.415 --> 00:28:44.905 then we override some of the options.
516 00:28:45.605 --> 00:28:49.105 We tell the research agent
517 00:28:49.105 --> 00:28:51.385 that we want to use OpenAI
518 00:28:51.925 --> 00:28:54.065 as our large language model inference service,
519 00:28:54.445 --> 00:28:58.705 and specifically we want to use GPT-4o mini.
520 00:28:59.725 --> 00:29:02.065 For the embedding model,
521 00:29:02.315 --> 00:29:04.865 we're also going to use OpenAI's embedding service.
522 00:29:05.925 --> 00:29:08.745 But DeepSearcher supports a number
523 00:29:08.745 --> 00:29:12.665 of different inference and embedding services.
524 00:29:13.485 --> 00:29:15.025 For example, you might want
525 00:29:15.025 --> 00:29:17.305 to use Hugging Face's sentence-transformers
526 00:29:17.305 --> 00:29:18.945 locally for the embedding,
527 00:29:19.645 --> 00:29:22.025 or, as in my case, you might want
528 00:29:22.025 --> 00:29:24.265 to use a distilled version of
529 00:29:24.295 --> 00:29:26.905 DeepSeek R1 for the large language model.
530 00:29:28.455 --> 00:29:33.385 Okay. Then we ingest the
531 00:29:33.385 --> 00:29:35.265 data that we want to search over.
532 00:29:36.085 --> 00:29:38.545 In this case, we are specifying
533 00:29:38.975 --> 00:29:40.905 that data in advance,
534 00:29:40.945 --> 00:29:43.685 but as we develop this, it'll be able to
535 00:29:44.625 --> 00:29:47.405 go off autonomously and find relevant sources.
536 00:29:48.385 --> 00:29:50.245 Then we just need to call
537 00:29:50.245 --> 00:29:51.445 this query function here.
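The snippet being described follows the pattern in the DeepSearcher README; this is a sketch from memory, so treat the exact provider strings, model names, and paths as assumptions that may have drifted from the current repo.

```python
# A rough sketch of the DeepSearcher calls described above, following the
# README of the time; exact names and arguments may have changed since.
from deepsearcher.configuration import Configuration, init_config
from deepsearcher.offline_loading import load_from_local_files
from deepsearcher.online_query import query

config = Configuration()  # default configuration
# Override options: OpenAI for LLM inference (GPT-4o mini) and embeddings.
config.set_provider_config("llm", "OpenAI", {"model": "gpt-4o-mini"})
config.set_provider_config("embedding", "OpenAIEmbedding",
                           {"model": "text-embedding-3-small"})
init_config(config=config)

# Ingest the data we want to search over (specified in advance).
load_from_local_files(paths_or_directory="./my_docs")  # hypothetical path

# Kick off the research agent.
answer = query("Research freestyle snowboards for an intermediate rider.")
```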
538 00:29:52.625 --> 00:29:56.005 One way I like to work out
539 00:29:56.005 --> 00:30:00.805 how code is working is to literally step through,
540 00:30:00.835 --> 00:30:02.125 just step through the functions.
541 00:30:02.265 --> 00:30:05.725 So
542 00:30:06.085 --> 00:30:08.245 I put this into a script and ran it.
543 00:30:08.785 --> 00:30:11.525 Unfortunately, I can't really do a live demo,
544 00:30:11.525 --> 00:30:16.205 because it would take, say, 10-plus minutes to
545 00:30:16.385 --> 00:30:18.445 give back a report.
546 00:30:19.505 --> 00:30:22.205 But imagine we've done that, so we know that it works.
547 00:30:22.205 --> 00:30:24.925 We can say, okay, let's do
548 00:30:24.925 --> 00:30:26.325 step-through debugging,
549 00:30:26.825 --> 00:30:28.925 look inside each of these functions,
550 00:30:28.925 --> 00:30:31.325 and work out how it's actually doing what it's doing.
551 00:30:32.065 --> 00:30:36.805 So we've got here query, from online_query.
552 00:30:36.985 --> 00:30:39.845 I use VS Code,
553 00:30:39.845 --> 00:30:41.365 so you can just jump to the definition.
554 00:30:42.545 --> 00:30:46.005 Then we see that it's calling
555 00:30:46.195 --> 00:30:49.125 this configuration's default_searcher
556 00:30:50.465 --> 00:30:53.165 and calling default_searcher.query on that.
557 00:30:54.915 --> 00:30:57.425 So what is default_searcher?
558 00:30:57.615 --> 00:30:59.585 Well, we'll have to go to the configuration
559 00:30:59.605 --> 00:31:02.265 and see what it's being set to there.
560 00:31:03.805 --> 00:31:05.105 Oh, and just before we go on,
561 00:31:05.105 --> 00:31:08.655 we've got another question from...
562 00:31:15.155 --> 00:31:16.515 actually no, sorry, I think I've already answered that.
563 00:31:16.515 --> 00:31:19.635 That's from Anna, about how it knows
564 00:31:19.635 --> 00:31:20.795 that there are knowledge gaps.
565 00:31:22.575 --> 00:31:27.555 And yeah, like I said,
566 00:31:27.555 --> 00:31:31.355 these foundation models, especially
567 00:31:31.355 --> 00:31:33.035 after they've been post-trained for
568 00:31:33.055 --> 00:31:36.795 different tasks like chat, reasoning, and evaluation,
569 00:31:37.385 --> 00:31:41.315 show massive transfer learning to unseen tasks.
570 00:31:42.055 --> 00:31:46.435 I'm not sure exactly whether this task of
571 00:31:47.035 --> 00:31:48.075 identifying
572 00:31:48.145 --> 00:31:50.315 knowledge gaps was in the training somewhere,
573 00:31:50.775 --> 00:31:53.275 but it's just the power of scale
574 00:31:53.335 --> 00:31:56.555 and transfer learning that it can do tasks like this.
575 00:31:58.465 --> 00:32:00.915 Okay, so we see that
576 00:32:00.985 --> 00:32:05.675 when the configuration is initialized, the default searcher
577 00:32:05.905 --> 00:32:10.795 is created as this RAG router,
578 00:32:12.395 --> 00:32:16.975 and then it creates two agents in here.
579 00:32:18.255 --> 00:32:22.215 It's going to
580 00:32:23.235 --> 00:32:25.975 work out which one to route to, but we'll look inside.
581 00:32:25.995 --> 00:32:28.135 Chain of RAG is a different technique;
582 00:32:28.755 --> 00:32:32.145 if you're interested, you can check out
583 00:32:32.255 --> 00:32:35.065 the research paper that describes it in a bit more detail.
584 00:32:35.845 --> 00:32:40.005 But we're going to look inside the DeepSearch
585 00:32:40.945 --> 00:32:43.285 object here.
586 00:32:44.145 --> 00:32:47.645 It seems like this object is going to contain the logic
587 00:32:48.305 --> 00:32:52.965 to perform, excuse me, to perform this
588 00:32:52.965 --> 00:32:55.325 research agent architecture.
589 00:32:57.545 --> 00:33:00.005 So let's have a look into DeepSearch now.
590 00:33:02.105 --> 00:33:04.805 I've gone across to the file
591 00:33:04.805 --> 00:33:06.885 that contains DeepSearch.
592 00:33:07.265 --> 00:33:09.045 And by the way,
593 00:33:09.065 --> 00:33:10.245 is this large enough for everyone?
594 00:33:10.275 --> 00:33:12.565 I'll just see if I can make the font a bit bigger.
595 00:33:14.895 --> 00:33:19.625 Okay. Here is the definition
596 00:33:19.725 --> 00:33:22.065 of this DeepSearch object.
597 00:33:23.415 --> 00:33:26.915 You can see it takes a number of parameters
598 00:33:27.255 --> 00:33:30.995 to store in the object.
599 00:33:30.995 --> 00:33:34.795 It takes a base LLM, it takes an embedding model,
600 00:33:35.775 --> 00:33:39.195 it takes a vector DB,
601 00:33:40.195 --> 00:33:41.475 and a max number of iterations.
602 00:33:41.735 --> 00:33:44.835 That's going to be a limit on the number
603 00:33:44.835 --> 00:33:47.315 of these reflection cycles that we can do,
604 00:33:51.305 --> 00:33:55.615 as well as some other
605 00:33:55.815 --> 00:33:57.935 settings that are a bit more
606 00:33:57.935 --> 00:33:58.975 miscellaneous,
607 00:33:58.995 --> 00:34:02.415 so I'll jump over them.
608 00:34:03.715 --> 00:34:05.095 Now we know what the object is
609 00:34:05.095 --> 00:34:06.615 that actually performs the query.
610 00:34:07.235 --> 00:34:08.255 I'll just go back here.
611 00:34:08.275 --> 00:34:13.105 You can see that this method here is going to call
612 00:34:13.175 --> 00:34:14.345 DeepSearch.query.
613 00:34:14.525 --> 00:34:15.865 So what does that do?
614 00:34:15.895 --> 00:34:17.705 Well, let's go and have a look at the code.
615 00:34:22.655 --> 00:34:24.505 Okay, so here we are.
616 00:34:24.885 --> 00:34:26.705 Just to reiterate:
617 00:34:26.725 --> 00:34:29.705 I would actually do this with a step-through
618 00:34:29.705 --> 00:34:34.225 debugger in VS Code,
619 00:34:34.345 --> 00:34:37.385 stepping through and stepping into functions
620 00:34:37.805 --> 00:34:40.185 to follow the path of execution.
621 00:34:40.965 --> 00:34:43.105 That's a really good way
622 00:34:43.105 --> 00:34:47.345 to understand what the code is doing
623 00:34:47.345 --> 00:34:50.665 and where
624 00:34:51.275 --> 00:34:53.025 the most relevant code is
625 00:34:53.025 --> 00:34:54.385 for understanding what's going on.
626 00:34:55.285 --> 00:34:59.545 Also,
627 00:35:00.165 --> 00:35:01.865 not everyone's familiar with this, but a really
628 00:35:01.865 --> 00:35:03.465 helpful tool for debugging
629 00:35:03.465 --> 00:35:08.265 and understanding code is the debug
630 00:35:08.265 --> 00:35:12.185 console in VS Code or whatever your IDE is.
631 00:35:12.765 --> 00:35:15.145 When you've stopped execution in your step-through
632 00:35:15.145 --> 00:35:17.945 debugging, you can actually just type expressions
633 00:35:18.615 --> 00:35:19.945 into the debug console.
634 00:35:20.845 --> 00:35:23.025 You can type in things like:
635 00:35:23.025 --> 00:35:24.545 what is the shape of this tensor?
636 00:35:25.335 --> 00:35:30.265 What is the value of this flag?
637 00:35:30.505 --> 00:35:31.745 Does some condition hold?
638 00:35:31.885 --> 00:35:33.065 That's a really useful way
639 00:35:33.065 --> 00:35:35.985 of interrogating the program as it's running
640 00:35:36.325 --> 00:35:37.685 to understand what's going on.
641 00:35:39.025 --> 00:35:41.405 But let's look at this query function.
642 00:35:42.265 --> 00:35:44.925 We can see that the first thing it does is call
643 00:35:46.595 --> 00:35:47.805 self.retrieve.
644 00:35:48.675 --> 00:35:51.965 Okay, so we're untangling
645 00:35:51.985 --> 00:35:56.765 the execution flow to find
646 00:35:56.765 --> 00:35:58.285 where the actual agent logic happens.
647 00:35:59.145 --> 00:36:00.525 I think we're getting a bit closer.
648 00:36:00.745 --> 00:36:02.845 So now let's look at the self.retrieve function.
649 00:36:06.205 --> 00:36:07.065 Where is it?
650 00:36:11.015 --> 00:36:15.875 Oh yeah, here we go. So self.retrieve calls
651 00:36:16.255 --> 00:36:19.275 the asynchronous retrieve method on self.
652 00:36:20.095 --> 00:36:22.435 We are just hacking
653 00:36:22.435 --> 00:36:24.675 through the layers of indirection to get
654 00:36:24.675 --> 00:36:25.755 to the core of it.
655 00:36:27.335 --> 00:36:30.715 It's going to run this
656 00:36:31.575 --> 00:36:33.555 function asynchronously,
657 00:36:34.335 --> 00:36:36.635 and now it looks like we've actually gotten
658 00:36:36.695 --> 00:36:39.315 to the core logic of how
659 00:36:39.655 --> 00:36:41.915 the research agent works.
660 00:36:43.215 --> 00:36:47.945 Okay. We start off by
661 00:36:48.895 --> 00:36:50.185 setting a variable
662 00:36:50.255 --> 00:36:52.425 that contains the maximum number of iterations.
663 00:36:54.565 --> 00:36:59.425 This comment here indicates that
664 00:37:00.365 --> 00:37:03.185 the first thing we'll do is
665 00:37:03.715 --> 00:37:05.865 break down the query into subqueries
666 00:37:06.565 --> 00:37:08.265 by prompting the large language model.
667 00:37:10.125 --> 00:37:13.425 So that's the jump from the user's query
668 00:37:13.485 --> 00:37:15.985 to the first set of subqueries.
669 00:37:19.115 --> 00:37:22.015 And we've got a generate-subqueries method here.
670 00:37:22.235 --> 00:37:26.375 Now that we've reached
671 00:37:26.525 --> 00:37:28.975 the core logic, I won't
672 00:37:28.975 --> 00:37:30.615 jump down further until we've gone
673 00:37:30.615 --> 00:37:32.215 through this entire loop; then we can see
674 00:37:32.375 --> 00:37:36.855 exactly
675 00:37:37.045 --> 00:37:39.375 what prompts the LLM is being given
676 00:37:39.835 --> 00:37:43.295 to do the different tasks, like generating the subqueries
677 00:37:43.475 --> 00:37:45.815 and working out where there are knowledge gaps. A sketch of such a prompt follows below.
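For a flavor of what a sub-query generation step can look like, here is an illustrative sketch; the prompt wording and the llm_chat helper are assumptions, not DeepSearcher's exact code.

```python
import ast

# Illustrative prompt for breaking a question into sub-questions; the real
# DeepSearcher prompt differs in wording.
SUB_QUERY_PROMPT = """To answer this question more comprehensively, break down
the original question into up to four sub-questions. Return only a Python list
of strings, e.g. ["sub-question 1", "sub-question 2"].

Original question: {question}
"""

def generate_sub_queries(llm_chat, question: str) -> list[str]:
    # llm_chat is a hypothetical callable: prompt string in, completion out.
    response = llm_chat(SUB_QUERY_PROMPT.format(question=question))
    return ast.literal_eval(response.strip())  # parse the list literal
```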
678 00:37:47.365 --> 00:37:50.975 Okay, so here we've got the log color print.
679 00:37:50.975 --> 00:37:52.455 This is what you'll see in the terminal
680 00:37:52.525 --> 00:37:53.815 when it's performing this step.
681 00:37:55.155 --> 00:37:58.655 Then it takes the list of current subqueries
682 00:37:58.655 --> 00:38:00.975 and just adds the new ones to it.
683 00:38:03.695 --> 00:38:05.075 And so now we have a loop.
684 00:38:05.215 --> 00:38:08.635 We've entered the main logic loop,
685 00:38:09.545 --> 00:38:11.915 the main iteration.
686 00:38:12.175 --> 00:38:13.515 I'm not sure if you can see my mouse pointer,
687 00:38:13.535 --> 00:38:17.355 but I'm circling around this inner loop in
688 00:38:17.355 --> 00:38:19.835 the online serving area of
689 00:38:20.135 --> 00:38:21.955 the architecture diagram.
690 00:38:24.175 --> 00:38:28.955 The first step is to
691 00:38:29.215 --> 00:38:32.395 search
692 00:38:32.415 --> 00:38:34.875 for relevant chunks from the vector database,
693 00:38:35.535 --> 00:38:40.405 given the query and the subqueries.
694 00:38:41.585 --> 00:38:45.085 This is actually going to return some
695 00:38:45.085 --> 00:38:46.125 asynchronous tasks.
696 00:38:47.065 --> 00:38:52.045 So that's this step of
697 00:38:52.585 --> 00:38:55.325 calling the vector database here
698 00:38:55.475 --> 00:38:57.405 with those queries and subqueries.
699 00:38:59.365 --> 00:39:02.305 Those calls just return tasks;
700 00:39:02.305 --> 00:39:06.145 they don't get executed until we call this await
701 00:39:06.685 --> 00:39:08.065 asyncio.gather.
702 00:39:08.765 --> 00:39:12.265 That actually executes the tasks in parallel
703 00:39:12.405 --> 00:39:15.905 and then waits for the final one to finish
704 00:39:16.485 --> 00:39:17.905 before setting the search results.
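In isolation, that fan-out pattern looks roughly like this; search_chunks here is a hypothetical stand-in for the real Milvus call.

```python
import asyncio

# Hypothetical async wrapper around a single vector-database lookup;
# the real call would hit Milvus instead of sleeping.
async def search_chunks(sub_query: str) -> list[str]:
    await asyncio.sleep(0)
    return [f"chunk for {sub_query!r}"]

async def retrieve_all(sub_queries: list[str]) -> list[str]:
    # Building the coroutines creates the tasks; nothing runs yet.
    tasks = [search_chunks(q) for q in sub_queries]
    # gather schedules them concurrently and waits for the last to finish.
    results = await asyncio.gather(*tasks)
    return [chunk for chunks in results for chunk in chunks]

search_results = asyncio.run(retrieve_all(["q1", "q2", "q3"]))
```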
705 00:39:19.575 --> 00:39:22.185 Okay. Then
706 00:39:22.445 --> 00:39:24.585 we take these results from the subqueries
707 00:39:25.125 --> 00:39:26.145 and we merge them,
708 00:39:26.685 --> 00:39:30.945 and we're also keeping track of how many tokens
709 00:39:31.205 --> 00:39:34.145 we have consumed, because we want to be able to
710 00:39:34.845 --> 00:39:37.905 calculate the cost of this afterwards.
711 00:39:38.685 --> 00:39:40.905 Also, we might want to set a
712 00:39:41.495 --> 00:39:45.985 hard limit,
713 00:39:46.195 --> 00:39:48.185 a token budget, I should say.
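A token budget of the kind just mentioned can be as simple as a counter; this is a generic sketch, not DeepSearcher's actual bookkeeping.

```python
# Generic token-budget bookkeeping of the kind described above.
class TokenBudget:
    def __init__(self, limit: int):
        self.limit = limit
        self.consumed = 0

    def add(self, tokens: int) -> None:
        self.consumed += tokens       # accumulate usage reported by the LLM API

    def exhausted(self) -> bool:
        return self.consumed >= self.limit

budget = TokenBudget(limit=50_000)
budget.add(1_234)                     # e.g. from a response's token usage field
if budget.exhausted():
    print("Stopping: token budget spent.")
```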
714 00:39:49.965 --> 00:39:52.625 And
715 00:39:55.285 --> 00:39:58.135 okay, then we take
716 00:39:58.155 --> 00:40:01.695 the search results and put them into this list of
717 00:40:01.835 --> 00:40:03.575 search results from the vector DB.
718 00:40:05.595 --> 00:40:10.455 In many cases
719 00:40:11.875 --> 00:40:13.615 we are going to have
720 00:40:13.645 --> 00:40:16.455 duplicate chunks returned from the vector database.
721 00:40:17.635 --> 00:40:21.575 So a good step is just to deduplicate those, so
722 00:40:21.575 --> 00:40:23.535 that we have a list of unique chunks
723 00:40:24.085 --> 00:40:26.455 fetched from the vector database
724 00:40:26.805 --> 00:40:28.415 from those subqueries.
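Deduplication here can be as simple as keying chunks on their text while preserving order; a minimal sketch (keying on a chunk ID would work equally well):

```python
# Minimal chunk deduplication: keep the first occurrence of each text.
def deduplicate(chunks: list[str]) -> list[str]:
    seen: set[str] = set()
    unique: list[str] = []
    for chunk in chunks:
        if chunk not in seen:
            seen.add(chunk)
            unique.append(chunk)
    return unique

assert deduplicate(["a", "b", "a"]) == ["a", "b"]
```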
725 00:40:31.435 --> 00:40:33.775 So this is where we break
726 00:40:33.955 --> 00:40:36.215 if we've reached the maximum number of iterations.
727 00:40:37.755 --> 00:40:42.015 But otherwise the next step is performing the reflection
728 00:40:42.555 --> 00:40:45.615 and getting additional queries that can
729 00:40:46.145 --> 00:40:47.815 cover any knowledge gaps.
730 00:40:49.055 --> 00:40:52.635 So if we go back here now, you can see that
731 00:40:52.635 --> 00:40:56.195 we've gone through this loop, and now we're in
732 00:40:56.665 --> 00:40:59.035 this yellow-orange diamond,
733 00:40:59.995 --> 00:41:02.895 and we're performing this reflection step.
734 00:41:03.835 --> 00:41:05.495 So this is where the LLM,
735 00:41:05.835 --> 00:41:07.775 or the reasoning model, is going
736 00:41:07.775 --> 00:41:10.295 to actually control the execution.
737 00:41:14.655 --> 00:41:18.275 And so then we just perform the reflection.
738 00:41:18.375 --> 00:41:21.115 So we prompt the reasoning model
739 00:41:21.145 --> 00:41:25.155 with another prompt to generate the gap queries.
740 00:41:26.855 --> 00:41:29.155 So if
741 00:41:29.175 --> 00:41:33.355 the model believes that there are additional queries
742 00:41:33.355 --> 00:41:35.315 that need to be answered to
743 00:41:35.315 --> 00:41:38.275 fill in any knowledge gaps, it will return them
744 00:41:38.375 --> 00:41:40.355 as the gap queries.
745 00:41:41.415 --> 00:41:45.595 So if that's empty, then we know
746 00:41:45.595 --> 00:41:46.835 that we can terminate the loop
747 00:41:48.695 --> 00:41:51.325 and go on to generating the final report.
748 00:41:53.705 --> 00:41:58.165 But otherwise we just add those new subqueries
749 00:41:58.665 --> 00:41:59.685 to the subqueries.
750 00:41:59.685 --> 00:42:01.885 So this is like a stack of
751 00:42:02.065 --> 00:42:03.285 queries to answer.
752 00:42:03.945 --> 00:42:05.565 We add them to that list
753 00:42:06.185 --> 00:42:09.485 and then repeat the iteration.
754 00:42:11.185 --> 00:42:14.005 So we just do another loop around here.
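Putting those pieces together, the inner loop looks roughly like this. It's a sketch, not the project's exact code: it reuses fan_out and deduplicate from the snippets above, and generate_subqueries, generate_gap_queries, and synthesize_report are hypothetical async helpers that each prompt the reasoning model:

```python
async def research(original_query: str, max_iterations: int = 3) -> str:
    subqueries = await generate_subqueries(original_query)
    all_chunks: list[dict] = []
    for _ in range(max_iterations):  # break if we hit the iteration cap
        results = await fan_out(original_query, subqueries)
        merged = all_chunks + [c for r in results for c in r]
        all_chunks = deduplicate(merged)
        # Reflection: ask the reasoning model what is still missing.
        gap_queries = await generate_gap_queries(
            original_query, subqueries, all_chunks)
        if not gap_queries:  # no knowledge gaps left -> stop iterating
            break
        subqueries.extend(gap_queries)  # push gap queries onto the stack
    return await synthesize_report(original_query, all_chunks)
```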
755 00:42:15.945 --> 00:42:18.285 And, you know, that's essentially it.
756 00:42:18.345 --> 00:42:21.165 So if we've got time, I'll just
757 00:42:21.165 --> 00:42:23.765 briefly look into these functions
758 00:42:23.765 --> 00:42:25.005 that actually define the prompts.
759 00:42:25.585 --> 00:42:28.725 And, hold your horses, as it were:
760 00:42:28.845 --> 00:42:30.525 I've just got a few more minutes,
761 00:42:30.585 --> 00:42:32.485 and then I'll give some conclusions,
762 00:42:33.305 --> 00:42:35.005 and then we'll leave five
763 00:42:35.025 --> 00:42:36.485 to ten minutes at the end for any questions.
764 00:42:37.185 --> 00:42:40.245 So actually, what I'll say is I'll
765 00:42:40.245 --> 00:42:43.045 leave this for your personal
766 00:42:43.545 --> 00:42:46.285 enjoyment and education.
767 00:42:46.305 --> 00:42:47.885 You can actually look inside
768 00:42:47.885 --> 00:42:50.605 these methods really easily
769 00:42:50.945 --> 00:42:54.605 and find out how we've actually
770 00:42:54.605 --> 00:42:58.565 formatted the prompts to perform these tasks,
771 00:42:58.585 --> 00:42:59.685 and to do that successfully.
772 00:43:00.465 --> 00:43:01.965 I think it's always good
773 00:43:01.965 --> 00:43:05.365 to actually read the prompt to understand
774 00:43:06.185 --> 00:43:07.525 what the model,
775 00:43:07.525 --> 00:43:09.285 what exactly the model is being instructed to do.
776 00:43:10.105 --> 00:43:11.285 So I encourage you to look
777 00:43:11.285 --> 00:43:14.045 inside these generate-gap-queries,
778 00:43:14.105 --> 00:43:16.645 search-chunks-from-vector,
779 00:43:16.925 --> 00:43:18.165 and generate-subqueries methods, et cetera.
780 00:43:18.905 --> 00:43:22.685 So, very quickly:
781 00:43:22.685 --> 00:43:25.725 after it's done that and terminates, it returns
782 00:43:25.745 --> 00:43:26.805 to this query function,
783 00:43:27.545 --> 00:43:31.205 and now we're in these two steps here of
784 00:43:31.725 --> 00:43:36.125 synthesizing the report from all of these subqueries
785 00:43:36.345 --> 00:43:37.965 and retrieved chunks.
786 00:43:39.385 --> 00:43:41.005 And, you know, that's just more
787 00:43:41.005 --> 00:43:42.205 prompting of the same model.
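The synthesis step is just one more prompt. A sketch, assuming llm_chat is a hypothetical async wrapper around whatever inference API you use:

```python
async def synthesize_report(original_query: str, chunks: list[dict]) -> str:
    # Concatenate the unique retrieved chunks into one context block.
    context = "\n\n".join(c["chunk"] for c in chunks)
    prompt = (
        "Using only the context below, write a detailed, well-structured "
        f"report answering this question: {original_query}\n\n"
        f"Context:\n{context}"
    )
    return await llm_chat(prompt)  # hypothetical async inference call
```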
788 00:43:43.375 --> 00:43:47.555 And when you've done that,
789 00:43:47.555 --> 00:43:48.795 it may take ten minutes,
790 00:43:49.805 --> 00:43:53.025 thirty minutes, depending on what inference service you use
791 00:43:53.025 --> 00:43:56.225 and the question. It will have done multiple iterations
792 00:43:56.405 --> 00:43:58.305 of this reasoning:
793 00:43:59.065 --> 00:44:01.865 breaking down the question into a number of
794 00:44:01.995 --> 00:44:05.625 steps to answer it, working out whether
795 00:44:05.765 --> 00:44:07.425 it should keep going or finish.
796 00:44:07.965 --> 00:44:09.865 And it generates a nice little report.
797 00:44:10.525 --> 00:44:12.145 And I've got an example here.
798 00:44:12.765 --> 00:44:17.235 So the question was: how has The Simpsons
799 00:44:17.455 --> 00:44:18.595 evolved over time?
800 00:44:19.735 --> 00:44:22.155 And it's put together this nice little report,
801 00:44:22.155 --> 00:44:27.075 really covering all bases, with a nice
802 00:44:27.075 --> 00:44:28.475 conclusion tying things together.
803 00:44:29.575 --> 00:44:33.285 So let's go back to the slides
804 00:44:33.585 --> 00:44:34.885 and we'll wrap things up.
805 00:44:38.505 --> 00:44:42.115 So, could I give a rough overview of the prompts?
806 00:44:42.855 --> 00:44:45.915 I think, just for time...
807 00:44:51.605 --> 00:44:52.455 yeah, let's have a look.
808 00:45:01.515 --> 00:45:04.045 Yeah, so for time, sorry,
809 00:45:04.385 --> 00:45:06.405 I think I'm going to have to skip looking at the
810 00:45:06.405 --> 00:45:08.205 exact code.
811 00:45:08.665 --> 00:45:13.395 But let me just point you to... geez,
812 00:45:13.415 --> 00:45:15.635 now I've lost my place a bit.
813 00:45:22.065 --> 00:45:24.235 Okay. So this is also in DeepSearcher,
814 00:45:24.375 --> 00:45:28.955 and we've got the subquery prompt. And you can see,
815 00:45:28.955 --> 00:45:31.555 these prompts are actually in the deep search Python file.
816 00:45:32.215 --> 00:45:35.035 So for example: "You are an AI content analysis expert,
817 00:45:35.035 --> 00:45:37.475 good at summarizing content. Please summarize..."
818 00:45:37.655 --> 00:45:38.715 and so on.
819 00:45:39.625 --> 00:45:40.795 Then there's
820 00:45:40.835 --> 00:45:43.875 a reflection prompt: "Determine whether additional search
821 00:45:44.035 --> 00:45:46.355 queries are needed based on the original query," et cetera.
822 00:45:47.135 --> 00:45:48.635 There's a re-ranking prompt
823 00:45:49.815 --> 00:45:51.235 and there's a subquery prompt.
824 00:45:51.535 --> 00:45:52.995 So you can see we've got
825 00:45:53.095 --> 00:45:56.195 at least four different types of prompts
826 00:45:56.195 --> 00:45:58.115 for different subtasks.
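To give a feel for their shape, here are paraphrased versions of two of those templates; the real project's wording and placeholder names will differ:

```python
# Paraphrased prompt templates, illustrating the shape described above;
# not the project's verbatim text.
SUBQUERY_PROMPT = """\
To answer the question more comprehensively, break the original
question into up to four sub-questions. Return a list of strings.

Original question: {original_query}
"""

REFLECTION_PROMPT = """\
Determine whether additional search queries are needed based on the
original query, the previous sub-queries, and the retrieved chunks.
If further research is needed, return a list of new queries;
otherwise return an empty list.

Original query: {original_query}
Previous sub-queries: {subqueries}
Retrieved chunks: {chunks}
"""
```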
827 00:45:59.015 --> 00:46:00.595 But check out this file
828 00:46:00.615 --> 00:46:02.675 and you can
829 00:46:03.285 --> 00:46:04.595 look at them in some more detail.
830 00:46:05.135 --> 00:46:07.995 But going back to the slides, let's see.
831 00:46:15.785 --> 00:46:20.555 Okay, so what's
832 00:46:20.555 --> 00:46:23.835 some of the secret sauce behind how
833 00:46:23.835 --> 00:46:25.035 these agents work?
834 00:46:25.975 --> 00:46:28.075 Well, I think one thing is this idea
835 00:46:28.095 --> 00:46:29.555 of conditional computation.
836 00:46:30.095 --> 00:46:33.315 And that means that the model can actually decide
837 00:46:33.695 --> 00:46:37.595 how much computation to do based on the current
838 00:46:37.595 --> 00:46:40.195 state of the model output.
839 00:46:40.195 --> 00:46:42.275 And this can be done in a number of different ways.
840 00:46:43.135 --> 00:46:46.435 So one
841 00:46:46.435 --> 00:46:49.595 more complex way is you could actually introduce
842 00:46:49.595 --> 00:46:53.555 reasoning tokens that tell the model to
843 00:46:53.555 --> 00:46:56.515 keep generating intermediate output,
844 00:46:56.845 --> 00:46:58.155 doing additional computation,
845 00:46:58.155 --> 00:46:59.675 until some termination condition.
846 00:47:00.095 --> 00:47:02.515 Apparently DeepSeek doesn't use this method,
847 00:47:03.175 --> 00:47:05.915 but, you know, this is a good strategy
848 00:47:05.975 --> 00:47:07.275 for conditional computation.
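As a toy illustration of that token-gated idea (every name here is made up, and as noted, this is not how DeepSeek does it):

```python
END_OF_REASONING = "</think>"  # hypothetical termination marker
MAX_STEPS = 1024               # safety cap on reasoning length

def reason(model, prompt: str) -> str:
    # The model keeps generating intermediate "thought" tokens until it
    # decides to emit the termination marker: the amount of computation
    # is conditional on the model's own output.
    output = prompt
    for _ in range(MAX_STEPS):
        token = model.next_token(output)  # hypothetical one-step decode API
        output += token
        if token == END_OF_REASONING:
            break
    return output
```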
849 00:47:08.695 --> 00:47:10.635 I think the second one is this
850 00:47:11.115 --> 00:47:12.195 reinforcement learning training.
851 00:47:12.415 --> 00:47:15.195 Conceptually it's really simple:
852 00:47:15.195 --> 00:47:18.595 taking a strong base model,
853 00:47:18.815 --> 00:47:19.995 like DeepSeek did,
854 00:47:20.615 --> 00:47:23.195 and then applying a form
855 00:47:23.195 --> 00:47:24.835 of reinforcement learning on
856 00:47:24.835 --> 00:47:26.035 very high quality reasoning data.
857 00:47:26.455 --> 00:47:28.395 And then there's this aha moment where
858 00:47:29.025 --> 00:47:30.965 the model just starts to learn how to reason.
859 00:47:34.025 --> 00:47:38.645 So, it's not all
860 00:47:38.645 --> 00:47:39.685 rainbows and sunshine.
861 00:47:40.195 --> 00:47:41.325 There are, of course,
862 00:47:41.485 --> 00:47:42.925 I think some quite major challenges with these.
863 00:47:42.925 --> 00:47:45.125 And I think the first one is just the cost.
864 00:47:45.785 --> 00:47:49.605 So if you're using OpenAI's deep research agent,
865 00:47:49.625 --> 00:47:51.205 you'll have to have their Pro subscription,
866 00:47:51.245 --> 00:47:52.605 which I think is $200 a month,
867 00:47:53.325 --> 00:47:56.725 and I believe they're still not covering the cost
868 00:47:56.725 --> 00:47:58.845 of inference by charging that.
869 00:47:59.625 --> 00:48:01.325 And so one thing I discovered
870 00:48:01.325 --> 00:48:02.925 actually running these queries is
871 00:48:03.785 --> 00:48:05.725 how much inference they actually require.
872 00:48:05.725 --> 00:48:09.005 Firstly, these reasoning models
873 00:48:09.075 --> 00:48:12.605 just typically use a lot more inference
874 00:48:12.605 --> 00:48:14.125 going through their number of reasoning steps
875 00:48:14.585 --> 00:48:16.045 for a given prompt,
876 00:48:16.065 --> 00:48:19.045 but there's also the fact that it'll need to make multiple calls
877 00:48:19.065 --> 00:48:20.325 of these,
878 00:48:20.875 --> 00:48:23.565 many more calls than a simple RAG system.
879 00:48:25.345 --> 00:48:29.445 Again, hallucinations: as with all
880 00:48:29.445 --> 00:48:32.085 foundation models, it's a general problem,
881 00:48:32.115 --> 00:48:33.525 and there can be reasoning errors.
882 00:48:33.745 --> 00:48:37.805 So in its intermediate reasoning trace,
883 00:48:37.985 --> 00:48:40.325 if there's some incorrect step,
884 00:48:40.955 --> 00:48:43.045 then all the following steps could fail.
885 00:48:43.945 --> 00:48:47.875 And then I think finally, to actually
886 00:48:47.875 --> 00:48:50.275 train these reasoning models, we need
887 00:48:50.275 --> 00:48:52.395 to have really high quality
888 00:48:52.865 --> 00:48:54.915 open source reasoning-trace datasets.
889 00:48:55.455 --> 00:48:57.715 And that's something that people are working on, so
890 00:48:57.715 --> 00:49:01.475 that we can reproduce some of these results from
891 00:49:01.475 --> 00:49:04.115 OpenAI and its competitors.
892 00:49:05.745 --> 00:49:08.595 Okay. So I can tell Sachi is
893 00:49:08.595 --> 00:49:10.155 hurrying me along, so, very quickly.
894 00:49:10.255 --> 00:49:13.155 So, in terms of the cost, some
895 00:49:13.155 --> 00:49:14.835 of the solutions people are working on for
896 00:49:14.835 --> 00:49:17.155 that are specialized hardware,
897 00:49:17.155 --> 00:49:18.955 but also other types of reasoning.
898 00:49:18.975 --> 00:49:23.115 There's an idea called continuous chain of thought that
899 00:49:23.615 --> 00:49:25.875 doesn't use discrete tokens to reason,
900 00:49:25.935 --> 00:49:28.555 but actually uses a continuous latent variable,
901 00:49:28.685 --> 00:49:32.035 which is a lot more cost effective.
902 00:49:32.735 --> 00:49:35.555 And then there are these barriers to entry, both
903 00:49:35.555 --> 00:49:37.155 with the open source software
904 00:49:37.855 --> 00:49:39.915 and with the data and the models.
905 00:49:40.695 --> 00:49:42.875 So players like Zilliz
906 00:49:42.895 --> 00:49:46.715 and Hugging Face are working on
907 00:49:46.955 --> 00:49:48.955 reproducing these results fully open source,
908 00:49:49.455 --> 00:49:50.915 so that new folks can
909 00:49:50.915 --> 00:49:53.395 take away the learnings, with systems that work,
910 00:49:53.415 --> 00:49:55.875 and really easily build
911 00:49:55.875 --> 00:49:57.195 successful research agents.
912 00:49:58.535 --> 00:50:00.315 So I think that's it from me,
913 00:50:00.535 --> 00:50:03.635 and it looks like I went a bit over time,
914 00:50:03.635 --> 00:50:05.795 but I think we've got five minutes left for questions.
915 00:50:06.585 --> 00:50:08.435 Yeah. So we have a few questions,
916 00:50:08.495 --> 00:50:10.115 so let's just go through them.
917 00:50:10.935 --> 00:50:13.635 How does the vector embedding work
918 00:50:13.635 --> 00:50:17.155 for images? Are vectors created for pixels
919 00:50:17.155 --> 00:50:18.195 or image portions?
920 00:50:21.835 --> 00:50:25.925 Yeah. Okay.
921 00:50:26.805 --> 00:50:29.025 So,
922 00:50:29.025 --> 00:50:33.265 this framework
923 00:50:33.265 --> 00:50:35.545 that I presented is very general:
924 00:50:36.165 --> 00:50:39.185 all you need is some concept of embedding
925 00:50:39.565 --> 00:50:41.585 and some sort of foundation model.
926 00:50:42.205 --> 00:50:46.585 So typically,
927 00:50:47.555 --> 00:50:49.345 there are really good open source models
928 00:50:49.575 --> 00:50:51.705 that can perform embedding of images,
929 00:50:52.485 --> 00:50:54.625 and they typically work on the whole image.
930 00:50:55.205 --> 00:50:58.665 They might be looking at patches, putting
931 00:50:58.665 --> 00:51:00.225 those into a vision transformer,
932 00:51:00.685 --> 00:51:02.865 and then outputting an embedding for the entire image.
933 00:51:03.805 --> 00:51:06.545 And there are models that will allow you
934 00:51:06.545 --> 00:51:10.305 to embed images into the same space as text.
935 00:51:11.005 --> 00:51:14.385 So all of the same concepts apply here.
936 00:51:14.805 --> 00:51:17.025 You just use a different embedding model that's specific
937 00:51:17.045 --> 00:51:18.545 for images, or for images and text.
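For example, a CLIP-style model embeds images and text into the same space. A sketch using Hugging Face transformers (the checkpoint name is just one common choice, and the image file is hypothetical):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# CLIP splits the image into patches, runs a vision transformer, and
# returns one vector per image, aligned with its text embeddings.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("frame.png")  # hypothetical local image file
inputs = processor(text=["a cartoon family on a couch"],
                   images=image, return_tensors="pt", padding=True)

image_vec = model.get_image_features(pixel_values=inputs["pixel_values"])
text_vec = model.get_text_features(input_ids=inputs["input_ids"],
                                   attention_mask=inputs["attention_mask"])
# Either vector can now be indexed in the vector database.
```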
938 00:51:21.405 --> 00:51:24.145 So: in the iterated search-reasoning cycle,
939 00:51:25.085 --> 00:51:27.305 is the iterated reasoning being performed
940 00:51:27.305 --> 00:51:29.705 by the reasoning LLM, and the search by the vector DB?
941 00:51:30.325 --> 00:51:33.505 So the LLM
942 00:51:33.525 --> 00:51:36.105 never actually performs an action; it just
943 00:51:36.105 --> 00:51:38.465 gives the instruction to perform an action.
944 00:51:39.045 --> 00:51:43.105 So it will say, okay, search the vector database for
945 00:51:43.165 --> 00:51:45.545 this, and then the code will actually
946 00:51:45.545 --> 00:51:46.785 perform that search.
947 00:51:47.165 --> 00:51:50.185 But yes, it's the vector database that is performing
948 00:51:50.185 --> 00:51:51.305 that similarity search.
949 00:51:51.845 --> 00:51:54.825 The LLM just requests an instance
950 00:51:54.845 --> 00:51:55.945 of tool usage.
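In code, that division of labor looks roughly like this (a sketch; the JSON tool-call format is an assumption):

```python
import json

def dispatch(llm_output: str, vector_db) -> list[dict]:
    # The LLM only *describes* the action as structured output,
    # e.g. {"tool": "search", "query": "Simpsons early seasons"};
    # plain code parses that and executes it.
    call = json.loads(llm_output)
    if call["tool"] == "search":
        return vector_db.search(call["query"])  # the DB does the real work
    raise ValueError(f"Unknown tool: {call['tool']}")
```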
951 00:51:57.205 --> 00:51:58.745 Does it need to be a reasoning model?
952 00:51:58.765 --> 00:51:59.785 So yeah, I think we've covered this.
953 00:51:59.885 --> 00:52:04.585 No; I think it's actually probably good if many
954 00:52:04.585 --> 00:52:06.425 of the other steps are not reasoning models,
955 00:52:06.425 --> 00:52:10.185 because you can reduce the cost of inference.
956 00:52:10.185 --> 00:52:14.945 Is the search the system runs semantic search?
957 00:52:15.245 --> 00:52:19.685 So,
958 00:52:20.595 --> 00:52:24.365 yeah, so Milvus has support for
959 00:52:24.625 --> 00:52:26.645 hybrid search,
960 00:52:27.185 --> 00:52:31.805 so lexical plus semantic search. With Milvus 2.5,
961 00:52:32.305 --> 00:52:33.325 you could implement that,
962 00:52:33.865 --> 00:52:35.405 and that would just
963 00:52:35.405 --> 00:52:39.685 be a very small modification to that
964 00:52:39.685 --> 00:52:42.725 vector database lookup step in the research agent.
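A sketch of what that modification could look like with pymilvus, combining a dense (semantic) request with a BM25 full-text (lexical) request; the collection name, field names, schema, and embed helper are all assumptions:

```python
from pymilvus import AnnSearchRequest, MilvusClient, RRFRanker

client = MilvusClient(uri="http://localhost:19530")

query_text = "How has The Simpsons evolved over time?"
query_vector = embed(query_text)  # hypothetical dense embedding helper

# Dense request: semantic similarity over an embedding field.
dense_req = AnnSearchRequest(data=[query_vector], anns_field="dense_vector",
                             param={"metric_type": "IP"}, limit=10)
# Sparse request: Milvus 2.5 full-text search scores the raw query text
# against a BM25-backed sparse field.
sparse_req = AnnSearchRequest(data=[query_text], anns_field="sparse_vector",
                              param={"metric_type": "BM25"}, limit=10)

# Fuse the two result lists with reciprocal rank fusion.
results = client.hybrid_search(collection_name="chunks",
                               reqs=[dense_req, sparse_req],
                               ranker=RRFRanker(), limit=10)
```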
965 00:52:44.345 --> 00:52:47.245 Can we choose the subquery number parameter?
966 00:52:47.505 --> 00:52:49.965 So I think this is one of those things
967 00:52:49.965 --> 00:52:51.485 where it's actually best left to the system.
968 00:52:51.825 --> 00:52:54.845 We are designing the system to be
969 00:52:54.865 --> 00:52:57.045 as autonomous as possible.
970 00:52:57.985 --> 00:53:00.605 You could hard-code
971 00:53:00.675 --> 00:53:02.645 a maximum number.
972 00:53:02.945 --> 00:53:04.565 You could re-rank
973 00:53:04.565 --> 00:53:06.045 them and take a maximum.
974 00:53:06.825 --> 00:53:09.845 I think the simplest implementation, though, just lets
975 00:53:09.845 --> 00:53:12.165 the foundation model decide how many
976 00:53:12.345 --> 00:53:13.765 there should be.
977 00:53:14.185 --> 00:53:17.285 But yeah, that's just a design choice.
978 00:53:17.885 --> 00:53:19.205 I would just recommend letting the
979 00:53:19.205 --> 00:53:23.125 model actually decide that. Okay.
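If you did want a hard cap rather than full autonomy, the change is tiny (a sketch; MAX_SUBQUERIES is a hypothetical knob):

```python
MAX_SUBQUERIES = 5  # hypothetical design knob

def cap_subqueries(subqueries: list[str]) -> list[str]:
    # Optionally re-rank first so the most useful queries survive the
    # cut; here we simply truncate in generation order.
    return subqueries[:MAX_SUBQUERIES]
```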
980 00:53:23.585 --> 00:53:25.405 You do have a few questions in the chat.
981 00:53:25.905 --> 00:53:29.925 Have you benchmarked this against the OpenAI solution?
982 00:53:30.025 --> 00:53:32.805 And does the framework also provide tools
983 00:53:32.945 --> 00:53:36.005 to do web crawling to collect relevant data for the query?
984 00:53:36.595 --> 00:53:39.325 Yeah, great question. So I guess
985 00:53:39.325 --> 00:53:42.325 these different open source
986 00:53:43.195 --> 00:53:46.245 deep research agents have had
987 00:53:46.245 --> 00:53:47.245 different goals.
988 00:53:47.745 --> 00:53:50.325 And the goal of ours was not so much
989 00:53:50.385 --> 00:53:54.365 to reproduce the specific benchmark that
990 00:53:54.585 --> 00:53:58.605 OpenAI ran theirs on,
991 00:53:58.985 --> 00:54:00.045 but to produce a
992 00:54:00.045 --> 00:54:01.125 system that's understandable,
993 00:54:01.865 --> 00:54:04.005 that we can use for teaching purposes.
994 00:54:04.465 --> 00:54:06.645 But I recommend checking out
995 00:54:06.945 --> 00:54:09.565 the deep research agent from Hugging Face, where
996 00:54:09.565 --> 00:54:13.445 one of their primary motivations was
997 00:54:13.985 --> 00:54:15.805 to achieve a similar number
998 00:54:16.025 --> 00:54:18.325 or even exceed the benchmark, which they did.
999 00:54:19.605 --> 00:54:21.165 I think it's also just interesting to
1000 00:54:21.165 --> 00:54:24.245 compare different architectures for research agents.
1001 00:54:26.175 --> 00:54:28.225 Okay. Does this framework also provide tools
1002 00:54:28.285 --> 00:54:30.545 to do the web crawling to collect relevant data?
1003 00:54:31.325 --> 00:54:34.745 So, yes,
1004 00:54:34.745 --> 00:54:37.865 we've got the tools to call a number
1005 00:54:37.865 --> 00:54:40.065 of different web crawling services.
1006 00:54:40.845 --> 00:54:43.025 I think we're still adding that
1007 00:54:43.045 --> 00:54:45.065 as a dynamic tool call;
1008 00:54:45.645 --> 00:54:47.625 I think that's something for the near future.
1009 00:54:48.165 --> 00:54:51.625 But you can just say, okay, here is a domain name,
1010 00:54:51.785 --> 00:54:54.105 I want to fetch all
1011 00:54:54.105 --> 00:54:57.865 of my data from this domain, and then it'll call Firecrawl
1012 00:54:57.925 --> 00:54:59.145 or whatever service you're using,
1013 00:54:59.565 --> 00:55:03.265 pull that in, index it, and then run your query.
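Conceptually that ingestion path is crawl, chunk, embed, index, then query. A sketch where crawl_domain, split_into_chunks, embed, and the collection object are hypothetical stand-ins for a Firecrawl-style client, a chunker, an embedding model, and the vector database:

```python
def ingest_domain(domain: str, collection) -> None:
    # Crawl every page under the domain, chunk it, embed each chunk,
    # and insert it into the vector database.
    for page in crawl_domain(domain):          # Firecrawl-style crawl
        for chunk in split_into_chunks(page):  # hypothetical chunker
            collection.insert({"text": chunk, "vector": embed(chunk)})

# After ingestion, the research agent's vector-database lookups run
# against this collection exactly as before.
```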
1014 00:55:05.805 --> 00:55:08.865 So, great question. And I think that brings us
1015 00:55:08.925 --> 00:55:11.065 just about up to time. Yeah,
1016 00:55:11.295 --> 00:55:12.745 right at the top of the hour.
1017 00:55:12.925 --> 00:55:15.905 So thank you all so much for joining today.
1018 00:55:16.365 --> 00:55:19.425 Stefan's put his information on the screen here if
1019 00:55:19.425 --> 00:55:20.545 you have questions for him.
1020 00:55:20.925 --> 00:55:22.585 We also have office hours:
1021 00:55:22.965 --> 00:55:25.345 if you want
1022 00:55:25.645 --> 00:55:28.625 a specialized one-on-one session,
1023 00:55:28.885 --> 00:55:30.425 the QR code for that is right here.
1024 00:55:30.965 --> 00:55:33.785 And we also have a workshop coming up in
1025 00:55:33.785 --> 00:55:34.865 person in Palo Alto.
1026 00:55:35.125 --> 00:55:37.865 If you are based in the Bay Area, which I did see some
1027 00:55:37.865 --> 00:55:41.145 of you are, we hope you register for that.
1028 00:55:41.865 --> 00:55:43.425 So thank you all for joining today,
1029 00:55:43.685 --> 00:55:46.505 and we look forward to seeing you at our next webinar.
1030 00:55:47.055 --> 00:55:48.105 Have a good rest of your day.
1031 00:55:48.365 --> 00:55:49.425 Thanks everyone for coming,
1032 00:55:49.485 --> 00:55:52.225 and hope to see you in March for our
1033 00:55:52.225 --> 00:55:53.465 workshop with OpenAI.
1034 00:55:53.655 --> 00:55:54.305 Okay. Take care.