Webinar
Fueling AI with Great Data
In a world where we are surrounded by usable data, we need tools that can bring it all together, whether structured or unstructured. Join Airbyte to learn about the evolving role of data and how tooling can grow with your needs. During this webinar, we’ll go over:
- How to mine raw data from diverse data sources.
- How to build your first GenAI proof of concept (POC) for learning and experimentation.
- How to write only the code that matters by leveraging PyAirbyte.
- Demos! 👨💻
- Practical steps to productionalizing your POC.
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Today I'm pleased to introduce today's session, Fueling AI with Great Data, and our guest speaker, AJ Steers. AJ (Aaron) Steers is a staff engineer at Airbyte. AJ has been building open source data software for four-plus years and has contributed to Meltano, Singer, PipelineWise, and other open source projects. AJ has previously worked as a data engineer and architect for large companies like AWS, Amazon Video, and Starbucks. But his real passion is for open source and bringing best practices to the wider data community. When not building software or data systems, AJ enjoys singing with the Seattle Esoterics and playing with his 8-year-old son and his dog, Pip. Welcome, AJ.
Thank you. It's great to be here, and thanks everybody for joining. And thank you for that great introduction.
So, I'm gonna talk today about fueling your GenAI projects with great data. I'm also going to be talking about getting from prototype to production and building on a firm foundation that will scale with you as you add complexity, scale, and new requirements to your data pipelines. I'll introduce the concept of ELTP. If you've not heard of that, you might be able to guess what it means if you're familiar with ETL or ELT. But either way, I'll walk you through that.
And we'll talk about the new PyAirbyte and Milvus Lite offerings and how they can work together. One final note: if you have questions, feel free to drop them in the Q&A section. We will check on those periodically, and we'll also have a place for questions at the end. So drop those in the chat and we'll make sure we answer all the questions that we can. Thank you.
All right, so just quick introductions. Christie already gave me an introduction, so I'll speed through this a little bit. I've been a data practitioner all of my professional career, and I love working with data. I love bringing data solutions to big companies and small companies.
In my work at the big companies and in consulting, I realized there was just a huge gap in what regular, normal-sized, normally funded organizations, and even smaller organizations like universities, could do in data without hiring tons of data engineers. So I've really gone deep into open source, trying to bring the best practices that we had at places like Amazon into the common domain so everybody can leverage them. So, a quick introduction to Airbyte. Airbyte is built on open source and has a huge library of open source connectors. We've been working hard this past year to make sure that Airbyte also scales for unstructured document sources, GenAI use cases, and landing data in vector store destinations, and, with the release of PyAirbyte, is able to run anywhere.
So, a couple more stats here. Just a lot of sources. I'll talk about the distinction between low-code/no-code sources and Python-based sources. But we're heavily investing in making sure we have all of the connectors you need. And we also have tools for you to build your own connectors, which I'll talk more about.
And a huge community. We're so humbled by our community: hundreds and hundreds of contributors, thousands of PRs coming through all the time. All right, so a quick intro to the different ways you can deploy Airbyte. This will become relevant later.
OSS is just our foundational deployment with Kubernetes. You own it, it's on your infra, and that's always available. We also have an enterprise solution that is also self-managed. So if you want scale and you want it in your own infra, but you still want that support, that's there.
And then Airbyte Cloud as well. What I love about working with Zilliz and the Milvus product is that they follow very similar deployment options to really meet users where they are. We have recently added PyAirbyte, allowing you to run and prototype pipelines super quickly in Python, which I'll show. Zilliz, not at the same time but also recently, has launched Milvus Lite, which I'll be talking about as well. Both of these aim to make sure that the prototype you start out with can scale to a full-grown solution without rebuilding from scratch. So, what we've been doing at Airbyte, and why I think the approach we're gonna talk about right now is important, is that when you're sourcing data from a variety of places, you don't want to build something that works for one team and doesn't work for another team.
Say you wanna get data from HubSpot. If your machine learning team is writing code by hand and your data engineering team is using some ETL tool, you're gonna have fragmentation and silos forming really quickly. So we built PyAirbyte to make sure that all of these use cases could be handled and that it could feel natural for all of these groups who need to build their own solutions. So, a quick intro to Milvus. I don't work for Zilliz, but they've been great partners with us and I want to tell you a little bit about their product. It's a powerful, open source, high-performance, highly scalable vector store.
As I mentioned, you also have the run-local option with Milvus Lite, as well as the option to point to either the cloud or self-managed. And that's important 'cause you don't wanna switch tools once you get something that works. We'll talk about that more. Here's a quick code sample. I'm gonna try to make this a theme of this talk.
On the question of no code or low code, I like to think about just enough code. And I think this is a great example of just enough code to get the job done. So if you are comfortable with code and you appreciate the benefits of committing your projects to Git, you're gonna really appreciate that you can do this with just a few lines of code and iterate from there. All right.
So, as promised, let's talk about this low code, no code. I have this principle that I love. It's from Albert Einstein, purportedly: everything should be as simple as possible, but no simpler. I use this constantly as I'm designing and as I'm developing products. I ask myself: What am I providing here that is boilerplate that could be removed? Where am I adding value or definition that contributes to the process? Or where am I providing glue code that really doesn't need to be written by me? Another way to ask this question is: am I writing code that somebody else wrote already? Why am I writing it over again? Why am I reinventing the wheel? I think the answer here is to focus on composable pipelines that have simple and obvious steps in them, and that also break in expected, easy-to-fix ways. We'll talk a little bit about that in a sec.
So, as you're getting started with GenAI pipelines and/or data pipelines, wherever you're coming from, there's a little journey that I wanna take you on. It has been a very personal part of my life the last 20 years that I've been working in data. And this is the journey from ETL to ELT. For the uninitiated, ETL was basically the catchall term for getting data from somewhere to somewhere else. And it just kind of happened as extract, transform, and load. But the problem with that approach is that you have certain operations like sorting, joining, whatever, that really shouldn't be done in flight.
And if the transform part fails, then you haven't loaded your data. So switching these to extract, load, and transform means you have a very simple extract-load process that can be repeated over and over again. It can be done incrementally, and then your transform steps, whatever those need to be, happen after you've safely extracted the data. So this is by far a strong pattern. I definitely recommend you examine existing systems and see if you can take this approach.
And then I'll add one more here: especially for GenAI pipelines, but also for every kind of data pipeline you're building, you want to think about the publish step. There is almost always a publish step. It might be simple. It might be telling your users, "Hey, come and get the data." Or it might be copying or renaming datasets from their interim name to their production name, removing schema.
Or it might be pushing data to a vector store. It's important to think about this process and have good terminology for what you're building, so that you can have good design underneath. And here's what the design generally ends up looking like: your raw data lands somewhere, and then it lands in Pinecone or Milvus (sorry, this slide needed to be updated), or wherever your vector store is. You have extract, load, transform, and publish.
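The four ELTP stages described above can be sketched with nothing but the Python standard library. This is a minimal illustration, not Airbyte's implementation: the function names, the table names (raw_issues, issues_interim, issues), and the sample records are all made up for the example, and SQLite stands in for whatever store you actually land data in.

```python
import json
import sqlite3


def extract():
    """Extract: yield raw records untouched (hard-coded stand-in for an API)."""
    yield {"id": 1, "title": "Add Spark support", "state": "open"}
    yield {"id": 2, "title": "Fix typo in docs", "state": "closed"}


def load(conn, records):
    """Load: land the raw records before any transformation happens."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_issues (id INTEGER PRIMARY KEY, payload TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO raw_issues VALUES (?, ?)",
        [(r["id"], json.dumps(r)) for r in records],
    )


def transform(conn):
    """Transform: runs after loading, so a failure here never loses raw data."""
    rows = conn.execute("SELECT payload FROM raw_issues ORDER BY id").fetchall()
    conn.execute("DROP TABLE IF EXISTS issues_interim")
    conn.execute("CREATE TABLE issues_interim (id INTEGER, title TEXT)")
    for (payload,) in rows:
        record = json.loads(payload)
        if record["state"] == "open":  # example rule: keep open issues only
            conn.execute(
                "INSERT INTO issues_interim VALUES (?, ?)",
                (record["id"], record["title"]),
            )


def publish(conn):
    """Publish: promote the interim table to its production name."""
    conn.execute("DROP TABLE IF EXISTS issues")
    conn.execute("ALTER TABLE issues_interim RENAME TO issues")


conn = sqlite3.connect(":memory:")
load(conn, extract())
transform(conn)
publish(conn)
published = conn.execute("SELECT id, title FROM issues").fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_issues").fetchone()[0]
```

Because the raw table is written first and kept, the transform and publish steps can be rerun or changed without extracting again, which is the main point of separating EL from T and P.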
All right, so let's get started. I want to give you the intro to PyAirbyte, and I'll do it in some code steps here. So we're getting data from anywhere, and I wanna show you that this is the example of just enough code. We're only really providing code here that describes what we want or how we're gonna get it. PyAirbyte uses this get_source function.
PyAirbyte is what I'm showing. It uses get_source: you provide the name of a source, some configuration, and your streams. You can also print the available connectors if you are unsure what the connector names are. I'll show that in a demo in a second. You can separately set the configuration or change the configuration after you create the source.
And then finally, after you've read the data, you can send it to pandas, to SQL, or to documents. We're gonna talk about to-documents today 'cause we're working on GenAI use cases, but you might also be analyzing your data in pandas, or you might wanna query it as a SQL table. All right. And here's the combined version that does all three of those steps, getting the source, configuring and selecting your streams, and reading the data, all as one step. So again, just the right amount of code.
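The slide code is not part of this transcript, so the following is only a sketch of the three steps just described (get the source, select streams, read), based on PyAirbyte's public quickstart pattern. The connector name source-faker, the users stream, and the count setting are assumptions chosen for illustration, and the import sits inside the function so the file can be loaded even where PyAirbyte is not installed.

```python
def read_faker_users(count=100):
    """Read the Faker `users` stream into pandas (needs `pip install airbyte`)."""
    import airbyte as ab  # imported lazily; PyAirbyte is an optional dependency here

    # 1. Get the source by name and give it some configuration.
    source = ab.get_source(
        "source-faker",
        config={"count": count},  # assumed setting: number of fake records
        install_if_missing=True,
    )
    source.check()  # validate the configuration before reading

    # 2. Select the streams to sync.
    source.select_streams(["users"])

    # 3. Read into the default local cache, then hand off to pandas.
    read_result = source.read()
    return read_result["users"].to_pandas()
```

Calling read_faker_users() in a notebook with PyAirbyte installed should return a DataFrame of generated users; swapping the connector name and config is the only change needed for a different source.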
Okay, so I want to pause right here and do a quick demo. Here I have a Colab notebook, which is built on Jupyter. I have already installed Airbyte, and I just quickly say import airbyte as ab. Then I can check the available connectors. So I can see available connectors is a long list of sources.
And this is a huge list, contributed by our fantastic open source community, with a select number of connectors that have really high volume and important use cases also certified and supported by Airbyte itself. The only way anybody can scale to this volume is really to use the community's support. So that's why we have such a large library. I also wanna say that by default, the list we're getting here is the Python-installable, or pip-installable, list. If I want to get the larger list, I can put Docker in here.
It's actually, if you look at it, I think it's install method, is that right? It is install type. Yeah. But anyway, just putting the word Docker here is fine. Or you can say install type equals Docker. And if we check the length on this, this is a longer list of connectors, so 289 sources right now that you can install just with the same method.
So the value here is pretty large for you to have this in your toolbox. There's one more type of connector, which is our YAML connectors. And what this means is that this doesn't even require a pip install. You can just directly run these connectors. Let's say you don't have virtual environment support and you don't have Docker available in your runtime: these connectors can be run directly with PyAirbyte, not requiring any additional installs.
And that is powered by our Connector Builder, which allows you to use a UI to develop your own connectors directly. Let me see if I can get that up real quick. I'm going to connect to a local version of Airbyte just to demo what that low-code builder looks like. So here I'm running Airbyte locally on my machine. I'm gonna go to the Builder tab.
And here I've designed a Rick and Morty API connector. I don't know if we have any Rick and Morty fans, but whatever your API is that you want to develop, you can do so here. Let me hide the sidebar real quick, and I'll show you that all I had to do to develop this extractor is go to my API docs, find the base reference, and then define one or more streams. I went to the character URL path, and I said the record selector is results. The primary key is id. Just by going through these steps, I can basically build my own connector with any API. And then all I have to do to install this is just copy-paste the YAML.
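To make the idea of a declarative connector concrete, here is a small standard-library sketch of what those four settings (base URL, path, record selector, primary key) amount to. It is not the Connector Builder's YAML schema or runtime: the dictionary layout, the helper names, the api.example.com URL, and the canned response are all hypothetical.

```python
import json

# Hypothetical declarative stream definition mirroring the fields named in the demo.
STREAM_DEFINITION = {
    "base_url": "https://api.example.com",  # placeholder, not the real API host
    "path": "/character",
    "record_selector": ["results"],  # where the records live in the response
    "primary_key": "id",
}


def request_url(definition):
    """Join the base URL and the stream path into the URL to request."""
    return definition["base_url"].rstrip("/") + definition["path"]


def select_records(definition, response_body):
    """Walk the record selector path, then index the records by primary key."""
    node = json.loads(response_body)
    for key in definition["record_selector"]:
        node = node[key]
    return {record[definition["primary_key"]]: record for record in node}


# Canned response standing in for a real HTTP call.
SAMPLE_RESPONSE = json.dumps(
    {
        "info": {"count": 2},
        "results": [{"id": 1, "name": "Rick"}, {"id": 2, "name": "Morty"}],
    }
)

url = request_url(STREAM_DEFINITION)
records = select_records(STREAM_DEFINITION, SAMPLE_RESPONSE)
```

The definition is pure data, which is why it can be committed to source control and reviewed like any other artifact while the generic extraction logic is written once.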
What I love about this personally is that, again, I didn't have to write code by hand, but I still have an artifact I can commit to source control. And this, like I said, now powers a hundred and... what's the number again? It's a very large number of connectors. Let's get back here.
Yeah. So 139 connectors are built that way, and you can build your own. That's what I want you to know. So next time you think about building extractors from scratch, I hope you'll consider this as an alternative.
You'll have less code to maintain, and you'll benefit from all of these built-in capabilities. So here I have a simple get_source call. If I run get_source, I will read the data with whatever configuration I provide. I'll show GitHub in a second, but for right now, I'm just gonna show this Faker dataset. And then if I want to get the SQL table behind it, I can, and if I wanna convert it to pandas, I can just run to_pandas.
And as I mentioned, we're also gonna show to_documents. So this is just a very quick way for you to get data from anywhere into your Python environment. Or if you don't need Python, you can use Cloud and load it to any data warehouse that you want. Or you can use the OSS offering, as I was just showing, by installing locally. So that is the intro demo.
Let's continue on. All right, so I do wanna show this. With Milvus, and with other vector store destinations, you can use Airbyte Cloud to define a vector store destination. If you use this option, you can just use our built-in interface to define your chunking and your embedding.
All right. So I just did this demo; I decided to do that one a second earlier. So we'll move on to the next topic, which is converting records to documents. If you've been working in data and you've followed us so far, you know, okay, we found a way to get data from all of the datasets we want, whether it's Salesforce or HubSpot, or a CSV file or another database, or GitHub, or, you name it, internal websites.
I was talking to somebody yesterday about connecting data from the Internet of Things using REST APIs that way. So your imagination is the only limit. Getting that data then into document format so you can use it for an LLM is really the process of describing what you want the final document to look like. PyAirbyte has built-in support for that, and I'll show you what that looks like right now. For this, I'm gonna switch over to my second notebook, and I'll just walk you through the steps real quick.
We did the install of Airbyte. We're also installing PyMilvus, which we'll show in a second. And then we're going to follow some more steps. So here is... actually, let me hide the sidebar and bump up the font a little bit and hide this. Okay.
So this part of the demo basically calls the same thing we did with the Faker dataset, but now we're getting real data from GitHub. We store that in a read result, and I'm running it right now. You can see in real time as it's extracting data. This is running incrementally, which is a fantastic benefit over other methods that you might build by hand. All of our source connectors just come with incremental extraction support, so this will save you tons of time.
If you know anything about getting data from GitHub, it's very heavily rate limited. You're always gonna run into rate limits, and it's pretty slow the first time. If for any reason you do want to refresh from scratch, you can set force_full_refresh=True, and then this will ignore your incremental data and just get a fresh dataset for you. So we try to make it super easy for everyone, giving you nice defaults but letting you override where you need it for the behavior you're looking for. So here we're able to extract our issues data from source-github.
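Incremental extraction boils down to remembering a cursor between runs and only fetching what is newer. The sketch below shows that idea with plain Python; it is not how PyAirbyte or the GitHub connector implements state. The read_issues function, the updated_at cursor field, and the sample rows are hypothetical, and the force_full_refresh flag is named to echo the option mentioned above.

```python
# Sample rows standing in for issues returned by an API.
ISSUES = [
    {"id": 1, "updated_at": "2024-01-01T00:00:00Z"},
    {"id": 2, "updated_at": "2024-02-01T00:00:00Z"},
    {"id": 3, "updated_at": "2024-03-01T00:00:00Z"},
]


def read_issues(source_rows, state, force_full_refresh=False):
    """Return rows newer than the saved cursor and advance the cursor."""
    if force_full_refresh:
        state.pop("cursor", None)  # forget prior progress and start over
    cursor = state.get("cursor", "")
    # ISO-8601 timestamps in one fixed format compare correctly as strings.
    new_rows = [row for row in source_rows if row["updated_at"] > cursor]
    if new_rows:
        state["cursor"] = max(row["updated_at"] for row in new_rows)
    return new_rows


state = {}
first = read_issues(ISSUES[:2], state)  # first sync: everything is new
second = read_issues(ISSUES, state)  # later sync: only the newer issue
full = read_issues(ISSUES, state, force_full_refresh=True)  # start from scratch
```

With a rate-limited API, the second call is the one that saves time: it asks only for what changed since the saved cursor.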
If I also wanted to get pull requests or stargazers, I can just add those to the streams and it will sync those as well. One more thing while this is running: we made the decision with PyAirbyte to use DuckDB under the hood. If you're not familiar with DuckDB, it is a very powerful, very fast local SQL database that also scales really well and runs great in the cloud. So if you do not specify a different data store, it's just gonna store it locally in a DuckDB database that's always gonna be fast.
This means you're never gonna crash due to lack of memory or because the dataset is too large. Okay, so then I just wanna demo looking at a record. Let's say I just wanna look at the first record. Okay, you can see this is rich data about a specific issue. But now let's say I want to render that as a document.
Here is a to_documents interface that gets it ready for the LLMs, and this is gonna help us in the next step. What I'm doing here is just taking the same read result, running to_documents on it, and telling PyAirbyte what I want to treat as the title, the content, and the metadata, and then whether I want the metadata to appear in the document, which I do. So here is my transformed document. Here's the title of a specific issue. Here is the issue number, the state, and the URL.
These are metadata, and here's the body of the issue. Great. So now I've basically translated this first record, this very, very long JSON thing, into something that I and my LLM can use more reliably. So that's why that's important. Okay, so I'll save this last part of the demo for a little bit later, and we'll switch back to the slides.
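The record-to-document step can be pictured as picking a title, some content fields, and some metadata fields out of a wide JSON record and rendering them as text. The following is a standard-library sketch of that idea only; it is not PyAirbyte's to_documents implementation, and the Document class, the record_to_document helper, its parameters, and the sample issue are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    content: str
    metadata: dict = field(default_factory=dict)


def record_to_document(record, title_key, content_keys, metadata_keys, render_metadata=True):
    """Render one wide record as a compact text document for an LLM."""
    title = str(record.get(title_key, ""))
    metadata = {key: record[key] for key in metadata_keys if key in record}
    parts = [f"# {title}"]
    if render_metadata:
        # Put the metadata into the text itself so the model can see it.
        parts += [f"{key}: {value}" for key, value in metadata.items()]
    parts += [str(record[key]) for key in content_keys if record.get(key)]
    return Document(title=title, content="\n".join(parts), metadata=metadata)


# Made-up issue record; real GitHub issue records carry many more fields.
issue = {
    "title": "Add Spark support",
    "number": 42,
    "state": "open",
    "body": "Add Spark interop.",
    "node_id": "ignored-noise-field",
}

doc = record_to_document(
    issue,
    title_key="title",
    content_keys=["body"],
    metadata_keys=["number", "state"],
)
```

Everything not named as title, content, or metadata is dropped, which is what turns the long JSON record into something short enough to embed and retrieve reliably.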
Okay. So now we've seen how to extract data. We've seen how to analyze that data locally in a Python notebook. And now we've just seen how to convert records into documents when we're using them for LLMs. So, what I wanna demo... actually, yeah, I guess this is the next part of the demo.
So let's switch back to this last bit. What I wanna show you now is building Python prototypes that scale. What I want to cover here is a GenAI pipeline using everything we've built so far, but then also building the GenAI part of it, and building it in a way that we can deploy to production. So here's where I'll bring in Milvus Lite.
Hey, AJ, I have a question.
Yeah?
So, you're coding everything in Python. And then I noticed that there's a PyAirbyte that's still in... you know, when will that be released as GA?
Yeah, thanks for that question. So, why Python first? Python is quickly becoming the de facto language for data science, or at least a very, very popular language for data science and GenAI. It has always been popular. I think previously R was the most popular for data science and ML, and Python has pretty much supplanted it. There are also new investments in the Python ecosystem, such as the upcoming, much-talked-about Mojo, which is a superset of Python.
And if you look at the documentation around Mojo, they're also acknowledging that Python just has a huge ecosystem. What I personally love about using Python is that it is innately readable. When we talk about the just-enough code that you want, I think that's a great example. It's generally very concise and easy to read, which means everyone can review and contribute. We're not against other languages.
People have asked, will we create a TypeScript version of PyAirbyte, a TypeScript Airbyte? And we're not there yet. But definitely, if that's something you guys are interested in, let us know. The other part of the question, I think, was about PyAirbyte being... oh, yeah, when is it gonna be GA? Yeah, so we're planning that for end of this quarter.
We went through an initial phase of basically launching PyAirbyte and having a huge number of community members come through and test it with every source, including sources they were making themselves. Through that, we found some edge cases, which we've improved on. Stability-wise, we're getting to a really good place, so that's not a concern. What we'd love right now is more feedback from users: is this the API you love, or would you suggest any changes? Once we feel like we have a stable API, I think we're gonna go into GA for PyAirbyte. We are expecting that in H2 of this year.
So that might be Q3 or Q4, but coming before the end of this year, for sure.
Thank you. Cool.
Thanks. Yeah. And any other questions as of now? You are muted now.
I don't see any more questions in our little...
Yeah, great. All right, I'll continue on then.
Great. Thanks for those questions. Okay. So here we are, now integrating with Milvus Lite, with all of the things that I mentioned about PyAirbyte and the benefit of being able to run in a notebook. And, as I've been alluding to and hinting at, you're writing code in a Python notebook that you can also productionalize, or an interface and a design that you can also productionalize.
When we talked about ELTP, one of the benefits of having EL as its own thing is that you can easily move that thing. It's not gonna fall down. It's not gonna run out of memory. It's designed to be resilient. And then we're gonna talk about some transformations and the publish operation. Right now we're gonna do it in a notebook, but in the future, as you scale, you could put this in Zilliz Cloud and Airbyte Cloud, or however you decide to end up deploying your final solution.
All right, so while this is rerunning right now, I'll just walk you through the code real quick. In this first step, we're gonna say from pymilvus.model.dense, we're gonna get an OpenAI embedding function. And we're just gonna define that as the embedding function from OpenAI. PyAirbyte has built-in secret support, so it's gonna get the secret in this case from Colab secrets. But it will also automatically get secrets from your environment variables or your .env file.
We just want this to be really streamlined and also really secure. So here we go. It looks like we've successfully installed packages and we've successfully created a list of vectors. So what we did here... oh, I accidentally ran the whole thing again. That's okay.
It's going ahead of us. It's nice when the code runs quickly. So here we are encoding the documents. The docs up above were where we called to_documents on the dataset. This interface is also fully compatible with LangChain.
We partnered with LangChain to make sure that to_documents generates documents that you can also use in LangChain, LlamaIndex, and other tools; you can just hand off the documents. In this case, we're taking the content of those documents, turning it into a string, and getting some vectors out of that by running the encode_documents function. And these vectors, actually, let me just print them real quick so we can see what they look like. So, as you might imagine, there's just a huge list of these floats corresponding to the documents that I encoded. And now with our Milvus client, we're gonna create a local Milvus instance, or database.
This is great because I don't need to worry about side effects in my production system or my prototype. I don't have any infrastructure to manage. I can just directly run it, including dropping whatever I was doing before in my last iteration and creating a new collection. And then for the number of dimensions, I can just use the number of dimensions in the embedding model. So running this will store the data in the blob.
I can print out the number of dimensions, which is 1536 in this case. And then lastly, inserting the data. Well, not lastly, 'cause we're gonna benefit from it in a second, but it's the last step to get the vector store destination published. All of what we're doing right here can also be done in Airbyte Cloud with no code. But in this case, we're just showcasing how to do this with PyMilvus, or Milvus Lite, I mean.
Okay, so we've loaded data here. Let me just show you the insert statement. We gave the collection name and we passed the documents that we created earlier. So now it's time to get some value out of this. What we want to do is take a question, "summarize issues related to Spark interop," and from that question, find any related issues and summarize them.
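The notebook itself is not reproduced in this transcript, so the following is a sketch of the collection setup and insert just described, assuming the pymilvus MilvusClient quick-setup API with a local file path (which selects Milvus Lite). The function names, the default file name milvus_demo.db, the collection name issues, and the text field are assumptions for illustration. The pymilvus import is deferred so that the row-building helper can be used and tested without the package installed.

```python
def build_rows(texts, vectors):
    """Pair each text with its vector in the row shape used for the insert."""
    if len(texts) != len(vectors):
        raise ValueError("texts and vectors must have the same length")
    return [
        {"id": i, "vector": vector, "text": text}
        for i, (text, vector) in enumerate(zip(texts, vectors))
    ]


def publish_to_milvus_lite(texts, vectors, db_path="./milvus_demo.db", collection_name="issues"):
    """Store one vector per text in a local Milvus Lite file (needs `pip install pymilvus`)."""
    from pymilvus import MilvusClient  # imported lazily; optional dependency here

    rows = build_rows(texts, vectors)
    client = MilvusClient(db_path)  # a local file path means no server to manage

    # Drop whatever the previous iteration left behind, then start fresh.
    if client.has_collection(collection_name):
        client.drop_collection(collection_name)
    client.create_collection(
        collection_name=collection_name,
        dimension=len(vectors[0]),  # e.g. 1536 for the embedding model in the demo
    )
    client.insert(collection_name=collection_name, data=rows)
    return client
```

Taking the dimension from the vectors themselves keeps the collection in step with whichever embedding model produced them.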
So we do a Milvus search: we give our collection name, the data, how many entities we want to return, and then what fields we want. I'm not going to review the printout, but I'll show you what happens when we run this. You should see that here is the summary. In this case, I'm using OpenAI to make a quick call. And so it tells me there are two open GitHub issues in this repo.
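Conceptually, the search step ranks stored vectors by similarity to the question's vector and returns the requested fields for the top matches. The sketch below does this by brute force with cosine similarity in plain Python. It is an illustration of the idea only, not how Milvus indexes or searches; the search function, its parameters, and the two-dimensional toy vectors are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(rows, query_vector, limit=2, output_fields=("text",)):
    """Return the requested fields of the `limit` rows most similar to the query."""
    ranked = sorted(
        rows,
        key=lambda row: cosine_similarity(row["vector"], query_vector),
        reverse=True,
    )
    return [{name: row[name] for name in output_fields} for row in ranked[:limit]]


# Toy 2-dimensional vectors; real embeddings have hundreds or thousands of dimensions.
ROWS = [
    {"id": 0, "vector": [1.0, 0.0], "text": "Add Spark interop"},
    {"id": 1, "vector": [0.0, 1.0], "text": "Fix login page"},
    {"id": 2, "vector": [0.9, 0.1], "text": "Document Spark usage"},
]

hits = search(ROWS, query_vector=[1.0, 0.0], limit=2)
```

The returned texts are what you would pass to the LLM as context along with the original question to get the summary.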
The issues pertain to a feature request and a documentation task: add Spark to dataset, and summarize PyAirbyte personas. So here I can ask any question I want now of my data, and I can prove out whether this works as I expect. Once it is working as I expect, we can talk about promotion. But this is basically a fully contained end-to-end data pipeline, extract, load, transform, and publish, ELTP, all in one notebook. And if you would like to see this notebook, let us know and we can share the link with you. Just drop a note in the chat.
Okay. And let's switch back to continue. All right, so as I've been hinting all along, it really matters what choices you make when you're building your prototype as to whether you'll be successful when you try to get into production. I mentioned I've been in the data field quite a long time. I've seen firsthand ML projects and AI projects never get to production after months and months of effort.
And I think the success rate of deploying AI projects or ML models into production is something like less than 20%. I think the estimates are somewhere between 80 and 95% failure to get to production. So if you want your prototype to have a higher chance of getting to production, building on a good foundation is really what you wanna do. And so everything I showed was easy to get started with. You run it in Python or wherever you prefer to run it.
But now you have those composable steps that could easily be handed off to your IT group or be put into a production pipeline in the cloud. I use the Matrix thing here, but we don't want to have to go back to the beginning and forget and start again. We don't want to take a chance that what we built very carefully in our prototype will be rejected by our IT team when we try to deploy it. We want to build on a foundation that can be deployed. And everything I demoed today is something that you can easily just migrate to the cloud or push to the cloud when you're ready to do so.
And that's harder to do if you write your own custom code. I have personally seen people develop solutions in nice drag-and-drop canvas interfaces, and they try to hand that off to the IT team and they're like, "No, I can't support this. Sorry." We've designed PyAirbyte to have good design choices. And I think Milvus Lite is a great tool to use, with a clear path to production and highly scalable when you deploy it.
So, benefits of building on ELTP: like I said, you don't want to have to migrate or rebuild your tool, but you do want the friendly, quick, and easy way to iterate in Python or whatever language you choose. And with the approach that I've shown, whatever experiments work, you can just promote them, and whatever does not work, you can let go. And it's not a lot of code to support either way. One thing I didn't talk about is CI/CD and GitOps and things like that, but if you're a fan of Git and CI/CD, everything I've shown right now works really well with CI/CD. All right.
Well, I think that is it. Let's pause for any further questions, and then I'll wrap up. Okay. It's a very great group this morning. Just checking chat questions.
Uh, we'll, we'll give people a chanceto put any last questions while I do just go through, um,um, a couple more things. So, um, yeah, so I have a quick, oh, yeah. Uh, so you mentioned something like 285, uh, connectors. Yeah. You seeingtypes of connect and yamo.
Are there certain connectors and types of connectors that you are seeing as very popular? Yeah. So unstructured data sources have been really popular, because text is now data, or we can now treat text as data. This is not necessarily the YAML or low-code side, but we've invested in Google Drive, S3, Azure Storage, all of the places you might store a PDF or a Google Doc or something like that. And now if you want to say, hey, what was that document from a month ago where we were talking about design patterns? You're very likely to be able to find that and find insights on it through RAG. Another one that's really popular with the no-code builder is building for internal endpoints.
Almost every company, if they have a decent-sized infrastructure, will start building their own APIs for internal use. And those are things you're never going to find an out-of-the-box connector for, because you literally built it yourself, and you also don't want to build the analytics from scratch, et cetera. So we find that with the no-code builder, the custom connectors that are really popular are sometimes the ones that only matter to you, but the fact that you can build that yourself is also really valuable. So we list, I think, 280 connectors, but we know that people have built thousands of connectors. They're just not all worth publishing externally to the community.
But yeah, great question. Am I coming through on audio? I think so, so I'm going to say thanks and wrap up. I might have lost Christie, or I might have been disconnected. Am I still on? All right. Well, I'm going to wrap here.
Thank you everybody for joining, and I hope you've enjoyed this topic. If you have any questions for us, feel free to reach out. We are on Slack. And yeah, thank you so much, Zilliz, for hosting. Oh, sorry.
Thank you, Christie, for hosting. I just said goodbye to everyone, and I think we lost you for a second on the video. Yeah. I wasn't sure. I saw the frozen screen, and then I was like, oh, no.
Is that, is that... Yeah. What's going on? Oh, yeah. So I just wrapped up and said thanks everybody for joining, and I think we can wrap it if that works. Sorry about that. Oh no, you're fine, you're fine.
Yeah, I thought it was me. Yeah, well, thanks. Yeah, I was like, oh, sorry. Okay. Well, all right.
Yeah, I don't know what it was, but these things have a longer tail than what the live audience is. Mm-hmm. And so they end up going out on YouTube, mm-hmm, and getting a lot more views in the long tail.
Yeah. All right. Yeah. Well, thank you for doing this. Thanks.
Yeah. No, I appreciate it. And then, oh yeah, in your slides, can you change that Pinecone... Oh, the logo? Yes. Sorry.
Yeah. The replacement would be Zilliz's. Yeah. Because Zilliz manages open source Milvus. Okay, great.
Yeah, so, and then we'll share it out. I'm sorry for my internet glitch. Okay. Yeah, I lost a little bit of what you were saying, but we can catch up in Slack. Thank you so much. Okay, thanks.
All right. Bye.
Meet the Speaker
Join the session for live Q&A with the speaker
AJ Steers
Staff Software Engineer, AI Technologies
AJ (Aaron) Steers is a Staff Engineer at Airbyte. AJ has been building open source data software for 4+ years and has contributed to Meltano, Singer, Pipelinewise, and other open source projects. AJ has previously worked as a data engineer and architect for large companies like AWS, Amazon Video, and Starbucks, but his real passion is for open source and bringing best practices to the wider data community. When not building software or data systems, AJ enjoys singing with the Seattle Esoterics and playing with his 8-year-old son and his dog Pip.