-
97% Cheaper, Faster, Better, Correct AI — with Varun Mohan of Codeium
From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2023-03-02 14:38
OpenAI just rollicked the AI world yet again yesterday — while releasing the long awaited ChatGPT API, they also priced it at $2 per million tokens generated, which is 90% cheaper than the text-davinci-003 pricing of the “GPT3.5” family. Their blogpost on how they did it is vague: Through a series of system-wide optimizations, we’ve achieved 90% cost reduction for ChatGPT since December; we’re now passing through those savings to API users.We were fortunate enough to record Episode 2 of our podcast with someone who routinely creates 90%+ improvements for their customers, and in fact have started productizing their own infra skills with Codeium, the rapidly growing free-forever Copilot alternative (see What Building “Copilot for X” Really Takes). Varun Mohan is CEO of Exafunction/Codeium, and he indulged us in diving deep into AI infrastructure, compute-optimal training vs inference tradeoffs, and why he loves suffering.Recorded in-person at the beautiful StudioPod studios in San Francisco.Full transcript is below the fold. Timestamps* 00:00: Intro to Varun and Exafunction* 03:06: GPU Efficiency, Model Flop Utilization, Dynamic Multiplexing* 05:30: Should companies own their ML infrastructure?* 07:00: The two kinds of LLM Applications* 08:30: Codeium* 14:50: “Our growth is 4-5% day over day”* 16:30: Latency, Quality, and Correctability* 20:30: Acceleration mode vs Exploration mode* 22:00: Copilot for X - Harvey AI’s deal with Allen & Overy* 25:00: Scaling Laws (Chinchilla)* 28:45: “The compute-optimal model might not be easy to serve”* 30:00: Smaller models* 32:30: Deepmind Retro can retrieve external infromation* 34:30: Implications for embedding databases* 37:10: LLMOps - Eval, Data Cleaning* 39:45: Testing/User feedback* 41:00: “Users Is All You Need”* 42:45: General Intelligence + Domain Specific Dataset* 43:15: The God Nvidia computer* 46:00: Lightning roundShow notes* Varun Mohan Linkedin* Exafunction* Blogpost: Are GPUs Worth it for ML* Codeium* Copilot statistics* Eleuther’s The Pile and The Stack* What Building “Copilot for X” Really Takes* Copilot for X* Harvey, Copilot for Law - deal with Allen & Overy* Scaling Laws* Training Compute-Optimal Large Language Models - arXiv (Chinchilla paper)* chinchilla's wild implications (LessWrong)* UL2 20B: An Open Source Unified Language Learner (20B)* Paper - Deepmind Retro* “Does it make your beer taste better”* HumanEval benchmark/dataset* Reverse Engineering Copilot internals* Quora Poe* Prasanna Sankar notes on FLOPs and Bandwidth* NVIDIA H100 specs - 3TB/s GPU memory, 900GB/s NVLink Interconnect* Optimizer state is 14x size of model - 175B params => 2.5TB to store state → needs at least 30 H100 machines with 80GB each* Connor Leahy on The Gradient PodcastLightning Rounds* Favorite AI Product: Midjourney* Favorite AI Community: Eleuther and GPT-J* One year prediction: Better models, more creative usecases* Request for Startup: Superathlete Fitness Assistant* Takeaway: Continue to tinker!Transcript[00:00:00] Alessio Fanelli: Hey everyone. Welcome to the Latent Space podcast. This is Alessio, partner and CTO in residence at Decibel Partners. I'm joined by my cohost, swyx, writer, editor of L Space Diaries.[00:00:20] swyx: Hey, and today we have Varun Mohan from Codeium / Exafunction on. I should introduce you a little bit because I like to get the LinkedIn background out of the way.[00:00:30] So you did CS at MIT and then you spent a few years at Nuro where you were ultimately tech lead manager for autonomy. And that's an interesting dive. Self-driving cars in AI and then you went straight into Exafunction with a few of your coworkers and that's where I met some of them and started knowing about Exafunction.[00:00:51] And then from out of nowhere you cloned GitHub Copilot. That's a lot of progress in a very short amount of time. So anyway, welcome .[00:00:59] Varun Mohan: That's high praise.[00:01:00] swyx: What's one thing about you that doesn't appear on LinkedIn that is a big part of what people should know?[00:01:05] Varun Mohan: I actually really like endurance sports actually.[00:01:09] Like I, I've done multiple triathlons. I've actually biked from San Francisco to LA. I like things that are like suffering. I like to suffer while I, while I do sports. Yeah.[00:01:19] swyx: Do you think a lot about like code and tech while you're doing those endurance sports or are you just,[00:01:24] Varun Mohan: your mind is just focused?[00:01:26] I think it's maybe a little bit of both. One of the nice things about, I guess, endurance athletics, It's one of the few things you can do where you're not thinking about, you can't really think about much beyond suffering. Like you're climbing up a hill on a bike and you see like, uh, you see how many more feet you need to climb, and at that point you're just struggling.[00:01:45] That's your only job. Mm-hmm. . Yeah. The only thing you can think of is, uh, pedaling one more pedal. So it's actually like a nice, a nice way to not think about work. Yeah,[00:01:53] Alessio Fanelli: yeah, yeah. Maybe for the audience, you wanna tell a bit about exa function, how that came to be and how coding came out[00:01:59] Varun Mohan: of that. So a little bit about exo function.[00:02:02] Before working at exa function, I worked at Neuro as Sean was just saying, and at neuro, I sort of managed large scale offline deep learning infrastructure. Realized that deep learning infrastructure is really hard to build and really hard to maintain for even the most sophisticated companies, and started exa function to basically solve that gap, to make it so that it was much easier for companies.[00:02:24] To serve deep learning workloads at scale. One of the key issues that we noticed is GPUs are extremely hard to manage fundamentally because they work differently than CPUs. And once a company has heterogeneous hardware requirements, it's hard to make sure that you get the most outta the hardware. It's hard to make sure you can get, get great GPU utilization and exa function was specifically built to make it so that you could get the most outta the hardware.[00:02:50] Make sure. Your GP was effectively virtualized and decoupled from your workload to make it so that you could be confident that you were running at whatever scale you wanted without burning the bank.[00:03:00] swyx: Yeah. You gave me this metric about inefficiency,[00:03:03] Varun Mohan: right? Oh, okay. Like flop efficiency. Yeah. Yeah. So basically, I think it comes down to, for most people, one of the things about CPUs that's really nice is with containers, right?[00:03:13] You can end up having a single. You can place many containers on them and all the containers will slowly start eating the compute. It's not really the same with GPUs. Like let's say you have a single. For the most part, only have one container using that gpu. And because of that, people heavily underestimate what a single container can sort of do.[00:03:33] And the GPU is left like heavily idle. And I guess the common term now with a lot of LM workloads is like the flop efficiency of these workloads. M F U, yeah. Yeah. Model flop utilization. The model flop utilization, which is basically like what fraction of the flops or compute on the hardware is actually getting used.[00:03:49] And sort of what we did at exa function. Not only make it so that the model was always running, we also built compiler technology to make it so that the model was also running more efficiently. And some of these things are with tricks like operator fusion, like basically you could imagine fusing two operations together such that the time it takes to compute.[00:04:07] the fused operation is lower than the time it takes for each individual operation. Oh my God. Yeah. .[00:04:13] Alessio Fanelli: Yeah. And you have this technique called dynamic multiplexing, which is basically, instead of having a one-to-one relationship, you have one GP for multiple clients. And I saw one of your customers, they went from three clients to just one single GPU and the cost by 97%.[00:04:29] What were some of those learning, seeing hardware usage and efficiencies and how that then played into what, what[00:04:34] Varun Mohan: you're building? Yeah, I think it basically showed that there was probably a gap with even very sophisticated teams. Making good use of the hardware is just not an easy problem. I think that was the main I, it's not that these teams were like not good at what they were doing, it's just that they were trying to solve a completely separate problem.[00:04:50] They had a model that was trained in-house and their goal was to just run it and it, that should be an easy. Easy thing to do, but surprisingly still, it's not that easy. And that problem compounds in complexity with the fact that there are more accelerators now in the cloud. There's like TPUs, inferential and there's a lot of decisions, uh, that users need to make even in terms of GPU types.[00:05:10] And I guess sort of what we had was we had internal expertise on what the right way to run the workload was, and we were basically able to build infrastructure and make it so that companies could do that without thinking. So most[00:05:21] Alessio Fanelli: teams. Under utilizing their hardware, how should they think about what to own?[00:05:26] You know, like should they own the appearance architecture? Like should they use Xlo to get it to production? How do you think[00:05:32] Varun Mohan: about it? So I think one thing that has proven to be true over the last year and a half is companies, for the most part, should not be trying to figure out what the optimal ML architecture is or training architecture is.[00:05:45] Especially with a lot of these large language models. We have generic models and transformer architecture that are solving a lot of distinct problems. I'll caveat that with most companies. Some of our customers, which are autonomous vehicle companies, have extremely strict requirements like they need to be able to run a model at very low latency, extremely high precision recall.[00:06:05] You know, GBT three is great, but the Precision Recall, you wouldn't trust someone's life with that, right? So because of that, they need to innovate new kinds of model architectures. For a vast majority of enterprises, they should probably be using something off the shelf, fine tuning Bert models. If it's vision, they should be fine tuning, resonant or using something like clip like the less work they can do, the better.[00:06:25] And I guess that was a key turning point for us, which is like we start to build more and more infrastructure for the architectures that. The most popular and the most popular architecture was the transformer architecture. We had a lot of L L M companies explicitly reach out to us and ask us, wow, our GT three bill is high.[00:06:44] Is there a way to serve G P T three or some open source model much more cheaply? And that's sort of what we viewed as why we were maybe prepared for when we internally needed to deploy transform models our.[00:06:58] Alessio Fanelli: And so the next step was, Hey, we have this amazing infrastructure. We can build kind of consumer facing products, so to speak, at with much better unit economics, much better performance.[00:07:08] And that's how code kind[00:07:10] Varun Mohan: of came to be. Yeah. I think maybe the, the play is not maybe for us to be just, we make a lot of consumer products. We want to make products with like clear ROI in the long term in the enterprise. Like we view code as maybe one of those things. Uh, and maybe we can, we can talk about code maybe after this.[00:07:27] We. Products like co-pilot as being extremely valuable and something that is generating a lot of value to professionals. We saw that there was a gap there where a lot of people probably weren't developing high intensive L L M applications because of cost, because of the inability to train models the way they want to.[00:07:44] And we thought we could do that with our own infrastructure really quickly.[00:07:48] swyx: I wanna highlight when you say high intensive, you mean basically generate models every key, uh, generate inferences on every keystroke? That's[00:07:55] Varun Mohan: right. Yeah. So I would say like, there's probably two kinds of L l M applications here.[00:07:59] There's an L L M application where, you know, it rips through a bunch of data and maybe you wait a couple minutes and then you see something, and then there's an application where the quality is not exactly what you want, but it's able to generate enough, sorry, low enough latency. It's still providing a ton of value.[00:08:16] And I will say there's like a gap there where the number of products that have hit that co-pilot spot is actually not that high. Mm. A lot of them are, are kind of like weight and, you know, just generate a lot of stuff and see what happens because one is clearly more compute intensive than the other Basically.[00:08:31] swyx: Well co uh, I don't know if we told the whole story yet, you were going to[00:08:35] Varun Mohan: dive into it. . Yeah, so I guess, I guess the story was I guess four or five months ago we sort of decided internally as a team we were like very early adopters of co-pilot. I'm not gonna sit here and say co-pilot, it's not a great tool.[00:08:45] We love co-pilot. It's like a fantastic tool. We all got on the beta. The moment it came out we're like a fairly small T, but we, like we all got in, we were showing each other completions. We end up writing like a lot of cuda and c plus plus inside the company. And I think there was probably a thought process within us that was like, Hey, the code we write is like very high aq.[00:09:04] You know? So like there's no way it can help. And one of the things in c plus plus that's like the most annoying is writing templates. Writing template programming is maybe one of those things. No one, maybe there's like some people in the C plus O standards community that can do it without looking at the, looking at anything online.[00:09:19] But we struggle. We struggle writing bariatric templates and COPA just like ripped through. Like we had a 500 line file and it was just like writing templates like, and we didn't really even test it while we were running it. We then just compiled it and it just, We're like, wow. Like this is actually something that's not just like it's completing four loops, it's completing code for us.[00:09:38] That is like hard in our brains to reach, but fundamentally and logically is not that complicated. The only reason why it's complicated is there's just a lot of rules, right. And from then we were just like, wow, this is, that was maybe the first l l m application for us internally, because we're not like marketers that would use, uh, Jasper, where we were like, wow, this is like extremely valuable.[00:09:58] This is not a toy anymore. So we wanted to take our technology to build maybe apps where these apps were not gonna be toys, right? They were not gonna be like a demo where you post it on Twitter and then you know there's hype and then maybe like a month later, no one's using.[00:10:11] swyx: There's a report this morning, um, from co-pilot where they, they were estimating the key tabs on amount of code generated by a co-pilot that is then left in code repos and checked in, and it's something like 60 to 70%[00:10:24] Varun Mohan: That's, that's nuts, but I totally believe it given, given the stats we have too. There's this flips in your head once you start using products like this, where in the beginning there's like, there's like skepticism, like how, how valuable can it be? And suddenly now like user behavior fundamentally changes so that now when I need to write a function, I'm like documenting my code more because I think it's prompting the model better, right?[00:10:43] So there's like this crazy thing where it's a self-fulfilling prophecy where when you get more value from it, more of your code is generated. From co-pilot[00:10:50] swyx: just to walk through the creation process, I actually assumed that you would have grabbed your data from the pile, which is the Luther ai, uh, open source, uh, code information.[00:11:00] But apparently you scraped your own[00:11:01] Varun Mohan: stuff. Yeah. We ended up basically using a lot of open, I guess, permissively licensed code, uh, in the public internet, mainly because I think also the pile is, is fairly a small subset. Uh, I think maybe after we started there was the, that was also came to be, but for us, we had a model for ourselves even before that, uh, was the point.[00:11:21] Ah, okay. So the timing was just a little bit off. Yeah, exactly. Exactly. But it's awesome work. It's, it seems like there's a good amount of work that's getting done Decentrally. Yeah. Which is a little bit surprising to me because I'm like more bullish on everyone needs to get together in a room and make stuff happen.[00:11:35] Like we're all in person in Mountain View. But yeah, no, it's pretty impressive. Yeah. Luther in general, like everything they've done, I'm pretty impressed with it. Yeah, and we're[00:11:42] swyx: gonna talk about that. Cause I, I didn't know you were that involved in the community[00:11:45] Varun Mohan: that early on I wasn't involved. It was more of like a, I was watching and maybe commenting from time to time.[00:11:50] So they're a very special community for sure. Yeah,[00:11:52] swyx: yeah, yeah. That's true. That's true. My impression is a bunch of you are geniuses. You sit down together in a room and you. , get all your data, you train your model, like everything's very smooth sailing. Um, what's wrong with that[00:12:02] Varun Mohan: image? Yeah, so probably a lot of it just in that a lot of our serving infrastructure was already in place, Uhhuh before then.[00:12:09] So like, hey, we were able to knock off one of these boxes that I think a lot of other people maybe struggle with. The open source serving offerings are just, I will say, not great in that. That they aren't customized to transformers and these kind of workloads where I have high latency and I wanna like batch requests, and I wanna batch requests while keeping latency low.[00:12:29] Mm-hmm. , right? One of the weird things about generation models is they're like auto regressive, at least for the time being. They're auto aggressive. So the latency for a generation is a function of the amount of tokens that you actually end up generating. Like that's like the math. And you could imagine while you're generating the tokens though, unless you batch a.[00:12:46] It's gonna end up being the case that you're not gonna get great flop utilization on the hardware. So there's like a bunch of trade offs here where if you end up using something completely off the shelf, like one of these serving thing, uh, serving frameworks, you're gonna end up leaving a lot of performance on the table.[00:13:00] But for us, we were already kind of prepared. To sort of do that because of our infrastructure that we had already built up. And probably the other thing to sort of note is early on we were able to leverage open source models, sort of bootstrap it internally within our company, but then to ship, we finally had some requirements like, Hey, we want this model to have fill in the middle capabilities and a bunch of other things.[00:13:20] And we were able to ship a model ourselves. So we were able to time it so that over the course of multiple months, different pieces were like working out properly for us. So it wasn't. . You know, we started out and we were just planning the launch materials. The moment we started there was like maybe some stuff that was already there, some stuff that we had already figured out how to train models at scale internally.[00:13:38] So we were able to just leverage that muscle very quickly. I think the one[00:13:41] swyx: thing that you had figured out from the beginning was that it was gonna be free forever. Yeah. Yeah, co-pilot costs $10[00:13:47] Varun Mohan: a month. Co-pilot costs $10 a month. I would argue significantly more value than $10 a month. The important thing for us though, was we are gonna continue to build more great products on top of code completion.[00:13:58] We think code completion is maybe day one of what the future looks like. And for that, clearly we can't be a product that's like we're $10 a month and we're adding more products. We want a user base that loves using us. And we'll continue to stay with us as we continue to layer on more products. And I'm sure we're gonna get more users from the other products that we have, but we needed some sort of a differentiator.[00:14:17] And along the way we realized, hey, we're pretty efficient at running these workloads. We could probably do this. Oh, so it wasn't,[00:14:23] swyx: it was a plan to be free from the start. You just[00:14:25] Varun Mohan: realized we, yeah. We realized we could probably, if we cut and optimized heavily, we could probably do this properly. Part of the reasoning here was we were confident we could probably build a pro tier and go to the enter.[00:14:35] But for now, originally when we, when we started, we weren't like, we're just gonna go and give every, all pieces of software away for free. That wasn't like sort of the goal there. And[00:14:43] swyx: since you mentioned, uh, adoption and, you know, traction and all that, uh, what can you disclose about user growth? Yeah, user adoption.[00:14:50] Varun Mohan: Yeah. So right now we have. We probably have over 10,000 users and thousands of daily actives, and people come back day over day. Our growth is like around, you know, four to 5% day over day right now. So all of our growth right now is sort of like word of mouth, and that's fundamentally because like the product is actually one of those products where.[00:15:08] Even use COT and use us, it's, it's hard to tell the difference actually. And a lot of our users have actually churned off of cot isn't Yeah. I,[00:15:14] swyx: I swept Yeah. Yeah. To support you guys, but also also to try[00:15:17] Varun Mohan: it out. Yeah, exactly. So the, the crazy thing is it wasn't like, Hey, we're gonna figure out a marketing motion of like, Going to the people that have never heard of co-pilot and we're gonna like get a bunch of users.[00:15:27] We wanted to just get users so that in our own right we're like a really great product. Uh, and sort of we've spent a lot of engineering time and obviously we co-wrote a blog post with you, Sean, on this in terms of like, there's a lot of engineering work, even beyond the latency, making sure that you can get your cost down to make a product like this actually work.[00:15:44] swyx: Yeah. That's a long tail of, of stuff that you referenced,[00:15:47] Varun Mohan: right? Yes. Yeah, exactly.[00:15:48] swyx: And you, you said something to the order of, um, and this maybe gets into co-pilot for X uh, which is something that everybody is keen about cuz they, they see the success of co-pilot. They're like, okay, well first of all, developer tools, there's more to do here.[00:16:00] And second of all, let's say the co-pilot idea and apply for other disciplines. I don't know if you wanna Yeah.[00:16:06] Varun Mohan: There's[00:16:06] Alessio Fanelli: gonna some. Key points that, that you touched on. Um, how to estimate, inference a scale, you know, and the latency versus quality trade-offs. Building on first party. So this is free forever because you run your own models, right?[00:16:19] That's right. If you were building on open ai, you wouldn't be able to offer it for free real-time. You know, when I first use coding, It was literally the same speed as Copi is a little bit[00:16:29] swyx: faster. I don't know how to quantify it,[00:16:31] Varun Mohan: but we are faster. But it's one of those things that we're not gonna like market as that's the reason because it's not in and of itself a right for you to like, I'm just gonna be open with you.[00:16:39] It's not a reason for you to like suddenly turn off a copilot where if our answers were trash, uh, but we were faster. You know what I mean? But your focus[00:16:46] Alessio Fanelli: was there. We used the alpha, I think prem on our discord came to us and say, you guys should try this out. So it was really fast. Even then, prompt optimization is another big thing, and model outputs and UX kind of how you bring them together.[00:17:00] Which ones of these things are maybe like the one or two that new founders should really think about first?[00:17:07] Varun Mohan: Yeah, I think, I think my feeling on this is unless you are ex, you probably should always bootstrap on top of an existing a. Because like even if you were to, the only reason why we didn't is because we knew that this product was actually buildable.[00:17:22] Probably if we worked hard enough to train a model, we would actually be able to build a great product already. But if you're actually going out and trying to build something from scratch, unless you genuinely believe, I need to fine tune on top of, you know, terabytes of data terabyte is a very large amount of data, but like tens of gigabytes of data.[00:17:37] Probably go out and build on top of an API and spend most of your time to make it so that you can hit that quality latency trade off properly. And if I were to go out and think about like the three categories of like an LM product, it's probably like latency, quality, and correct ability. The reality is, you know, if I were to take a product like co-pilot or Coum, the latency is very low.[00:17:58] The quality I think, is good enough for the task, but the correct ability is, is very easy. Credibility. What, what is correct ability? Correct ability means, let's say the quality is not there. Like you consider the the case where, The answer is wrong. How easy is it for your user to actually go and leverage parts of the generation?[00:18:16] Maybe a, a concrete example. There's a lot of things people are excited about right now where I write a comment and it generates a PR for me, and that's like, that's like really awesome in theory. I think that's like a really cool thing and I'm sure at some point we will be able to get there. That will probably require an entirely new model for what it's worth that's trained on diffs and commits and all these other things that looks at like improvements and code and stuff.[00:18:37] It's probably not gonna be just trained on generic code. But the problem with those, those sort of, I would say, applications are that, let's suppose something does change many files, makes large amounts of changes. First of all, it's guaranteed not gonna be. Because even the idea of like reviewing the change takes a long time.[00:18:54] So if the quality and the correct ability is just not there, let's say you had 10 file, a 10 file change and you modified like, you know, file two and four, and those two modifications were consistent, but the other eight files were not consistent. Then suddenly the correct ability is like really hard.[00:19:10] It's hard to correct the output of the model. And so the user interface is 100% really important. But maybe until you get the latency down or the correct ability, like correct ability, like a lot better, it's probably not gonna be shippable. And I think that's what you gotta spend your time focusing on.[00:19:26] Can you deliver a product that is actually something users want to use? And I think this is why I was talking about like demo. It's like very easy to hand to handpick something that like works, that works for a demo, exceedingly hard for something that has large scope, like a PR to work consistently. It will take a lot of engineering effort to make it work on small enough chunks so that a user is like, wow, this is value generative to me.[00:19:49] Because eroding user trust or consumer trust is very easy. Like that is, it is is much, much, it's very easy to erode user trust versus enterprise. So just be mindful of that, and I think that's probably like the mantra that most of these companies need to operate under. Have you done any[00:20:05] Alessio Fanelli: analysis on. What the ratio between code generated and latency is.[00:20:11] So you can generate one line, but you could also generate the whole block. You can generate Yeah. A whole class and Yeah. You know, the more you generate the, the more time it takes. Like what's the sweet spot that, that you[00:20:21] Varun Mohan: found? Yeah, so I think there was a great study and, and I'm not sure if it's possible to link it, but there was a great study about co-pilot actually that came out.[00:20:28] Basically what they said was there were two ways that developers usually develop with a code assistant technology. They're either in what's called like acceleration mode or exploration mode. And exploration mode is basically you're in the case where you don't even know what the solution space for the function is.[00:20:43] and you just wanna generate a lot of code because you don't even know what that looks like. Like it might use some API that you've never heard of. And what you're actually doing at that point is like you're writing a clean comment, just wishing and praying that you know, the generation is long enough and gets you, gets you far enough, right?[00:20:57] acceleration mode is basically you are doing things where you are very confident in what you're doing and effectively. Code gives you that muscle so that you can basically stay in flow state and you're not thinking about like exactly what the APIs look like, but push comes to shove. You will figure out what the APIs look like, but actually like mentally, it takes off like a load in your head where you're like, oh wow.[00:21:18] Like I can just do this. The intent to execution is just a lot, a lot lower there. And I think effectively you want a tool that captures that a little bit. And we have heuristics in terms of captur. Whether or not you're in acceleration versus exploration mode. And a good heuristic is, let's say you're inside like a basic block of a piece of code.[00:21:37] Let's say you're inside a a block of code or an IF statement, you're probably already in acceleration mode and you would feel really bad if I started generating the ELs clause. Because what happens if that else causes really wrong? That's gonna cause like mental load for you because you are the way programmers think.[00:21:51] They only want to complete the if statement first, if that makes sense. So there are things where we are mindful of like how many lines we generate if you use the product, like multi-line generations happen and we are happy to do them, but we don't want to do them when we think it's gonna increase load on developers, if that makes sense.[00:22:07] That[00:22:07] Alessio Fanelli: makes sense. So co-pilot for x. , what are access that you think are interesting for people to build[00:22:13] Varun Mohan: in? Didn't we see some, some tweet recently about Harvey ai, uh, company that, that is trying to sell legal? It's like a legal, legal assistance. That's, that's pretty impressive, honestly. That's very impressive.[00:22:23] So it seems like I would really love to see what the product looks like there, because there's a lot of text there. You know, looking at bing, bing, ai, like, I mean, it's, it's pretty cool. But it seems like groundedness is something a lot of these products struggle with, and I assume legal, if there's one thing you want them to.[00:22:39] To get right. It's like the groundedness. Yeah.[00:22:42] swyx: Yeah. I've made the analogy before that law and legal language is basically just another form of programming language. You have to be that precise. Yes. Definitions must be made, and you can scroll to find the definition. It's the same thing. Yes. ,[00:22:55] Varun Mohan: yes. Yeah. But like, I guess there's a question of like comprehensiveness.[00:22:59] So like, let's say, let's say the only way it generates a suggestion is it provides like, you know, citations to other legal. You don't want it to be the case that it misses things, so you somehow need the comprehensiveness, but also at the same time, you also don't want it to make conclusions that are not from the site, the things at sites.[00:23:15] So, I don't know, like that's, that's very impressive. It's clear that they've demonstrated some amount of value because they've been able to close a fairly sizable enterprise contract. It was like a firm with 3,500 lawyers, something nuts, honestly. Very cool. So it's clear this is gonna happen, uh, and I think people are gonna need to be clever about how they actually make it work.[00:23:34] Within the constraints of whatever workload they're operating in. Also, you, you guys[00:23:37] swyx: are so good at trading stuff, why don't you, you try[00:23:39] Varun Mohan: cloning it. Yeah. So I think, I think that's, that's, uh, preview the roadmap. Yeah, yeah, yeah, yeah. No, no, no, but I'm just kidding. I think one of the things that we genuinely believe as a startup is most startups can't really even do one thing properly.[00:23:52] Mm-hmm. Focus. Yeah. Yeah. Usually doing one thing is really hard. Most companies that go public have like maybe a couple big products. They don't really have like 10, so we're under no illusions. Give the best product experience, the amount of engineering and attention to detail, to build one good product as hard.[00:24:08] So it's probably gonna be a while before we even consider leaving code. Like that's gonna be a big step because the amount of learning we need to do is gonna be high. We need to get users right. We've learned so much from our users already, so, yeah, I don't think we'd go into law anytime soon.[00:24:22] swyx: 3,500 lawyers with Ellen and Ry, uh, is, is is apparently the, the new[00:24:27] Varun Mohan: That's actually really big.[00:24:28] Yeah. Yeah. I can congrat.[00:24:29] swyx: Yeah, it's funny cuz like, it seems like these guys are moving faster than co-pilot. You know, co-pilot just launched, just announced enterprise, uh, like co-pilot for teams or co-pilot for Enterprise. Yeah. After like two years of testing.[00:24:40] Varun Mohan: Yeah, it does seem like the co-pilot team has built a very, very good product.[00:24:44] Um, so I don't wanna like say anything, but I think it is the case to startups will be able to move faster. I feel like that is true, but hey, like GitHub has great distribution. Whatever product they do have, they will be able to sell it really. Shall[00:24:56] swyx: we go into model numbers and infra estimates? our favorite[00:25:01] Varun Mohan: topics.[00:25:02] Nice small models. Nice.[00:25:04] swyx: So this is, um, relevant to basically I'm researching a lot of skilling law stuff. You have a lot of thoughts. You, you host paper discussions[00:25:12] Varun Mohan: in your team. Yeah, we, we try to like read papers that we think are really interesting and relevant to us. Recently that's been, there's just a fire hose of papers.[00:25:21] You know, someone even just curating what papers we should read internally as a company. Yeah, I think, I think there's, there's so much good content[00:25:28] swyx: out there. You should, you guys should have a podcast. I mean, I told you this before. Should have a podcast. Just, just put a mic near where, where you guys are[00:25:33] Varun Mohan: talking.[00:25:34] We gotta, we gotta keep developing coding though, . No, but you're doing this discussion[00:25:38] swyx: anyway. You[00:25:38] Varun Mohan: might as well just, oh, put the discussion on a podcast. I feel like some of the, some of the thoughts are raw, right? Like, they're not gonna be as, as nuanced. Like we'll just say something completely stupid during our discussions.[00:25:48] I don't know, , maybe that's exciting. Maybe that's, it's kinda like a justin.tv, but for ML papers, Okay, cool. I watched that.[00:25:55] swyx: Okay, so co-pilot is 12 billion parameters. Salesforce cogen is up to 16. G P t three is 175. GP four is gonna be 100 trillion billion. Yeah. So what, what we landed on with you is with, uh, with Cilla, is that we now have an idea of what compute optimal data scaling is.[00:26:14] Yeah. Which is about 20 times parameters. Is that intuitive to you? Like what, what did that[00:26:18] Varun Mohan: unlock? I think basically what this shows is that bigger models are like more data efficient, like given the same number of tokens, a big model like trained on the same number of tokens. A bigger model is like, is gonna learn more basically.[00:26:32] But also at the same time, the way you have to look at it is there are more flops to train a bigger model on the same number of tokens. So like let's say I had a 10 billion parameter model and I trained it on on 1 million tokens, but then I had a 20 billion parameter model at the end of it will be a better.[00:26:47] It will have better perplexity numbers, which means like the probability of like a prediction is gonna be better for like the next token is gonna be better. But at the end of it, you did burn twice the amount of compute on it. Right? So Shinto is an interesting observation, which says if you have a fixed compute budget, And you want the best model that came out of it because there's like a difference here where a model that is, that is smaller, trained on the same number of tokens as fewer flops.[00:27:12] There's a a sweet spot of like number of tokens and size a model. I will say like people probably like. Are talking about it more than they should, and, and I'll, I'll explain why, but it's a useful result, which is like, let's say I have, you know, some compute budget and I want the best model. It tells you what that, what you should generate.[00:27:31] The problem I think here is there is a real trade off of like, you do need to run this model somewhere. You need to run it on a piece of hardware. So then it comes down to how much memory does that piece of hardware have. Let's say for a fixed compute budget, you could train a 70 billion parameter. What are you gonna put that on?[00:27:47] Yeah, maybe you could, could you put that on an 80 gig, A 100? It would be a stretch. You could do things like f, you know, in eight F p a, to reduce the amount of memory that's on the box and do all these other things. But you have to think about that first, right? When you want to go out and train that model.[00:27:59] The worst case is you ended up training that mo, that model, and you cannot serve it. So actually what you end up finding is for a lot of these code completion models, they are actually what you would consider over-trained . So by that I mean like, let's look at a model like Cogen. It's actually trained on, I believe, and, and I could be wrong by, you know, a hundred billion here or there.[00:28:18] I got some data. Oh, okay. Let's look at the 3 billion parameter model. It's a 2.7. I think it's actually a 2.7 billion barometer model. It's weird because they also trained on natural language on top of code, but it's trained on hundreds of billions of tokens. If you applied that chinchilla, Optimization to it, you'd be like, wow, this is, this is a stupid use of compute.[00:28:36] Right? Because three, they should be going to 60, any anything more than 60. And they're like, they should have just increased the model size. But the reality is if they had like the compute optimal one might not be one that's easy to serve, right? It could just have more parameters. And for our case, our models that we train internally, they might not be the most compute.[00:28:56] In other words, we probably could have had a better model by making it larger, but the trade off would've been latency. We know what the impact of having higher latency is, and on top of that, being able to fit properly on our hardware constraints would've also been a concern.[00:29:08] swyx: Isn't the classic stopping point when you, you see like loss kind of levels off.[00:29:12] Right now you're just letting chinchilla tell you,[00:29:16] Varun Mohan: but like you should just look at loss. The problem is the loss will like continue to go down. It'll just continue to go down like, like in a, in a way that's like not that pleasing. It's gonna take longer and longer. It's gonna be painful, but it's like one of those things where if you look at the perplexity number of difference between.[00:29:31] Let's say a model that's like 70 billion versus 10 billion. It's not massive. It's not like tens of percentage points. It's like very small, right? Mm. The reality is here, like, I mean this comes down to like IQ of like these models in some sense, like small wins at the margins are massive wins in terms of iq.[00:29:47] Like it's harder to get those and they don't look as big, but they're like massive wins in terms of reasoning. They can now do chain of thought, all these other things. Yeah, yeah, yeah.[00:29:55] swyx: It's, and, and so apparently unlocked around the[00:29:57] Varun Mohan: 20 billion. Yes. That's right. Some kind of magic. Yeah. I think that was from the UL two or maybe one of those land papers.[00:30:03] Any thoughts on why? Like is there is? I don't know. I mean, emergence of intelligence, I think. I think maybe one of the things is like we don't even know, maybe like five years from now of what we're gonna be running are transformers. But I think it's like, we don't, we don't 100% know that that's true. I mean, there's like a lot of maybe issues with the current version of the transformers, which is like the way attention works, the attention layers work, the amount of computers quadratic in the context sense, because you're like doing like an n squared operation on the attention blocks basically.[00:30:30] And obviously, you know, one of the things that everyone wants right now is infinite context. They wanna shove as much prop as possible in here. And the current version of what a transformer looks like is maybe not ideal. You might just end up burning a lot of flops on this when there are probably more efficient ways of doing it.[00:30:45] So I'm, I'm sure in the future there's gonna be tweaks to this. Yeah. Uh, but it is interesting that we found out interesting things of like, hey, bigger is pretty much always better. There are probably ways of making smaller models significantly better through better data. That is like definitely true. Um, And I think one of the cool things that the stack showed actually was they did a, like a, I think they did some ablation studies where they were like, Hey, what happens if we do, if we do decontamination of our data, what happens if we do de-duplication?[00:31:14] What happens if we do near dup of our data and how does the model get better? And they have like some compelling results that showcase data quality really matters here, but ultimately, Yeah, I think it is an interesting result that at 20 billion there's something happening. But I also think like some of these things in the future may look materially different than what they look like right now.[00:31:30] Hmm. Do you think[00:31:31] Alessio Fanelli: the token limitation is actually a real architectural limitation? Like if you think about the tokens need as kind of like atic, right? Like once you have. 50,000 tokens context, like 50,000 or infinite. For most use cases, it's like the same. Where do you think that number is, especially as you think about code, like some people have very large code bases, there's a lot.[00:31:53] Have you done any work there to figure out where the sweet[00:31:55] Varun Mohan: spot is? Yeah, look, I think what's gonna really end up happening is if people come up with a clever way and, and it, there was some result research that I believe came out of Stanford. I think the team from the Helm group, I think came out with some architecture that looks a little bit different than Transformers, and I'm sure something like this will work in the future.[00:32:13] What I think is always gonna happen is if you find a cheap way to embed context, people are gonna figure out a way to, to put as much as possible in because L LM so far have been like virtually stateless. So the only thing that they have beyond fine tuning is like just shoveling everything you can inside.[00:32:28] And there are some interesting papers, like retro, actually there are maybe some interesting pieces of thought like ideas that have come out recently. Yeah, let's go through them. So one of the really interesting ideas, I think is retro. It's this paper that came out of DeepMind and the idea is actually, let's say you send out, you send out, uh, a prompt.[00:32:44] Okay? Send out a prompt. You compute the burt embedding of that. And then you have this massive embedding database. And by massive, I'm not talking about like gigabytes, I'm talking about terabytes. Like you have, geez, you actually have 10 times the number of tokens as what was used to train the model. So like, let's say you had a model that was trained on a trillion tokens, you have a 10 trillion embed, uh, like embedding database.[00:33:04] And obviously Google has this because they have all content that ever existed in humanity and they have like the best data set and sort of, they were able to make one of these, uh, embedding databases. But the idea here, which is really cool, is you end. Taking your prompt, computing, the bird, embedding you find out the things that were nearby.[00:33:20] So you do roughly like a semantic search or an embedding search within that. And then you take those, you take the documents that were from those embeddings and you shove those in the model too, in what are called like cross chunked attention. So you like shove them in the model with it as well.[00:33:34] Suddenly now the model is able to take in external. Which is really exciting actually, because suddenly now you're able to get dynamic context in, and the model in some sense is deciding what that context is. It's not deciding it completely. In this case, because the Bert model in this case was actually frozen.[00:33:50] It wasn't trained with the retro model as well, but. The idea is you're somehow adding or augmenting context, which I think is like quite exciting. There's probably two futures. Either context becomes really cheap. Right now it's quadratic. Maybe there's a future where it becomes linear in the, in the size of the context, but the future might actually be the model itself dictates, Hey, I have this context.[00:34:10] You have this data source. Give me this. The model itself is going out into your database and like being like, I want this information, and this is kind of like. What Bing search is looking like. Right? Or bing chat is sort of looking like where it's like I, the model is probably, there's probably some model that's saying I want this information.[00:34:27] And that is getting augmented into the context. Now the model itself knows what context it sort of has and it can sort of like build a state machine of sort of what it needs. And that's probably what the future of this looks like. So you, you[00:34:37] swyx: predict monster embedding database[00:34:39] Varun Mohan: companies? Probably Monster embedding database companies or, yeah.[00:34:43] The model in some sense will need to talk to, Talk to these embedding databases. I'm actually not convinced that the current breed of embedding database companies are like ready for what the future sort of looks like. I think I'm just looking at their pricing, how much it costs per gigabyte and it's prohibitive at the scale we're talking about, like let's say you actually did want to host a 10 terabyte embedding database.[00:35:03] A lot of them were created, let's say two years ago, two, three years ago, where people were like, you know, embedding databases are small and they need to make the cost economics work. But maybe, yeah, there's probably gonna be a big workload there. I will just say for us, we will probably just build this in-house to start with, and that's because I think the technology probably isn't there.[00:35:20] And I think that the technology isn't there yet. Like waiting on point solutions to come up is a lot harder, um, than probably building it up. The way I, I like to think about this is probably the world looks on the LM space. Looks like how the early internet days were, where I think the value was accrued to probably like Google and Google needed to figure out all the crazy things to make their workload work.[00:35:41] And the reason why they weren't able to outsource is, is no one else was feeling the pain. ,[00:35:46] swyx: they're just solving their own pain points. They're just solving their own pain points. They're so far ahead of everyone else. Yes, yes. And just wait[00:35:50] Varun Mohan: for people to catch up. Yes. Yes. And that's maybe different than how things like Snowflake look where the interface has been decided for what SQL looks like 50 years ago.[00:35:58] And because of that, you can go out and build the best database and Yeah, like everyone's gonna be like, this doesn't make my beer taste better. And buy your database basically. That's[00:36:08] swyx: a great reference, by the way. Yeah. We have some friends of the, the pod that are working on embedding database, so we'll try to connect you Toroma[00:36:14] Varun Mohan: and see.[00:36:14] Yeah. Oh, I actually know Anton. I worked with him at Neuro. Oh. Although, there you go. Yeah. Uh, what do you, well, what do you think about, I mean,[00:36:20] swyx: so chromas pivoting towards an embedding[00:36:22] Varun Mohan: database. I think it's an interesting idea. I think it's an interesting idea. I wonder what the early set of workloads that.[00:36:27] They will hit our, and you know what the scaling requirements are. This is maybe the classic thing where like, the teams are great, but you need to pick a workload here that you care about the most. You could build anything. You could build anything. When you're an infrastructure company, you can go in, if I was selling, serving in for, I could build, serving for like linear aggression.[00:36:44] I could build this, but like, unless you hit the right niche for the end user, it's gonna be. . So I think it, I'm excited to see what comes out and if they're great, then we'll use it. Yeah.[00:36:54] swyx: I also like how you slowly equated yourself to Google there. Oh, we're not, we're not Google. You're, you're gonna be the Google of ai.[00:37:00] Varun Mohan: We're definitely, we're definitely not Google. But I was just saying in terms of like, if you look at like the style of companies that came out. Yeah. You know? Absolutely. Or maybe we should live in the cutting edge in[00:37:08] swyx: the future. Yeah. I think that's the pitch.[00:37:10] Varun Mohan: Okay, thanks for b***h us.[00:37:13] Alessio Fanelli: So you just mentioned the older vector embedding source are kind of not made for the L l M generation of compute size.[00:37:21] what does l LM ops look like? You know, which pieces need to be drastically different? Which ones can we recycle?[00:37:27] Varun Mohan: Yeah. One of the things that we've found, like in our own thing of building code that's been just shows how much is missing, and this is the thing where like, I don't know how much of this you can really outsource, which is like we needed to build eval infrastructure.[00:37:40] That means how do you build a great code? And there are things online like human eval, right? And uh, I was telling, which is the benchmark telling Sean about this, the idea of human eval is really neat for code. The idea is you provide a bunch of functions with Docstrings and the eval instead of being, did you predict next token?[00:37:56] It's like, did you generate the entire function and does the function run correctly against a bunch of unit tests? Right. And we've built more sophisticated evals to work on many languages, to work on more variety of code bases. One of the issues that ends up coming up with things like human eval is contam.[00:38:12] Because a lot of these, uh, things that train models end up training on all of GitHub GitHub itself has human eva, so they end up training on that. And then the numbers are tiny, though. It's gonna be tiny, right? But it doesn't matter if it's tiny because it'll just remember it. It'll remember that it's, it's not that it's that precise, but it will, it's like, it's basically like mixing your, your training and validation set.[00:38:32] It's like, oh, yeah, yeah, yeah, yeah. But we've seen cases where like online where someone is like, we have a code model that's like, they we're like, we did this one thing, and HU and human eval jumped a ton and we were just like, huh, did human eval get into your data set? Is that really what happened there?[00:38:46] But we've needed to build all this eval. And what is shown is data cleaning is massive, but data cleaning looks different by. Like code data cleaning is different than what is a high quality piece of code is probably different than what's a high quality legal document. Yeah. And then on top of that, how do you eval this?[00:39:01] How do you also train it at scale at whatever cost you really want to get? But those are things that the end user is either gonna need to solve or someone else is gonna need to solve for them. And I guess maybe one of the things I'm a little bearish on is if another company comes out and solves eval properly for a bunch of different verticals, what was the company that they were selling to really?[00:39:21] What were they really doing at that point? If they themselves were not eval for their own workload and all these other things? I think there are cases where, let's say for code where we probably couldn't outsource our eval, like we wouldn't be able to ship models internally if we didn't know how to eval, but it's clear that there's a lot of different things that people need to take.[00:39:38] Like, Hey, maybe there's an embedding piece. How large is this embedding database actually need to be? But hey, this does look very different than what classic ML ops probably did. Mm-hmm. . How[00:39:47] Alessio Fanelli: do you compare some of these models? Like when you're thinking about model upgrading and making changes, like what does the testing piece of it internally?[00:39:56] Yeah. For us look like.[00:39:56] Varun Mohan: For us, it's like old school AB testing. We've built like infrastructure to be able to say, ramp up users from one to 10 to. 50% and slowly roll things out. This is all classic software, uh, which[00:40:09] swyx: you do in-house. You don't, you don't buy any[00:40:10] Varun Mohan: services. We don't buy services for that.[00:40:13] There are good services, open source services that help you just don't need them. Uh, yeah, I think that's just like not the most complicated thing for us. Sure. Basically. Yeah. Uh, but I think in the future, maybe, we'll, obviously we use things like Google Analytics and all this other stuff, but Yeah. For things of ramping our models, finding out if they're actually better because the eval also doesn't tell the whole story because also for us, Even before generating the prompt, we do a lot of work.[00:40:36] And the only way to know that it's really good across all the languages that our users need to tell us that it's actually good. And, and they tell us by accepting completions. So, so GitHub[00:40:44] swyx: co-pilot, uh, the extension does this thing where they, they like, they'll set a timer and then within like five minutes, 10 minutes, 20 minutes, they'll check in to see if the code is still there.[00:40:54] I thought it was a[00:40:54] Varun Mohan: pretty creative way. It's, it's a very, it's honestly a very creative way. We do do things to see, like in the long term, if people did. Accept or write things that are roughly so because they could accept and then change their minds. They could accept and then change their minds. So we, we are mindful of, of things like that.[00:41:09] But for the most part, the most important metric is at the time, did they actually, did we generate value? And we want to know if that's true. And it's, it's kind of, it's honestly really hard to get signal unless you have like a non-trivial amount of usage, non-trivial, meaning you're getting, you're doing hundreds of thousands of completions, if not millions of completions.[00:41:25] That sounds like, oh wow. Like, that's like a very small amount. But like it's classic. Maybe like if you look at like when I used to be an intern at Quora, like, you know, now more than seven, eight years ago. When I was there, I like shipped a change and then Cora had like millions of daily actives and then it looked like it was good, and then a week later it was just like way worse.[00:41:43] And how is this possible? Like in a given hour we get like hundreds of thousands of interaction, just like, no, you just need way more data. So this is like one of those things where I think having users is like genuinely very valuable to us, basically. Users is all you need. . Yeah.[00:41:59] swyx: Um, by the way, since you brought out Quora, have you tried po any, any thoughts[00:42:03] Varun Mohan: on po I have not actually tried po I've not actually tried.[00:42:05] I[00:42:05] swyx: mean, it seems like a question answering website that's been around for 20 years or something. Would be very, would be very good at question answering. Yeah.[00:42:12] Varun Mohan: Also Adam, the ceo, is like incredibly brilliant. That guy is like insanely smart, so I'm sure they're gonna do,[00:42:18] swyx: they have accidentally built the perfect like data collection company for For qa.[00:42:22] Varun Mohan: Yeah. . It takes a certain kind of person to go and like cannibalize your original company like the in, I mean, it was kinda stagnant for like a few years. Yeah, that's probably true. That's[00:42:31] swyx: probably true. The observation is I feel like you have a bias to its domain specific. , whereas most research is skewed towards, uh, general models, general purpose models.[00:42:40] I don't know if there's like a, a deeper insight here that you wanna go into or, or not, but like, train on all the things, get all the data and you're like, no, no, no. Everyone needs like customized per task,[00:42:49] Varun Mohan: uh, data set. Yeah. I think I'm not gonna. Say that general intelligence is not good. You want a base model that's still really good and that's probably trained on normal text, like a lot of different content.[00:43:00] But I think probably one thing that old school machine learning, even though I'm like the kind of person that says a lot of old school machine learning is just gonna die, is that training on a high quality data set for your workload is, is always gonna yield better results and more, more predictable results.[00:43:15] And I think we are under no illusions that that's not the case. Basical. And[00:43:19] swyx: then the other observation is bandwidth and connectivity, uh, which is not something that people usually think about, but apparently is a, is a big deal. Apparently training agreed in the synchronous needs, high GPU coordination.[00:43:29] These are deleted notes from Sam Altman talking about how they think about training and I was like, oh yeah, that's an insight. And[00:43:34] Varun Mohan: you guys have the same thing. Yeah. So I guess for, for training, you're right in that it is actually nuts to think about how insane the networks are for NVIDIA's most recent hardware, it's.[00:43:46] For the H 100 boxes, you shove eight of these H 100 s on a. Between two nodes. The bandwidth is 3,200 gigabits a second, so 400 gigabytes a second between machines. That's like nuts when you just sit and think about it. That's like double the memory bandwidth of what a CPU has, but it's like between two machines.[00:44:04] On top of that, within the machine, they've created this, this fabric called envy link that allows you to communicate at ultra low latency. That's even lower than P C I E. If you're familiar, that's like the communication protocol. . Yeah, between like the CPU and the other devices or other P C I E devices.[00:44:21] All of this is to make sure that reductions are fast, low latency, and you don't need to think about it. And that's because like a lot of deep learning has sort of evolved. Uh, training has evolved to be synchronous in the OG days. There is a lot of analysis in terms of how good is asynchronous training, which is like, Hey, I have a node, it has a current state of the model.[00:44:39] It's gonna update that itself locally, and it'll like every once in a while, go to another machine and update the weights. But I think like everyone has converged to synchronous. I'm not exactly sure. There's not a lot of good research on asynchronous training right now. Or maybe there is an, I haven't read it.[00:44:52] It's just that there isn't as much research because people are just like, oh, synchronous works. Uh, and the hardware is continually upleveled to handle[00:44:59] swyx: that. Yeah. It was just un unintuitive to me cuz like the whole purpose of GPUs could train things. A lot of things in parallel. Yes.[00:45:05] Varun Mohan: But the crazy thing is also, maybe I can, I can give some dumb math here.[00:45:09] Sure. Here, which is that, uh, let's go with uh, G B T three, which is like 170 billion per. The optimizer state, so while you're training is 14 times the size of the model, so in this case, if it's like 170 billion parameters, it's probably, I'm not great at mental math here, but that's probably around 2.5 terabytes to just store the optimizer state.[00:45:30] That has gotta be sharded across a lot of machines. Like that is not a single gpu. Even if you take an H 100 with 80 gigs to just shard that much, that's like 40, at least 30 machines. So there's like something there where these things need to communicate with each other too.[00:45:44] swyx: You need to vertically scale horizontally.[00:45:46] Varun Mohan: Yeah. You gotta co-located, you gotta somehow feel like you have this massive, the, the ideal programming paradigm is you feel like you have this massive computer. That has no communication, you know, overhead at all, but it has like infinite computer and infinite memory bandwidth.[00:45:59] swyx: That's the AI cluster. Um, okay, well, uh, we want to head to the questions.[00:46:05] Alessio Fanelli: So favorite AI product that you are not[00:46:08] Varun Mohan: building? Yeah, I'm friends with some of the folks at Mid Journey and I really think the Mid Journey product is super cool, especially seeing how the team is iterating and the quality of generations. It consistently gets upleveled. I think it's like quite neat and I think internally at at exa functional, we've been trying out mid Journey for like random content to like generate images and stuff.[00:46:26] Does it bother[00:46:26] swyx: you that they have like a style. I don't know. It, it seems like they're hedging themselves into a particular, like you want mid journey art, you go there.[00:46:33] Varun Mohan: Yeah. It's a brand of art. Yeah, you're right. I think they do have a style, but it seems more predictably good for that style. Okay. So maybe that's too, so just get good at, uh, domain specific thing.[00:46:41] Yeah. Yeah. maybe. Maybe I, maybe I'm just selling, talking to a booker right now. . Yeah. Uh, okay.[00:46:46] swyx: Uh, next question. Uh, favorite AI people and[00:46:48] Varun Mohan: communities? Yeah, so I think I mentioned this before, but I think obviously the open. The opening eye folks are, are insane. Like we, we only have respect for them. But beyond that, I think Elu is a pretty special group.[00:46:59] Especially it's been now probably more than a year and a half since they released like G P T J, which was like back when open source G PT three Curri, which was comparable. And it wasn't like a model where like, It wasn't good. It was like comparable in terms of perplexity to GT three curity and it was trained by a university student actually, and it just showed that, you know, in the end, like I would say pedigree is great, but in if you have people that are motivated know how computers work and they're willing to just get their hands dirty, you can do crazy things and that was a crazy project that gave me more hope.[00:47:34] Decentral training being potentially pretty massive. But I think that was like a very cool thing where a bunch of people just got on Discord and were chatting and they were able to just turn this out. Yeah. I did[00:47:42] swyx: not know this until I looked in further into Luther, but it was not a formal organization.[00:47:45] Was a company was a startup. It's not, yeah. Bunch of guys on Discord.[00:47:48] Varun Mohan: They gotta you, they gotta keep you research grant and they somehow just wrote some codes. .[00:47:52] Alessio Fanelli: Yeah. Yeah. Listen to APAC with Connor, who's the person, and basically Open Eye at the time was like, we cannot release G P T because it's like too good and so bad.[00:48:01] And he was like, He actually said he was sick, so he couldn't leave home for like a, a few weeks. So it was like, what else am I gonna do? And ended up getting through the Google like research programs through his university and they were like, oh, we'll give you TPUs. And he was like, cool. And that's how, that's,[00:48:17] Varun Mohan: that's amazing.[00:48:18] So I came to you. I love the story. Yeah, it's a great story. .[00:48:21] Alessio Fanelli: So a year from now, what do you think people will be most surprised by[00:48:25] Varun Mohan: In ai? Yeah. I think the thing people will be most surprised by is, I think they, the models are gonna, More good at SP special tasks for sure, but even the existing models, I think people will come up with more creative ways of leveraging them to build like world class products.[00:48:39] I think that's just like human creativity is gonna go wild. It seems like Cha GBT has already kind of unleashed that. I think I'm just excited to see what the future of these products look like. I guess law was not something I expected in such a short, well,[00:48:51] swyx: totally expected. I, I, I was actually watching a different company that I thought was gonna be the winner, and then Harvey just came outta nowhere,[00:48:56] Oh, wow. Okay. Okay. Well that's, that's awesome. But yeah. So my, my takeaway from what you're saying is like, foundation models have kind of shot way too far ahead of the apps and people need to build[00:49:05] Varun Mohan: apps. Yes. I think people should be building apps, but I. The reality is the model is like probably at a state right now where it can do crazy enough things.[00:49:12] Uh, and I think great apps will, will come out of this. Yeah.[00:49:16] swyx: AI thing you would pay for if someone else built it personal or work.[00:49:20] Varun Mohan: I think if, if someone else built like a proper assistant, like a proper like fitness assistant, I would probably pay for that actually. I know that, that sounds weird, but someone that actually tells me like, how should I end up, like, you know, doing fitness today, I ended up injuring my knee from over biking.[00:49:35] I ended up biking like 150 miles a week and I ended up just injuring my knee outta nowhere. So, so you need, you need an app to tell you to exercise less. Exercise less, but tell me what my training regimen is. Uh, tell me what I should do to prepare for things. I know that this is like a big niche, but I think the fact that Strava is such a big group of people and like swyx is a big group of people, seems to suggest that I think a lot of people would be willing to pay for something like this.[00:49:57] Alessio Fanelli: what's one thing you want everyone to take away about AI and our[00:50:01] Varun Mohan: conversation? Probably the most important thing to take away is there's probably a lot out there if people continue to tinker. I think that's probably like the biggest takeaway I've had. Uh, and it's, you know, being a pure infrastructure company, I think like, uh, six to eight months ago, I think it was like very hard to watch everyone tinkering and us just, you know, building, building infrastructure.[00:50:22] But I think there's gonna be some crazy things that come out over the next year or. Um, excited to just see what that looks like. Awesome. Yeah, man. That's it. This was fantastic. Thanks so much. Thanks for coming. Get full access to Latent.Space at www.latent.space/subscribe
-
ChatGPT, GPT4 hype, and Building LLM-native products — with Logan Kilpatrick of OpenAI
From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2023-02-23 17:53
We’re so glad to launch our first podcast episode with Logan Kilpatrick! This also happens to be his first public interview since joining OpenAI as their first Developer Advocate. Thanks Logan!Recorded in-person at the beautiful StudioPod studios in San Francisco. Full transcript is below the fold.Timestamps* 00:29: Logan’s path to OpenAI* 07:06: On ChatGPT and GPT3 API* 16:16: On Prompt Engineering* 20:30: Usecases and LLM-Native Products* 25:38: Risks and benefits of building on OpenAI* 35:22: OpenAI Codex* 42:40: Apple's Neural Engine* 44:21: Lightning RoundShow notes* Sam Altman’s interview with Connie Loizos* OpenAI Cookbook* OpenAI’s new Embedding Model* Cohere on Word and Sentence Embeddings* (referenced) What is AGI-hard?Lightning Rounds* Favorite AI Product: https://www.synthesia.io/* Favorite AI Community: MLOps * One year prediction: Personalized AI, https://civitai.com/* Takeaway: AI Revolution is here!Transcript[00:00:00] Alessio Fanelli: Hey everyone. Welcome to the Latent Space podcast. This is Alessio, partner and CTO in residence at Decibel Partners. I'm joined by my cohost, swyx writer editor of L Space Diaries. Hey.[00:00:20] swyx: Hey . Our guest today is Logan Kilpatrick. What I'm gonna try to do is I'm gonna try to introduce you based on what people know about you, and then you can fill in the blanks.[00:00:28] Introducing Logan[00:00:28] swyx: So you are the first. Developer advocate at OpenAI, which is a humongous achievement. Congrats. You're also the lead developer community advocate of the Julia language. I'm interested in a little bit of that and apparently as I've did a bit of research on you, you got into Julia through NASA where you interned and worked on stuff that's gonna land on the moon apparently.[00:00:50] And you are also working on computer vision at Apple. And had to sit at path, the eye as you fell down the machine learning rabbit hole. What should people know about you that's kind of not on your LinkedIn that like sort of ties together your interest[00:01:02] Logan Kilpatrick: in story? It's a good question. I think so one of the things that is on my LinkedIn that wasn't mentioned that's super near and dear to my heart and what I spend a lot of time in sort of wraps a lot of my open source machine learning developer advocacy experience together is supporting NumFOCUS.[00:01:17] And NumFOCUS is the nonprofit that helps enable a bunch of the open source scientific projects like Julia, Jupyter, Pandas, NumPy, all of those open source projects are. Facilitated legal and fiscally through NumFOCUS. So it's a very critical, important part of the ecosystem and something that I, I spend a bunch of my now more limited free time helping support.[00:01:37] So yeah, something that's, It's on my LinkedIn, but it's, it's something that's important to me. Well,[00:01:42] swyx: it's not as well known of a name, so maybe people kind of skip over it cuz they were like, I don't know what[00:01:45] Logan Kilpatrick: to do with this. Yeah. It's super interesting to see that too. Just one point of context for that is we tried at one point to get a Wikipedia page for non focus and it's, it's providing, again, the infrastructure for, it's like a hundred plus open source scientific projects and they're like, it's not notable enough.[00:01:59] I'm like, well, you know, there's something like 30 plus million developers around the world who use all these open source tools. It's like the foundation. All open source like science that happens. Every breakthrough in science is they discovered the black hole, the first picture of the black hole, all that stuff using numb focus tools, the Mars Rovers, NumFOCUS tools, and it's interesting to see like the disconnect between the nonprofit that supports those projects and the actual success of the projects themselves.[00:02:26] swyx: Well, we'll, we'll get a bunch of people focused on NumFOCUS and we'll get it on Wikipedia. That that is our goal. . That is the goal. , that is our shot. Is this something that you do often, which is you? You seem to always do a lot of community stuff. When you get into something, you're also, I don't know where this, where you find time for this.[00:02:42] You're also a conference chair for DjangoCon, which was last year as well. Do you fall down the rabbit hole of a language and then you look for community opportunities? Is that how you get into.[00:02:51] Logan Kilpatrick: Yeah, so the context for Django stuff was I'd actually been teaching and still am through Harvard's division of continuing education as a teaching fellow for a Django class, and had spent like two and a half years actually teaching students every semester, had a program in Django and realized that like it was kind of the one ecosystem or technical tool that I was using regularly that I wasn't actually contributing to that community.[00:03:13] So, I think sometime in 2021 like applied to be on the board of directors of the Django Events Foundation, north America, who helps run DjangoCon and was fortunate enough to join a support to be the chair of DjangoCon us and then just actually rolled off the board because of all the, all the craziness and have a lot less free time now.[00:03:32] And actually at PATH ai. Sort of core product was also using, was using Django, so it also had a lot of connections to work, so it was a little bit easier to justify that time versus now open ai. We're not doing any Django stuff unfortunately, so, or[00:03:44] swyx: Julia, I mean, should we talk about this? Like, are you defecting from Julia?[00:03:48] What's going on? ,[00:03:50] Logan Kilpatrick: it's actually felt a little bit strange recently because I, for the longest time, and, and happy to talk about this in the context of Apple as well, the Julie ecosystem was my outlet to do a lot of the developer advocacy, developer relations community work that I wanted to do. because again, at Apple I was just like training machine learning models.[00:04:07] Before that, doing software engineering at Apple, and even at Path ai, we didn't really have a developer product, so it wasn't, I was doing like advocacy work, but it wasn't like developer relations in the traditional sense. So now that I'm so deeply doing developer relations work at Open OpenAI, it's really difficult to.[00:04:26] Continue to have the energy after I just spent nine hours doing developer relations stuff to like go and after work do a bunch more developer relations stuff. So I'll be interested to see for myself like how I'm able to continue to do that work and I. The challenge is that it's, it's such critical, important work to happen.[00:04:43] Like I think the Julie ecosystem is so important. I think the language is super important. It's gonna continue to grow in, in popularity, and it's helping scientists and engineers solve problems they wouldn't otherwise be able to. So it's, yeah, the burden is on me to continue to do that work, even though I don't have a lot of time now.[00:04:58] And I[00:04:58] Alessio Fanelli: think when it comes to communities, the machine learning technical community, I think in the last six to nine months has exploded. You know, you're the first developer advocate at open ai, so I don't think anybody has a frame of reference on what that means. What is that? ? So , what do you, how did, how the[00:05:13] swyx: job, yeah.[00:05:13] How do you define the job? Yeah, let's talk about that. Your role.[00:05:16] Logan Kilpatrick: Yeah, it's a good question and I think there's a lot of those questions that actually still exist at OpenAI today. Like I think a lot of traditional developed by advocacy, at least like what you see on Twitter, which I think is what a lot of people's perception of developer advocacy and developer relations is, is like, Just putting out external content, going to events, speaking at conferences.[00:05:35] And I think OpenAI is very unique in the sense that, at least at the present moment, we have so much inbound interest that there's, there is no desire for us to like do that type of developer advocacy work. So it's like more from a developer experience point of view actually. Like how can we enable developers to be successful?[00:05:53] And that at the present moment is like building a strong foundation of documentation and things like that. And we had a bunch of amazing folks internally who were. Who were doing some of this work, but it really wasn't their full-time job. Like they were focused on other things and just helping out here and there.[00:06:05] And for me, my full-time job right now is how can we improve the documentation so that people can build the next generation of, of products and services on top of our api. And it's. Yeah. There's so much work that has to happen, but it's, it's, it's been a ton of fun so far. I find[00:06:20] swyx: being in developer relations myself, like, it's kind of like a fill in the blanks type of thing.[00:06:24] Like you go to where you, you're needed the most open. AI has no problem getting attention. It is more that people are not familiar with the APIs and, and the best practices around programming for large language models, which is a thing that did not exist three years ago, two years ago, maybe one year ago.[00:06:40] I don't know. When she launched your api, I think you launched Dall-E. As an API or I, I don't[00:06:45] Logan Kilpatrick: know. I dunno. The history, I think Dall-E was, was second. I think it was some of the, like GPT3 launched and then GPT3 launched and the API I think like two years ago or something like that. And then Dali was, I think a little more than a year ago.[00:06:58] And then now all the, the Chachi Beast ChatGPT stuff has, has blown it all outta the water. Which you have[00:07:04] swyx: a a wait list for. Should we get into that?[00:07:06] Logan Kilpatrick: Yeah. .[00:07:07] ChatGPT[00:07:07] Alessio Fanelli: Yeah. We would love to hear more about that. We were looking at some of the numbers you went. Zero to like a million users in five days and everybody, I, I think there's like dozens of ChatGPT API wrappers on GitHub that are unofficial and clearly people want the product.[00:07:21] Like how do you think about that and how developers can interact with it.[00:07:24] Logan Kilpatrick: It. It's absolutely, I think one of the most exciting things that I can possibly imagine to think about, like how much excitement there was around ChatGPT and now getting to hopefully at some point soon, put that in the hands of developers and see what they're able to unlock.[00:07:38] Like I, I think ChatGPT has been a tremendous success, hands down without a question, but I'm actually more excited to see what developers do with the API and like being able to build those chat first experiences. And it's really fascinating to see. Five years ago or 10 years ago, there was like, you know, all this like chatbot sort of mm-hmm.[00:07:57] explosion. And then that all basically went away recently, and the hype went to other places. And I think now we're going to be closer to that sort of chat layer and all these different AI chat products and services. And it'll be super interesting to see if that sticks or not. I, I'm not. , like I think people have a lot of excitement for ChatGPT right now, but it's not clear to me that that that's like the, the UI or the ux, even though people really like it in the moment, whether that will stand the test of time, I, I just don't know.[00:08:23] And I think we'll have to do a podcast in five years. Right. And check in and see whether or not people are still really enjoying that sort of conversational experience. I think it does make sense though cause like that's how we all interact and it's kind of weird that you wouldn't do that with AI products.[00:08:37] So we. and I think like[00:08:40] Alessio Fanelli: the conversational interface has made a lot of people, first, the AI to hallucinate, you know, kind of come up with things that are not true and really find all the edge cases. I think we're on the optimism camp, you know, like we see the potential. I think a lot of people like to be negative.[00:08:56] In your role, kind of, how do you think about evangelizing that and kind of the patience that sometimes it takes for these models to become.[00:09:03] Logan Kilpatrick: Yeah, I think what, what I've done is just continue to scream from the, the mountains that like ChatGPT has, current form is definitely a research preview. The model that underlies ChatGPT GPT 3.5 is not a research preview.[00:09:15] I think there's things that folks can do to definitely reduce the amount of hall hallucinations and hopefully that's something that over time I, I, again have full confidence that it'll, it'll solve. Yeah, there's a bunch of like interesting engineering challenges. you have to solve in order to like really fix that problem.[00:09:33] And I think again, people are, are very fixated on the fact that like in, you know, a few percentage points of the conversations, things don't sound really good. Mm-hmm. , I'm really more excited to see, like, again when the APIs and the Han developers like what are the interesting solutions that people come up with, I think there's a lot that can be explored and obviously, OpenAI can explore all them because we have this like one product that's using the api.[00:09:56] And once you get 10,000, a hundred thousand developers building on top of that, like, we'll see what are the different ways that people handle this. And I imagine there's a lot of low-hanging fruit solutions that'll significantly improve the, the amount of halluc hallucinations that are showing up. Talk about[00:10:11] swyx: building on top of your APIs.[00:10:13] Chat GPTs API is not out yet, but let's assume it is. Should I be, let's say I'm, I'm building. A choice between GP 3.5 and chat GPT APIs. As far as I understand, they are kind of comparable. What should people know about deciding between either of them? Like it's not clear to me what the difference is.[00:10:33] Logan Kilpatrick: It's a great question.[00:10:35] I don't know if there's any, if we've made any like public statements about like what the difference will be. I think, I think the point is that the interface for the Chachi B API will be like conversational first, and that's not the case now. If you look at text da Vinci oh oh three, like you, you just put in any sort of prompt.[00:10:52] It's not really built from the ground up to like keep the context of a conversation and things like that. And so it's really. Put in some sort of prompt, get a response. It's not always designed to be in that sort of conversational manner, so it's not tuned in that way. I think that's the biggest difference.[00:11:05] I think, again, the point that Sam made in a, a strictly the strictly VC talk mm-hmm. , which was incredible and I, I think that that talk got me excited and my, which, which part? The whole thing. And I think, I haven't been at open AI that long, so like I didn't have like a s I obviously knew who Sam was and had seen a bunch of stuff, but like obviously before, a lot of the present craziness with Elon Musk, like I used to think Elon Musk seemed like a really great guy and he was solving all these really important problems before all the stuff that happened.[00:11:33] That's a hot topic. Yeah. The stuff that happened now, yeah, now it's much more questionable and I regret having a Tesla, but I, I think Sam is actually. Similar in the sense that like he's solving and thinking about a lot of the same problems that, that Elon, that Elon is still today. But my take is that he seems like a much more aligned version of Elon.[00:11:52] Like he's, he's truly like, I, I really think he cares deeply about people and I think he cares about like solving the problems that people have and wants to enable people. And you can see this in the way that he's talked about how we deploy models at OpenAI. And I think you almost see Tesla in like the completely opposite end of the spectrum, where they're like, whoa, we.[00:12:11] Put these 5,000 pound machines out there. Yeah. And maybe they'll run somebody over, maybe they won't. But like it's all in the interest of like advancement and innovation. I think that's really on the opposite end of the spectrum of, of what open AI is doing, I think under Sam's leadership. So it's, it's interesting to see that, and I think Sam said[00:12:30] Alessio Fanelli: that people could have built Chen g p t with what you offered like six, nine months ago.[00:12:35] I[00:12:35] swyx: don't understand. Can we talk about this? Do you know what, you know what we're talking about, right? I do know what you're talking about. da Vinci oh three was not in the a p six months before ChatGPT. What was he talking about? Yeah.[00:12:45] Logan Kilpatrick: I think it's a little bit of a stretch, but I do think that it's, I, I think the underlying principle is that.[00:12:52] The way that it, it comes back to prompt engineering. The way that you could have engineered, like the, the prompts that you were put again to oh oh three or oh oh two. You would be able to basically get that sort of conversational interface and you can do that now. And, and I, you know, I've seen tutorials.[00:13:05] We have tutorials out. Yep. No, we, I mean, we, nineties, we have tutorials in the cookbook right now in on GitHub. We're like, you can do this same sort of thing. And you just, it's, it's all about how you, how you ask for responses and the way you format data and things like that. It. The, the models are currently only limited by what people are willing to ask them to do.[00:13:24] Like I really do think that, yeah, that you can do a lot of these things and you don't need the chat CBT API to, to build that conversational layer. That is actually where I[00:13:33] swyx: feel a little bit dumb because I feel like I don't, I'm not smart enough to think of new things to ask the models. I have to see an example and go, oh, you can do that.[00:13:43] All right, I'm gonna do that for now. You know, and, and that's why I think the, the cookbook is so important cuz it's kind of like a compendium of things we know about the model that you can ask it to do. I totally[00:13:52] Logan Kilpatrick: agree and I think huge shout out to the, the two folks who I work super closely with now on the cookbook, Ted and Boris, who have done a lot of that work and, and putting that out there and it's, yeah, you see number one trending repo on, on GitHub and it was super, like when my first couple of weeks at Open ai, super unknown, like really, we were only sort of directing our customers to that repo.[00:14:13] Not because we were trying to hide it or anything, but just because. It was just the way that we were doing things and then all of a sudden it got picked up on GitHub trending and a bunch of tweets went viral, showing the repo. So now I think people are actually being able to leverage the tools that are in there.[00:14:26] And, and Ted's written a bunch of amazing tutorials, Boris, as well. So I think it's awesome that more people are seeing those. And from my perspective, it's how can we take those, make them more accessible, give them more visibility, put them into the documentation, and I don't think that that connection right now doesn't exist, which I'm, I'm hopeful we'll be able to bridge those two things.[00:14:44] swyx: Cookbook is kind of a different set of documentation than API docs, and I think there's, you know, sort of existing literature about how you document these things and guide developers the right way. What, what I, what I really like about the cookbook is that it actually cites academic research. So it's like a nice way to not read the paper, but just read the conclusions of the paper ,[00:15:03] Logan Kilpatrick: and, and I think that's, that's a shout out to Ted and Boris cuz I, I think they're, they're really smart in that way and they've done a great job of finding the balance and understanding like who's actually using these different tools.[00:15:13] So, . Yeah.[00:15:15] swyx: You give other people credit, but you should take credit for yourself. So I read your last week you launched some kind of documentation about rate limiting. Yeah. And one of my favorite things about reading that doc was seeing examples of, you know, you were, you're telling people to do exponential back off and, and retry, but you gave code examples with three popular libraries.[00:15:32] You didn't have to do that. You could have just told people, just figure it out. Right. But you like, I assume that was you. It wasn't.[00:15:38] Logan Kilpatrick: So I think that's the, that's, I mean, I'm, I'm helping sort of. I think there's a lot of great stuff that people have done in open ai, but it was, we have the challenge of like, how can we make that accessible, get it into the documentation and still have that high bar for what goes into the doc.[00:15:51] So my role as of recently has been like helping support the team, building that documentation first culture, and supporting like the other folks who actually are, who wrote that information. The information was actually already in. Help center but it out. Yeah, it wasn't in the docs and like wasn't really focused on, on developers in that sense.[00:16:10] So yeah. I can't take the, the credit for the rate limit stuff either. , no, this[00:16:13] swyx: is all, it's part of the A team, that team effort[00:16:16] On Prompt Engineering[00:16:16] Alessio Fanelli: I was reading on Twitter, I think somebody was saying in the future will be kind of like in the hair potter word. People have like the spell book, they pull it out, they do all the stuff in chat.[00:16:24] GP z. When you talk with customers, like are they excited about doing prompt engineering and kind of getting a starting point or do they, do they wish there was like a better interface? ?[00:16:34] Logan Kilpatrick: Yeah, that's a good question. I think prompt engineering is so much more of an art than a science right now. Like I think there are like really.[00:16:42] Systematic things that you can do and like different like approaches and designs that you can take, but really it's a lot of like, you kind of just have to try it and figure it out. And I actually think that this remains to be one of the challenges with large language models in general, and not just head open ai, but for everyone doing it is that it's really actually difficult to understand what are the capabilities of the model and how do I get it to do the things that I wanted to do.[00:17:05] And I think that's probably where a lot of folks need to do like academic research and companies need to invest in understanding the capabilities of these models and the limitations because it's really difficult to articulate the capabilities of a model without those types of things. So I'm hopeful that, and we're shipping hopefully some new updated prompt engineering stuff.[00:17:24] Cause I think the stuff we have on the website is old, and I think the cookbook actually has a little bit more up-to-date stuff. And so hopefully we'll ship some new prompt engineering stuff in the, in the short term. I think dispel some of the myths and rumors, but like I, it's gonna continue to be like a, a little bit of a pseudoscience, I would imagine.[00:17:41] And I also think that the whole prompt engineering being like a job in the future meme, I think is, I think it's slightly overblown. Like I think at, you see this now actually with like, there's tools that are showing up and I forgot what the, I just saw went on Twitter. The[00:17:57] swyx: next guest that we are having on this podcast, Lang.[00:17:59] Yeah. Yeah.[00:18:00] Logan Kilpatrick: Lang Chain and Harrison on, yeah, there's a bunch of repos too that like categorize and like collect all the best prompts that you can put into chat. For example, and like, that's like the people who are, I saw the advertisement for someone to be like a prompt engineer and it was like a $350,000 a year.[00:18:17] Mm-hmm. . Yeah, that was, that was philanthropic. Yeah, so it, it's just unclear to me like how, how sustainable stuff like that is. Cuz like, once you figure out the interesting prompts and like right now it's kind of like the, the Wild West, but like in a year you'll be able to sort of categorize all those and then people will be able to find all the good ones that are relevant for what they want to do.[00:18:35] And I think this goes back to like, having the examples is super important and I'm, I'm with you as well. Like every time I use Dall-E the little. While it's rendering the image, it gives you like a suggestion of like how you should ask for the art to be generated. Like do it in like a cyberpunk format. Do it in a pixel art format.[00:18:53] Et cetera, et cetera, and like, I really need that. I'm like, I would never come up with asking for those things had it not prompted me to like ask it that way. And now I always ask for pixel art stuff or cyberpunk stuff and it looks so cool. That's what I, I think,[00:19:06] swyx: is the innovation of ChatGPT as a format.[00:19:09] It reduces. The need for getting everything into your prompt in the first try. Mm-hmm. , it takes it from zero shot to a few shot. If, if, if that, if prompting as, as, as shots can be concerned.[00:19:21] Logan Kilpatrick: Yeah. , I think that's a great perspective and, and again, this goes back to the ux UI piece of it really being sort of the differentiating layer from some of the other stuff that was already out there.[00:19:31] Because you could kind of like do this before with oh oh three or something like that if you just made the right interface and like built some sort of like prompt retry interface. But I don't think people were really, were really doing that. And I actually think that you really need that right now. And this is the, again, going back to the difference between like how you can use generative models versus like large scale.[00:19:53] Computer vision systems for self-driving cars, like the, the answer doesn't actually need to be right all the time. That's the beauty of, of large language models. It can be wrong 50% of the time and like it doesn't really cost you anything to like regenerate a new response. And there's no like, critical safety issue with that, so you don't need those.[00:20:09] I, I keep seeing these tweets about like, you need those like 99.99% reliability and like the three nines or whatever it is. Mm-hmm. , but like you really don't need that because the cost of regenerating the prop is again, almost, almost. I think you tweeted a[00:20:23] Alessio Fanelli: couple weeks ago that the average person doesn't yet fully grasp how GBT is gonna impact human life in the next four, five years.[00:20:30] Usecases and LLM-Native Products[00:20:30] Alessio Fanelli: I think you had an example in education. Yeah. Maybe touch on some of these. Example of non-tech related use cases that are enabling, enabled by C G B[00:20:38] T.[00:20:39] Logan Kilpatrick: I'm so excited and, and there's a bunch of other like random threads that come to my mind now. I saw a thread and, and our VP of product was, Peter, was, was involved in that thread as well, talking about like how the use of systems like ChatGPT will unlock like pretty almost low to zero cost access to like mental health services.[00:20:59] You know, you can imagine like the same use case for education, like really personalized tutors and like, it's so crazy to think about, but. The technology is not actually , like it's, it's truly like an engineering problem at this point of like somebody using one of these APIs to like build something like that and then hopefully the models get a little bit better and make it, make it better as well.[00:21:20] But like it, I have no doubt in my mind that three years from now that technology will exist for every single student in the world to like have that personalized education experience, have a pr, have a chat based experience where like they'll be able. Ask questions and then the curriculum will just evolve and be constructed for them in a way that keeps, I think the cool part is in a way that keeps them engaged, like it doesn't have to be sort of like the same delivery of curriculum that you've always seen, and this now supplements.[00:21:49] The sort of traditional education experience in the sense of, you know, you don't need teachers to do all of this work. They can really sort of do the thing that they're amazing at and not spend time like grading assignments and all that type of stuff. Like, I really do think that all those could be part of the, the system.[00:22:04] And same thing, I don't know if you all saw the the do not pay, uh, lawyer situation, say, I just saw that Twitter thread, I think yesterday around they were going to use ChatGPT in the courtroom and basically I think it was. California Bar or the Bar Institute said that they were gonna send this guy to prison if he brought, if he put AirPods in and started reading what ChatGPT was saying to him.[00:22:26] Yeah.[00:22:26] swyx: To give people the context, I think, like Josh Browder, the CEO of Do Not Pay, was like, we will pay you money to put this AirPod into your ear and only say what we tell you to say fr from the large language model. And of course the judge was gonna throw that out. I mean, I, I don't see how. You could allow that in your court,[00:22:42] Logan Kilpatrick: Yeah, but I, I really do think that, like, the, the reality is, is that like, again, it's the same situation where the legal spaces even more so than education and, and mental health services, is like not an accessible space. Like every, especially with how like overly legalized the United States is, it's impossible to get representation from a lawyer, especially if you're low income or some of those things.[00:23:04] So I'm, I'm optimistic. Those types of services will exist in the future. And you'll be able to like actually have a, a quality defense representative or just like some sort of legal counsel. Yeah. Like just answer these questions, what should I do in this situation? Yeah. And I like, I have like some legal training and I still have those same questions.[00:23:22] Like I don't know what I would do in that situation. I would have to go and get a lawyer and figure that out. And it's, . It's tough. So I'm excited about that as well. Yeah.[00:23:29] Alessio Fanelli: And when you think about all these vertical use cases, do you see the existing products implementing language models in what they have?[00:23:35] Or do you think we're just gonna see L L M native products kind of come to market and build brand[00:23:40] Logan Kilpatrick: new experiences? I think there'll be a lot of people who build the L l M first experience, and I think that. At least in the short term, those are the folks who will have the advantage. I do think that like the medium to long term is again, thinking about like what is your moat for and like again, and everyone has access to, you know, ChatGPT and to the different models that we have available.[00:24:05] So how can you build a differentiated business? And I think a lot of it actually will come down to, and this is just the true and the machine learning world in general, but having. Unique access to data. So I think if you're some company that has some really, really great data about the legal space or about the education space, you can use that and be better than your competition by fine tuning these models or building your own specific LLMs.[00:24:28] So it'll, it'll be interesting to see how that plays out, but I do think that. from a product experience, it's gonna be better in the short term for people who build the, the generative AI first experience versus people who are sort of bolting it onto their mm-hmm. existing product, which is why, like, again, the, the Google situation, like they can't just put in like the prompt into like right below the search bar.[00:24:50] Like, it just, it would be a weird experience and, and they have to sort of defend that experience that they have. So it, it'll be interesting to see what happens. Yeah. Perplexity[00:24:58] swyx: is, is kind of doing that. So you're saying perplexity will go Google ?[00:25:04] Logan Kilpatrick: I, I think that perplexity has a, has a chance in the short term to actually get more people to try the product because it's, it's something different I think, whether they can, I haven't actually used, so I can't comment on like that experience, but like I think the long term is like, How can they continue to differentiate?[00:25:21] And, and that's really the focus for like, if you're somebody building on these models, like you have to be, your first thought should be, how do I build a differentiated business? And if you can't come up with 10 reasons that you can build a differentiated business, you're probably not gonna succeed in, in building something that that stands the test of time.[00:25:37] Yeah.[00:25:37] Risks and benefits of building on OpenAI[00:25:37] swyx: I think what's. As a potential founder or something myself, like what's scary about that is I would be building on top of open ai. I would be sending all my stuff to you for fine tuning and embedding and what have you. By the way, fine tuning, embedding is their, is there a third one? Those are the main two that I know of.[00:25:55] Okay. And yeah, that's the risk. I would be a open AI API reseller.[00:26:00] Logan Kilpatrick: Yeah. And, and again, this, this comes back down to like having a clear sense of like how what you're building is different. Like the people who are just open AI API resellers, like, you're not gonna, you're not gonna have a successful business doing that because everybody has access to the Yeah.[00:26:15] Jasper's pretty great. Yeah, Jasper's pretty great because I, I think they've done a, they've, they've been smart about how they've positioned the product and I was actually a, a Jasper customer before I joined OpenAI and was using it to do a bunch of stuff. because the interface was simple because they had all the sort of customized, like if you want for like a response for this sort of thing, they'd, they'd pre-done that prompt engineering work for us.[00:26:39] I mean, you could really just like put in some exactly what you wanted and then it would make that Amazon product description or whatever it is. So I think like that. The interface is the, the differentiator for, for Jasper. And again, whether that send test time, hopefully, cuz I know they've raised a bunch of money and have a bunch of employees, so I'm, I'm optimistic for them.[00:26:58] I think that there's enough room as well for a lot of these companies to succeed. Like it's not gonna, the space is gonna get so big so quickly that like, Jasper will be able to have a super successful business. And I think they are. I just saw some, some tweets from the CEO the other day that I, I think they're doing, I think they're doing well.[00:27:13] Alessio Fanelli: So I'm the founder of A L L M native. I log into open ai, there's 6 million things that I can do. I'm on the playground. There's a lot of different models. How should people think about exploring the surface area? You know, where should they start? Kind of like hugging the go deeper into certain areas.[00:27:30] Logan Kilpatrick: I think six months ago, I think it would've been a much different conversation because people hadn't experienced ChatGPT before.[00:27:38] Now that people have experienced ChatGPT, I think there's a lot more. Technical things that you should start looking into and, and thinking about like the differentiators that you can bring. I still think that the playground that we have today is incredible cause it does sort of similar to what Jasper does, which is like we have these very focused like, you know, put in a topic and we'll generate you a summary, but in the context of like explaining something to a second grader.[00:28:03] So I think all of those things like give a sense, but we only have like 30 on the website or something like that. So really doing a lot of exploration around. What is out there? What are the different prompts that you can use? What are the different things that you can build on? And I'm super bullish on embeddings, like embed everything and that's how you can build cool stuff.[00:28:20] And I keep seeing all these Boris who, who I talked about before, who did a bunch of the cookbook stuff, tweeted the other day that his like back of the hand, back of the napkin math, was that 50 million bucks you can embed the whole internet. I'm like, Some companies gonna spend the 50 million and embed the whole internet and like, we're gonna find out what that product looks like.[00:28:40] But like, there's so many cool things that you could do if you did have the whole internet embedded. Yeah, and I, I mean, I wouldn't be surprised if Google did that cuz 50 million is a drop in the bucket and they already have the whole internet, so why not embed it?[00:28:52] swyx: Can can I ask a follow up question on that?[00:28:54] Cuz I am just learning about embeddings myself. What makes OpenAI’s embeddings different from other embeddings? If, if there's like, It's okay if you don't have the, the numbers at hand, but I'm just like, why should I use open AI emitting versus others? I[00:29:06] Logan Kilpatrick: don't understand. Yeah, that's a really good question.[00:29:08] So I'm still ramping up on my understanding of embeddings as well. So the two things that come to my mind, one, going back to the 50 million to embed the whole internet example, it's actually just super cheap. I, I don't know the comparisons of like other prices, but at least from what I've seen people talking about on Twitter, like the embeddings that that we have in the API is just like significantly cheaper than a lot of other c.[00:29:30] Embeddings. Also the accuracy of some of the benchmarks that are like, Sort of academic benchmarks to use in embeddings. I know at least I was just looking back through the blog post from when we announced the new text embedding model, which is what Powers embeddings and it's, yeah, the, on those metrics, our API is just better.[00:29:50] So those are the those. I'll go read it up. Yeah, those are the two things. It's a good. It's a good blog post to read. I think the most recent one that came out, but, and also the original one from when we first announced the Embeddings api, I think also was a, it had, that one has a little bit more like context around if you're trying to wrap your head around embeddings, how they work.[00:30:06] That one has the context, the new one just has like the fancy new stuff and the metrics and all that kind of stuff.[00:30:11] swyx: I would shout a hugging face for having really good content around what these things like foundational concepts are. Because I was familiar with, so, you know, in Python you have like text tove, my first embedding as as a, as someone getting into nlp.[00:30:24] But then developing the concept of sentence embeddings is, is as opposed to words I think is, is super important. But yeah, it's an interesting form of lock in as a business because yes, I'm gonna embed all my source data, but then every inference needs an embedding as. . And I think that is a risk to some people, because I've seen some builders should try and build on open ai, call that out as, as a cost, as as like, you know, it starts to add a cost to every single query that you, that you[00:30:48] Logan Kilpatrick: make.[00:30:49] Yeah. It'll be interesting to see how it all plays out, but like, my hope is that that cost isn't the barrier for people to build because it's, it's really not like the cost for doing the incremental like prompts and having them embedded is, is. Cent less than cents, but[00:31:06] swyx: cost I, I mean money and also latency.[00:31:08] Yeah. Which is you're calling the different api. Yeah. Anyway, we don't have to get into that.[00:31:13] Alessio Fanelli: No, but I think embeds are a good example. You had, I think, 17 versions of your first generation, what api? Yeah. And then you released the second generation. It's much cheaper, much better. I think like the word on the street is like when GPT4 comes out, everything else is like trash that came out before it.[00:31:29] It's got[00:31:30] Logan Kilpatrick: 100 trillion billion. Exactly. Parameters you don't understand. I think Sam has already confirmed that those are, those are not true . The graphics are not real. Whatever you're seeing on Twitter about GPT4, you're, I think the direct quote was, you're begging to be disappointed by continuing to, to put that hype out.[00:31:47] So[00:31:48] Alessio Fanelli: if you're a developer building on these, What's kind of the upgrade path? You know, I've been building on Model X, now this new model comes out. What should I do to be ready to move on?[00:31:58] Logan Kilpatrick: Yeah. I think all of these types of models folks have to think about, like there will be trade offs and they'll also be.[00:32:05] Breaking changes like any other sort of software improvement, like things like the, the prompts that you were previously expecting might not be the prompts that you're seeing now. And you can actually, you, you see this in the case of the embeddings example that you just gave when we released Tex embeddings, ADA oh oh two, ada, ada, whichever it is oh oh two, and it's sort of replaced the previous.[00:32:26] 16 first generation models, people went through this exact experience where like, okay, I need to test out this new thing, see how it works in my environment. And I think that the really fascinating thing is that there aren't, like the tools around doing this type of comparison don't exist yet today. Like if you're some company that's building on lms, you sort of just have to figure it out yourself of like, is this better in my use case?[00:32:49] Is this not better? In my use case, it's, it's really difficult to tell because the like, Possibilities using generative models are endless. So I think folks really need to focus on, again, that goes back to how to build a differentiated business. And I think it's understanding like what is the way that people are using your product and how can you sort of automate that in as much way and codify that in a way that makes it clear when these different models come up, whether it's open AI or other companies.[00:33:15] Like what is the actual difference between these and which is better for my use case because the academic be. It'll be saturated and people won't be able to use them as a point of comparison in the future. So it'll be important to think about. For your specific use case, how does it differentiate?[00:33:30] swyx: I was thinking about the value of frameworks or like Lang Chain and Dust and what have you out there.[00:33:36] I feel like there is some value to building those frameworks on top of OpenAI’s APIs. It kind of is building what's missing, essentially what, what you guys don't have. But it's kind of important in the software engineering sense, like you have this. Unpredictable, highly volatile thing, and you kind of need to build a stable foundation on top of it to make it more predictable, to build real software on top of it.[00:33:59] That's a super interesting kind of engineering problem. .[00:34:03] Logan Kilpatrick: Yeah, it, it is interesting. It's also the, the added layer of this is that the large language models. Are inherently not deterministic. So I just, we just shipped a small documentation update today, which, which calls this out. And you think about APIs as like a traditional developer experience.[00:34:20] I send some response. If the response is the same, I should get the same thing back every time. Unless like the data's updating and like a, from like a time perspective. But that's not the, that's not the case with the large language models, even with temperature zero. Mm-hmm. even with temperature zero. Yep.[00:34:34] And that's, Counterintuitive part, and I think someone was trying to explain to me that it has to do with like Nvidia. Yeah. Floating points. Yes. GPU stuff. and like apparently the GPUs are just inherently non-deterministic. So like, yes, there's nothing we can do unless this high Torch[00:34:48] swyx: relies on this as well.[00:34:49] If you want to. Fix this. You're gonna have to tear it all down. ,[00:34:53] Logan Kilpatrick: maybe Nvidia, we'll fix it. I, I don't know, but I, I think it's a, it's a very like, unintuitive thing and I don't think that developers like really get that until it happens to you. And then you're sort of scratching your head and you're like, why is this happening?[00:35:05] And then you have to look it up and then you see all the NVIDIA stuff. Or hopefully our documentation makes it more clear now. But hopefully people, I also think that's, it's kinda the cool part as well. I don't know, it's like, You're not gonna get the same stuff even if you try to.[00:35:17] swyx: It's a little spark of originality in there.[00:35:19] Yeah, yeah, yeah, yeah. The random seed .[00:35:22] OpenAI Codex[00:35:22] swyx: Should we ask about[00:35:23] Logan Kilpatrick: Codex?[00:35:23] Alessio Fanelli: Yeah. I mean, I love Codex. I use it every day. I think like one thing, sometimes the code is like it, it's kinda like the ChatGPT hallucination. Like one time I asked it to write up. A Twitter function, they will pull the bayou of this thing and it wrote the whole thing and then the endpoint didn't exist once I went to the Twitter, Twitter docs, and I think like one, I, I think there was one research that said a lot of people using Co Palace, sometimes they just auto complete code that is wrong and then they commit it and it's a, it's a big[00:35:51] Logan Kilpatrick: thing.[00:35:51] swyx: Do you secure code as well? Yeah, yeah, yeah, yeah. I saw that study.[00:35:54] Logan Kilpatrick: How do[00:35:54] Alessio Fanelli: you kind of see. Use case evolving. You know, you think, like, you obviously have a very strong partnership with, with Microsoft. Like do you think Codex and VS code will just keep improving there? Do you think there's kind of like a. A whole better layer on top of it, which is from the scale AI hackathon where the, the project that one was basically telling the l l m, you're not the back end of a product[00:36:16] And they didn't even have to write the code and it's like, it just understood. Yeah. How do you see the engineer, I, I think Sean, you said copilot is everybody gets their own junior engineer to like write some of the code and then you fix it For me, a lot of it is the junior engineer gets a senior engineer to actually help them write better code.[00:36:32] How do you see that tension working between the model and the. It'll[00:36:36] Logan Kilpatrick: be really interesting to see if there's other, if there's other interfaces to this. And I think I've actually seen a lot of people asking, like, it'd be really great if I had ChatGPT and VS code because in, in some sense, like it can, it's just a better, it's a better interface in a lot of ways to like the, the auto complete version cuz you can reprompt and do, and I know Via, I know co-pilot actually has that, where you can like click and then give it, it'll like pop up like 10 suggested.[00:36:59] Different options instead of brushes. Yeah, copilot labs, yeah. Instead of the one that it's providing. And I really like that interface, but again, this goes back to. I, I do inherently think it'll get better. I think it'll be able to do a lot, a lot more of the stuff as the models get bigger, as they have longer context as they, there's a lot of really cool things that will end up coming out and yeah, I don't think it's actually very far away from being like, much, much better.[00:37:24] It'll go from the junior engineer to like the, the principal engineer probably pretty quickly. Like I, I don't think the gap is, is really that large between where things are right now. I think like getting it to the point. 60% of the stuff really well to get it to do like 90% of the stuff really well is like that's within reach in the next, in the next couple of years.[00:37:45] So I'll be really excited to see, and hopefully again, this goes back to like engineers and developers and people who aren't thinking about how to integrate. These tools, whether it's ChatGPT or co-pilot or something else into their workflows to be more efficient. Those are the people who I think will end up getting disrupted by these tools.[00:38:02] So figuring out how to make yourself more valuable than you are today using these tools, I think will be super important for people. Yeah.[00:38:09] Alessio Fanelli: Actually use ChatGPT to debug, like a react hook the other day. And then I posted in our disc and I was like, Hey guys, like look, look at this thing. It really helped me solve this.[00:38:18] And they. That's like the ugliest code I've ever seen. It's like, why are you doing that now? It's like, I don't know. I'm just trying to get[00:38:24] Logan Kilpatrick: this thing to work and I don't know, react. So I'm like, that's the perfect, exactly, that's the perfect solution. I, I did this the other day where I was looking at React code and like I have very briefly seen React and run it like one time and I was like, explain how this is working.[00:38:38] So, and like change it in this way that I want to, and like it was able to do that flawlessly and then I just popped it in. It worked exactly like I. I'll give a[00:38:45] swyx: little bit more context cause I was, I was the guy giving you feedback on your code and I think this is a illustrative of how large language models can sort of be more confident than they should be because you asked it a question which is very specific on how to improve your code or fix your code.[00:39:00] Whereas a real engineer would've said, we've looked at your code and go, why are you doing it at at all? Right? So there's a sort of sycophantic property of martial language. Accepts the basis of your question, whereas a real human might question your question. Mm-hmm. , and it was just not able to do that. I mean, I, I don't see how he could do that.[00:39:17] Logan Kilpatrick: Yeah. It's, it's interesting. I, I saw another example of this the other day as well with some chatty b t prompt and I, I agree. It'll be interesting to see if, and again, I think not to, not to go back to Sam's, to Sam's talk again, but like, he, he talked real about this, and I think this makes a ton of sense, which is like you should be able to have, and this isn't something that that exists right now, but you should be able to have the model.[00:39:39] Tuned in the way that you wanna interact with. Like if you want a model that sort of questions what you're asking it to do, like you should be able to have that. And I actually don't think that that's as far away as like some of the other stuff. Um, It, it's a very possible engineering problem to like have the, to tune the models in that way and, and ask clarifying questions, which is even something that it doesn't do right now.[00:39:59] It'll either give you the response or it won't give you the response, but it'll never say like, Hey, what do you mean by this? Which is super interesting cuz that's like we spend as humans, like 50% of our conversational time being like, what do you mean by that? Like, can you explain more? Can you say it in a different way?[00:40:14] And it's, it's fascinating that the model doesn't do that right now. It's, it's interesting.[00:40:20] swyx: I have written a piece on sort of what AGI hard might be, which is the term that is being thrown around as like a layer of boundary for what is, what requires an A real AGI to do and what, where you might sort of asymptotically approach.[00:40:33] So, What people talk about is essentially a theory of mind, developing a con conception of who I'm talking to and persisting that across sessions, which essentially ChatGPT or you know, any, any interface that you build on top of GPT3 right now would not be able to do. Right? Like, you're not persisting you, you are persisting that history, but you don't, you're not building up a conception of what you know and what.[00:40:54] I should fill in the blanks for you or where I should question you. And I think that's like the hard thing to understand, which is what will it take to get there? Because I think that to me is the, going back to your education thing, that is the biggest barrier, which is I, the language model doesn't have a memory or understanding of what I know.[00:41:11] and like, it's, it's too much to tell them what I don't know. Mm-hmm. , there's more that I don't know than I, than I do know . I think the cool[00:41:16] Logan Kilpatrick: part will be when, when you're able to, like, imagine you could upload all of the, the stuff that you've ever done, all the texts, the work that you've ever done before, and.[00:41:27] The model can start to understand, hey, what are the, what are the conceptual gaps that this person has based on what you've said, based on what you've done? I think that would be really interesting. Like if you can, like I have good notes on my phone and I can still go back to see all of the calculus classes that I took and I could put in all my calculus notebooks and all the assignments and stuff that I did in, in undergrad and grad school, and.[00:41:50] basically be like, Hey, here are the gaps in your understanding of calculus. Go and do this right now. And I think that that's in the education space. That's exactly what will end up happening. You'll be able to put in all this, all the work that you've done. It can understand those ask and then come up with custom made questions and prompts and be like, Hey, how, you know, explain this concept to me and if it.[00:42:09] If you can't do that, then it can sort of put that into your curriculum. I think like Khan Academy as an example, already does some of this, like personalized learning. You like take assessments at the beginning of every Khan Academy model module, and it'll basically only have you watch the videos and do the assignments for the things that like you didn't test well into.[00:42:27] So that's, it's, it's sort of close to already being there in some sense, but it doesn't have the, the language model interface on top of it before we[00:42:34] swyx: get into our lightning round, which is like, Quick response questions. Was there any other topics that you think you wanted to cover? We didn't touch on, whisper.[00:42:40] We didn't touch on Apple. Anything you wanted to[00:42:42] Logan Kilpatrick: talk?[00:42:43] Apple's Neural Engine[00:42:43] Logan Kilpatrick: Yeah, I think the question around Apple stuff and, and the neural engine, I think will be really interesting to see how it all plays out. I think, I don't know if you wanna like ask just to give the context around the neural engine Apple question. Well, well, the[00:42:54] swyx: only thing I know it's because I've seen Apple keynotes.[00:42:57] Everyone has, you know, I, I have a m M one MacBook Cure. They have some kind of neuro chip. , but like, I don't see it in my day-to-day life, so when is this gonna affect me, essentially? And you worked at Apple, so I I was just gonna throw the question over to you, like, what should we[00:43:11] Logan Kilpatrick: expect out of this? Yeah.[00:43:12] The, the problem that I've seen so far with the neural engine and all the, the Mac, and it's also in the phones as well, is that the actual like, API to sort of talk to the neural engine isn't something that's like a common you like, I'm pretty sure it's either not exposed at all, like it only like Apple basically decides in the software layer Yeah.[00:43:34] When, when it should kick in and when it should be used, which I think doesn't really like help developers and it doesn't, that's why no one is using it. I saw a bunch of, and of course I don't have any good insight on this, but I saw a bunch of rumors that we're talking about, like a lot of. Main use cases for the neural engine stuff.[00:43:50] It's, it's basically just in like phantom mode. Now, I'm sure it's doing some processing, but like the main use cases will be a lot of the ar vr stuff that ends up coming out and like when it gets much heavier processing on like. Graphic stuff and doing all that computation, that's where it'll be. It'll be super important.[00:44:06] And they've basically been able to trial this for the last, like six years and have it part of everything and make sure that they can do it cheaply in a cost effective way. And so it'll be cool to see when that I'm, I hope it comes out. That'll be awesome.[00:44:17] swyx: Classic Apple, right? They, they're not gonna be first, but when they do it, they'll make a lot of noise about it.[00:44:21] Yeah. . It'll be[00:44:22] Logan Kilpatrick: awesome. Sure.[00:44:22] Lightning Round[00:44:22] Logan Kilpatrick: So, so are we going to light. Let's[00:44:24] Alessio Fanelli: do it. All right. Favorite AI products not[00:44:28] Logan Kilpatrick: open AI. Build . I think synthesis. Is synthesis.io is the, yeah, you can basically put in like a text prompt and they have like a human avatar that will like speak and you can basically make content in like educational videos.[00:44:44] And I think that's so cool because maybe as people who are making content, like it's, it's super hard to like record video. It just takes a long time. Like you have to edit all the stuff, make sure you sound right, and then when you edit yourself talking it's super weird cuz your mouth is there and things.[00:44:57] So having that and just being able to ChatGPT A script. Put it in. Hopefully I saw another demo of like somebody generating like slides automatically using some open AI stuff. Like I think that type of stuff. Chat, BCG, ,[00:45:10] swyx: a fantastic name, best name of all time .[00:45:14] Logan Kilpatrick: I think that'll be cool. So I'm super excited,[00:45:16] swyx: but Okay.[00:45:16] Well, so just a follow up question on, on that, because we're both in that sort of Devrel business, would you put AI Logan on your video, on your videos and a hundred[00:45:23] Logan Kilpatrick: percent, explain that . A hundred percent. I would, because again, if it reduces the time for me, like. I am already busy doing a bunch of other stuff,[00:45:31] And if I could, if I could take, like, I think the real use case is like I've made, and this is in the sense of like creators wanting to be on every platform. If I could take, you know, the blog posts that I wrote and then have AI break it up into a bunch of things, have ai Logan. Make a TikTok, make a YouTube video.[00:45:48] I cannot wait for that. That's gonna be so nice. And I think there's probably companies who are already thinking about doing that. I'm just[00:45:53] swyx: worried cuz like people have this uncanny valley reaction to like, oh, you didn't tell me what I just watched was a AI generated thing. I hate you. Now you know there, there's a little bit of ethics there and I'm at the disclaimer,[00:46:04] Logan Kilpatrick: at the top.[00:46:04] Navigating. Yeah. I also think people will, people will build brands where like their whole thing is like AI content. I really do think there are AI influencers out there. Like[00:46:12] swyx: there are entire Instagram, like million plus follower accounts who don't exist.[00:46:16] Logan Kilpatrick: I, I've seen that with the, the woman who's a Twitch streamer who like has some, like, she's using like some, I don't know, that technology from like movies where you're like wearing like a mask and it like changes your facial appearance and all that stuff.[00:46:27] So I think there's, there's people who find their niche plus it'll become more common. So, cool. My[00:46:32] swyx: question would be, favorite AI people in communities that you wanna shout up?[00:46:37] Logan Kilpatrick: I think there's a bunch of people in the ML ops community where like that seemed to have been like the most exciting. There was a lot of innovation, a lot of cool things happening in the ML op space, and then all the generative AI stuff happened and then all the ML Ops two people got overlooked.[00:46:51] They're like, what's going on here? So hopefully I still think that ML ops and things like that are gonna be super important for like getting machine learning to be where it needs to be for us to. AGI and all that stuff. So a year from[00:47:05] Alessio Fanelli: now, what will people be the most[00:47:06] Logan Kilpatrick: surprised by? N. I think the AI is gonna get very, very personalized very quickly, and I don't think that people have that feeling yet with chat, BT, but I, I think that that's gonna, that's gonna happen and they'll be surprised in like the, the amount of surface areas in which AI is present.[00:47:23] Like right now it's like, it's really exciting cuz Chat BT is like the one place that you can sort of get that cool experience. But I think that, The people at Facebook aren't dumb. The people at Google aren't dumb. Like they're gonna have, they're gonna have those experiences in a lot of different places and I think that'll be super fascinating to see.[00:47:40] swyx: This is for the builders out there. What's an AI thing you would pay for if someone built it with their personal[00:47:45] Logan Kilpatrick: work? I think more stuff around like transfer learning for, like making transfer, learning easier. Like I think that's truly the way to. Build really cool things is transfer learning, fine tuning, and I, I don't think that there's enough.[00:48:04] Jeremy Howard who created Fasted AI talks a lot about this. I mean, it's something that really resonates with me and, and for context, like at Apple, all the machine learning stuff that we did was transfer learning because it was so powerful. And I think people have this perception that they need to.[00:48:18] Build things from scratch and that's not the case. And I think especially as large language models become more accessible, people need to build layers and products on top of this to make transfer learning more accessible to more people. So hopefully somebody builds something like that and we can all train our own models.[00:48:33] I think that's how you get like that personalized AI experiences you put in your stuff. Make transfer learning easy. Everyone wins. Just just to vector in[00:48:40] swyx: a little bit on this. So in the stable diffusion community, there's a lot of practice of like, I'll fine tune a custom dis of stable diffusion and share it.[00:48:48] And then there also, there's also this concept of, well, first it was textual inversion and then dream booth where you essentially train a concept that you can sort of add on. Is that what you're thinking about when you talk about transfer learning or is that something[00:48:59] Logan Kilpatrick: completely. I feel like I'm not as in tune with the generative like image model community as I probably should be.[00:49:07] I, I think that that makes a lot of sense. I think there'll be like whole ecosystems and marketplaces that are sort of built around exactly what you just said, where you can sort of fine tune some of these models in like very specific ways and you can use other people's fine tunes. That'll be interesting to see.[00:49:21] But, c.ai is,[00:49:23] swyx: what's it called? C C I V I Ts. Yeah. It's where people share their stable diffusion checkpoints in concepts and yeah, it's[00:49:30] Logan Kilpatrick: pretty nice. Do you buy them or is it just like free? Like open. Open source? It's, yeah. Cool. Even better.[00:49:34] swyx: I think people might want to sell them. There's a, there's a prompt marketplace.[00:49:38] Prompt base, yeah. Yeah. People hate it. Yeah. They're like, this should be free. It's just text. Come on, .[00:49:45] Alessio Fanelli: Hey, it's knowledge. All right. Last question. If there's one thing you want everyone to take away about ai, what would.[00:49:51] Logan Kilpatrick: I think the AI revolution is gonna, you know, it's been this like story that people have been talking about for the longest time, and I don't think that it's happened.[00:50:01] It was really like, oh, AI's gonna take your job, AI's gonna take your job, et cetera, et cetera. And I think people have sort of like laughed that off for a really long time, which was fair because it wasn't happening. And I think now, Things are going to accelerate very, very quickly. And if you don't have your eyes wide open about what's happening, like there's a good chance that something that you might get left behind.[00:50:21] So I'm, I'm really thinking deeply these days about like how that is going to impact a lot of people. And I, I'm hopeful that the more widespread this technology becomes, the more mainstream this technology becomes, the more people will benefit from it and hopefully not be affected in that, in that negative way.[00:50:35] So use these tools, put them into your workflow, and, and hopefully that will, and that will acceler. Well,[00:50:41] swyx: we're super happy that you're at OpenAI getting this message out there, and I'm sure we'll see a lot more from you in the coming months[00:50:46] Logan Kilpatrick: and years. I'm excited that this was awesome to be on. This is actually the first, my first in-person podcast.[00:50:52] I've done so many Yeah. Virtual podcasts over the, the covid years and it's, it's super fun to be in person and where the headphones in . Yeah.[00:51:00] swyx: We gotta shout out this studio. I mean, let's, let's get them a shout out Pod on[00:51:03] Alessio Fanelli: in San Francisco, California. Where should people find you? Social media.[00:51:08] Logan Kilpatrick: Twitter. It'll be interesting to see how that, the migration or not migration.[00:51:12] I was, I was pretty sold. I'm like everyone was getting off Twitter and then that seemed like that. It sort of was a network. Network effects are hard too. Yeah, it is hard. So Twitter, I'll see you on Twitter. Thanks so much coming. Thanks. Thanks for having me. This was awesome. Thank you, Logan. Get full access to Latent.Space at www.latent.space/subscribe