-
The Future of Notebooks - with Akshay Agrawal of Marimo
From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-07-18 13:00
Akshay Agrawal joins us to talk about Marimo and their vision for the future of Python notebooks, and how it’s the perfect canvas for AI-driven data analysis. 0:00 Introduction 0:46 Overview of Marimo and Its Features 2:33 Origin Story and Motivation Behind Marimo 4:26 Demo: Classical Machine Learning with MNIST in Marimo 6:52 Notebook Compatibility and Conversion from Jupyter 7:42 Demo: Interactive Notebook with Custom UI and Layout 10:08 AI-Native Utilities and Code Generation with Language Models 11:36 Dependency Management and Integration with UV Package Manager 13:00 Demo: Data Annotation Workflow Using a PS5 Controller 15:51 Starting from Scratch: Blank Canvas AI Use Cases 18:27 Context Formatting for AI Code Generation 19:54 Chat Interface and Local/Remote Model Support 21:01 WebAssembly Support and MoLab Cloud-Hosted Notebooks 23:21 Future Plans and Breaking Out of Old Notebook Habits 25:40 Running Marimo Notebooks as Scripts or Data Apps 26:44 Exploring AI Agents and Community Contributions 26:56 Call to Action: How to Get Started and Contribute
-
Cline: the open source coding agent that doesn't cut costs
From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-07-16 18:08
Saoud Rizwan and Pash from Cline joined us to talk about why fast apply models got bitter lesson'd, how they pioneered the plan + act paradigm for coding, and why non-technical people use IDEs to do marketing and generate slides. Full writeup: https://www.latent.space/p/cline X: https://x.com/latentspacepod Chapters: 00:00 - Introductions 01:35 - Plan and Act Paradigm 05:37 - Model Evaluation and Early Development of Cline 08:14 - Use Cases of Cline Beyond Coding 09:09 - Why Cline is a VS Code Extension and Not a Fork 12:07 - Economic Value of Programming Agents 16:07 - Early Adoption for MCPs 19:35 - Local vs Remote MCP Servers 22:10 - Anthropic's Role in MCP Registry 22:49 - Most Popular MCPs and Their Use Cases 25:26 - Challenges and Future of MCP Monetization 27:32 - Security and Trust Issues with MCPs 28:56 - Alternative History Without MCP 29:43 - Market Positioning of Coding Agents and IDE Integration Matrix 32:57 - Visibility and Autonomy in Coding Agents 35:21 - Evolving Definition of Complexity in Programming Tasks 38:16 - Forks of Cline and Open Source Regrets 40:07 - Simplicity vs Complexity in Agent Design 46:33 - How Fast Apply Got Bitter Lesson'd 49:12 - Cline's Business Model and Bring-Your-Own-API-Key Approach 54:18 - Integration with OpenRouter and Enterprise Infrastructure 55:32 - Impact of Declining Model Costs 57:48 - Background Agents and Multi-Agent Systems 1:00:42 - Vision and Multi-Modalities 1:01:07 - State of Context Engineering 1:07:37 - Memory Systems in Coding Agents 1:10:14 - Standardizing Rules Files Across Agent Tools 1:11:16 - Cline's Personality and Anthropomorphization 1:12:55 - Hiring at Cline and Team Culture Chapters 00:00:00 Introduction and Guest Intros 00:00:29 What is Cline? Product Overview 00:01:42 Plan and Act Paradigm 00:05:22 Model Evolution and Building Cline 00:07:40 Beyond Coding: Cline as a General Agent 00:09:12 Why Focus on VS Code Extension? 
00:11:26 The Future of Programming and Agentic Paradigm 00:12:34 Economic Value: Programming vs. Other Use Cases 00:16:04 MCP Ecosystem: Growth and Marketplace 00:21:30 Security, Discoverability, and Trust in MCPs 00:22:55 Popular MCPs and Workflow Automation 00:25:30 Monetization and Payments for MCPs 00:37:53 Competition, Forks, and Open Source Philosophy 00:40:39 RAG, Fast Apply, and Agentic Simplicity 00:50:11 Business Model and Enterprise Adoption 00:57:04 Background Agents, Multi-Agent Systems, and CLI 01:00:41 Context Engineering and Memory 01:12:39 Team, Culture, and Closing Thoughts
-
Personalized AI Language Education — with Andrew Hsu, Speak
From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-07-11 19:06
Speak (https://speak.com) may not be very well known to native English speakers, but after a slow start in 2016 it has emerged as one of OpenAI's favorite partners: the OpenAI Startup Fund led and joined its Series B and C, making Speak one of the new AI-native unicorns, and OpenAI noted that “Speak has the potential to revolutionize not just language learning, but education broadly”. Today we speak with Speak’s CTO, Andrew Hsu, on the journey of building the “3rd generation” of language learning software (with Rosetta Stone being Gen 1 and Duolingo being Gen 2). Speak’s premise is that speech and language models can now do what was previously only possible with human tutors—provide fluent, responsive, and adaptive instruction—and this belief has shaped its product and company strategy since its early days. https://www.linkedin.com/in/adhsu/ https://speak.com One of the most interesting strategic decisions discussed in the episode is Speak’s early focus on South Korea. While counterintuitive for a San Francisco-based startup, the decision was influenced by a combination of market opportunity and founder proximity via a Korean first employee. South Korea’s intense demand for English fluency and a highly competitive education market made it a proving ground for a deeply AI-native product. By succeeding in a market saturated with human-based education solutions, Speak validated its model and built strong product-market fit before expanding to other Asian markets and, eventually, globally. The arrival of Whisper and GPT-based LLMs in 2022 marked a turning point for Speak. Suddenly, capabilities that were once theoretical—real-time feedback, semantic understanding, conversational memory—became technically feasible. Speak didn’t pivot, but rather evolved into its second phase: from a supplemental practice tool to a full-featured language tutor. 
This transition required significant engineering work, including building custom ASR models, managing latency, and integrating real-time APIs for interactive lessons. It also unlocked the possibility of developing voice-first, immersive roleplay experiences and a roadmap to real-time conversational fluency. To scale globally and support many languages, Speak is investing heavily in AI-generated curriculum and content. Instead of manually scripting all lessons, they are building agents and pipelines that can scaffold curriculum, generate lesson content, and adapt pedagogically to the learner. This ties into one of Speak’s most ambitious goals: creating a knowledge graph that captures what a learner knows and can do in a target language, and then adapting the course path accordingly. This level-adjusting tutor model aims to personalize learning at scale and could eventually be applied beyond language learning to any educational domain. Finally, the conversation touches on the broader implications of AI-powered education and the slow real-world adoption of transformative AI technologies. Despite the capabilities of GPT-4 and others, most people’s daily lives haven’t changed dramatically. Speak sees itself as part of the generation of startups that will translate AI’s raw power into tangible consumer value. The company is also a testament to long-term conviction—founded in 2016, it weathered years of slow growth before AI caught up to its vision. Now, with over $50M ARR, a growing B2B arm, and plans to expand across languages and learning domains, Speak represents what AI-native education could look like in the next decade. Chapters 00:00:00 Introductions & Thiel Fellowship Origins 00:02:13 Genesis of Speak: Early Vision & Market Focus 00:03:44 Building the Product: Iterations and Lessons Learned 00:10:59 AI’s Role in Language Learning 00:13:49 Scaling Globally & B2B Expansion 00:16:30 Why Korea? Localizing for Success 00:19:08 Content Creation, The Speak Method, and Engineering Culture 00:23:31 The Impact of Whisper and LLM Advances 00:29:08 AI-Generated Content & Measuring Fluency 00:35:30 Personalization, Dialects, and Pronunciation 00:39:38 Immersive Learning, Multimodality, and Real-Time Voice 00:50:02 Engineering Challenges & Company Culture 00:53:20 Beyond Languages: B2B, Knowledge Graphs, and Broader Learning 00:57:32 Fun Stories, Lessons, and Reflections 01:02:03 Final Thoughts: The Future of AI Learning & Slow Takeoff
-
AI Video Is Eating The World — Olivia and Justine Moore, a16z
From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-07-09 19:26
When the first video diffusion models started emerging, they were little more than “moving pictures” - still frames extended a few seconds in either direction in time. There was a ton of excitement about OpenAI’s Sora on release through 2024, but so far only Sora-lite has been widely released. Meanwhile, other good videogen models like Genmo Mochi, Pika, MiniMax T2V, Tencent Hunyuan Video, and Kuaishou’s Kling have emerged, but the reigning king this year seems to be Google’s Veo 3, which for the first time has added native audio generation into its model capabilities, eliminating the need for a whole class of lip-syncing tooling and SFX editing. The rise of Veo 3 unlocks a whole new category of AI video creators that many of our audience may not have been exposed to, but which is undeniably effective and important, particularly in the “kids” and “brainrot” segments of global consumer internet platforms like TikTok, YouTube, and Instagram. By far the best documentarians of these trends for laypeople are Olivia and Justine Moore, both partners at a16z, who not only collate the best examples from all over the web, but dabble in video creation themselves to put theory into practice. We’ve been thinking of dabbling in AI brainrot on a secondary channel for Latent Space, so we wanted to get the braindump from the Moore twins on how to make a Latent Space Brainrot channel. Jump on in! Chapters 00:00:00 Introductions & Guest Welcome 00:00:49 The Rise of Generative Media 00:02:24 AI Video Trends: Italian Brain Rot & Viral Characters 00:05:00 Following Trends & Creating AI Content 00:07:17 Hands-On with AI Video Creation 00:18:36 Monetization & Business of AI Content 00:23:34 Platforms, Models, and the Creator Stack 00:37:22 Native Content vs. Clipping & Going Viral 00:41:52 Prompt Theory & Meta-Trends in AI Creativity 00:47:42 Professional, Commercial, and Platform-Specific AI Video 00:48:57 Wrap-Up & Final Thoughts
-
Information Theory for Language Models: Jack Morris
From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-07-02 16:06
Our last AI PhD grad student feature was Shunyu Yao, who happened to focus on Language Agents for his thesis and immediately went to work on them for OpenAI. Our pick this year is Jack Morris, who bucks the “hot” trends by -not- working on agents, benchmarks, or VS Code forks, and is instead known for his work on the information-theoretic understanding of LLMs, starting from embedding models and latent space representations (always close to our heart). Jack is an unusual combination: he does underrated research but is still able to explain it well to a mass audience, so we felt this was a good opportunity to do a different kind of episode, going through the greatest hits of a high-profile AI PhD and relating them to questions from AI Engineering. Papers and References made
AI grad school: https://x.com/jxmnop/status/1933884519557353716
A new type of information theory: https://x.com/jxmnop/status/1904238408899101014
Embeddings
Text Embeddings Reveal (Almost) As Much As Text: https://arxiv.org/abs/2310.06816
Contextual document embeddings: https://arxiv.org/abs/2410.02525
Harnessing the Universal Geometry of Embeddings: https://arxiv.org/abs/2505.12540
Language models
GPT-style language models memorize 3.6 bits per param: https://x.com/jxmnop/status/1929903028372459909
Approximating Language Model Training Data from Weights: https://arxiv.org/abs/2506.15553 and https://x.com/jxmnop/status/1936044666371146076
LLM Inversion
"There Are No New Ideas In AI... Only New Datasets": https://x.com/jxmnop/status/1910087098570338756 and https://blog.jxmo.io/p/there-are-no-new-ideas-in-ai-only
misc reference: https://junyanz.github.io/CycleGAN/ — for others hiring AI PhDs, Jack also wanted to shout out Zach Nussbaum, his coauthor on Nomic Embed: Training a Reproducible Long Context Text Embedder.
-
Scaling Test Time Compute to Multi-Agent Civilizations — Noam Brown, OpenAI
From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-06-19 18:59
Solving Poker and Diplomacy, Debating RL+Reasoning with Ilya, what's *wrong* with the System 1/2 analogy, and where Test-Time Compute hits a wall Timestamps 00:00 Intro – Diplomacy, Cicero & World Championship 02:00 Reverse Centaur: How AI Improved Noam’s Human Play 05:00 Turing Test Failures in Chat: Hallucinations & Steerability 07:30 Reasoning Models & Fast vs. Slow Thinking Paradigm 11:00 System 1 vs. System 2 in Visual Tasks (GeoGuessr, Tic-Tac-Toe) 14:00 The Deep Research Existence Proof for Unverifiable Domains 17:30 Harnesses, Tool Use, and Fragility in AI Agents 21:00 The Case Against Over-Reliance on Scaffolds and Routers 24:00 Reinforcement Fine-Tuning and Long-Term Model Adaptability 28:00 Ilya’s Bet on Reasoning and the O-Series Breakthrough 34:00 Noam’s Dev Stack: Codex, Windsurf & AGI Moments 38:00 Building Better AI Developers: Memory, Reuse, and PR Reviews 41:00 Multi-Agent Intelligence and the “AI Civilization” Hypothesis 44:30 Implicit World Models and Theory of Mind Through Scaling 48:00 Why Self-Play Breaks Down Beyond Go and Chess 54:00 Designing Better Benchmarks for Fuzzy Tasks 57:30 The Real Limits of Test-Time Compute: Cost vs. Time 1:00:30 Data Efficiency Gaps Between Humans and LLMs 1:03:00 Training Pipeline: Pretraining, Midtraining, Posttraining 1:05:00 Games as Research Proving Grounds: Poker, MTG, Stratego 1:10:00 Closing Thoughts – Five-Year View and Open Research Directions Chapters 00:00:00 Intro & Guest Welcome 00:00:33 Diplomacy AI & Cicero Insights 00:03:49 AI Safety, Language Models, and Steerability 00:05:23 O Series Models: Progress and Benchmarks 00:08:53 Reasoning Paradigm: Thinking Fast and Slow in AI 00:14:02 Design Questions: Harnesses, Tools, and Test Time Compute 00:20:32 Reinforcement Fine-tuning & Model Specialization 00:21:52 The Rise of Reasoning Models at OpenAI 00:29:33 Data Efficiency in Machine Learning 00:33:21 Coding & AI: Codex, Workflows, and Developer Experience 00:41:38 Multi-Agent AI: Collaboration, Competition, and Civilization 00:45:14 Poker, Diplomacy & Exploitative vs. Optimal AI Strategy 00:52:11 World Models, Multi-Agent Learning, and Self-Play 00:58:50 Generative Media: Image & Video Models 01:00:44 Robotics: Humanoids, Iteration Speed, and Embodiment 01:04:25 Rapid Fire: Research Practices, Benchmarks, and AI Progress 01:14:19 Games, Imperfect Information, and AI Research Directions
-
The Shape of Compute (Chris Lattner of Modular)
From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-06-13 16:40
Chris Lattner of Modular (https://modular.com) joined us (again!) to talk about how they are breaking the CUDA monopoly, what it took to match NVIDIA performance with AMD, and how they are building a company of "elite nerds". X: https://x.com/latentspacepod Substack: https://latent.space 00:00:00 Introductions 00:00:12 Overview of Modular and the Shape of Compute 00:02:27 Modular’s R&D Phase 00:06:55 From CPU Optimization to GPU Support 00:11:14 MAX: Modular’s Inference Framework 00:12:52 Mojo Programming Language 00:18:25 MAX Architecture: From Mojo to Cluster-Scale Inference 00:29:16 Open Source Contributions and Community Involvement 00:32:25 Modular's Differentiation from VLLM and SGLang 00:41:37 Modular’s Business Model and Monetization Strategy 00:53:17 DeepSeek’s Impact and Low-Level GPU Programming 01:00:00 Inference Time Compute and Reasoning Models 01:02:31 Personal Reflections on Leading Modular 01:08:27 Daily Routine and Time Management as a Founder 01:13:24 Using AI Coding Tools and Staying Current with Research 01:14:47 Personal Projects and Work-Life Balance 01:17:05 Hiring, Open Source, and Community Engagement
-
The Utility of Interpretability — Emmanuel Ameisen
From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-06-06 17:00
Emmanuel Ameisen is lead author of “Circuit Tracing: Revealing Computational Graphs in Language Models” (https://transformer-circuits.pub/2025/attribution-graphs/methods.html), which is part of a duo of MechInterp papers that Anthropic published in March (alongside https://transformer-circuits.pub/2025/attribution-graphs/biology.html). We recorded the initial conversation a month ago, but then held off publishing until the open source tooling for the graph generation discussed in this work was released last week: https://www.anthropic.com/research/open-source-circuit-tracing This is a two-part episode - an intro covering the open source release, then a deeper dive into the paper — with guest host Vibhu Sapra (https://x.com/vibhuuuus) and Mochi the MechInterp Pomsky (https://x.com/mochipomsky). Thanks to Vibhu for making this episode happen! While the original blogpost contained some fantastic guided visualizations (which we discuss at the end of this pod!), with the notebook and Neuronpedia visualization (https://www.neuronpedia.org/gemma-2-2b/graph) released this week, you can now explore on your own with Neuronpedia, as we show you in the video version of this pod. Chapters 00:00 Intro & Guest Introductions 01:00 Anthropic's Circuit Tracing Release 06:11 Exploring Circuit Tracing Tools & Demos 13:01 Model Behaviors and User Experiments 17:02 Behind the Research: Team and Community 24:19 Main Episode Start: Mech Interp Backgrounds 25:56 Getting Into Mech Interp Research 31:52 History and Foundations of Mech Interp 37:05 Core Concepts: Superposition & Features 39:54 Applications & Interventions in Models 45:59 Challenges & Open Questions in Interpretability 57:15 Understanding Model Mechanisms: Circuits & Reasoning 01:04:24 Model Planning, Reasoning, and Attribution Graphs 01:30:52 Faithfulness, Deception, and Parallel Circuits 01:40:16 Publishing Risks, Open Research, and Visualization 01:49:33 Barriers, Vision, and Call to Action
-
[AIEWF Preview] Containing Agent Chaos — Solomon Hykes
From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-06-03 13:30
Solomon most famously created Docker and now runs Dagger… which has something special to share with you on Thursday. Catch Dagger at: - Tuesday: Dagger’s workshop https://www.ai.engineer/schedule#ship-agents-that-ship-a-hands-on-workshop-for-swe-agent-builders - Wednesday: Dagger’s talk: https://www.ai.engineer/schedule#how-to-trust-an-agent-with-software-delivery - Thursday: Solomon’s Keynote https://www.ai.engineer/schedule#containing-agent-chaos Chapters 00:00 Introduction & Guest Background 00:29 What is Dagger? Post-Development Automation 01:08 Dagger’s Community & Platform Engineers 02:32 AI Agents and Developer Workflows 03:40 Environment Isolation & The Power of Containers 06:28 The Need for Standards in Agent Environments 07:25 Design Constraints & Challenges for Dev Environments 11:26 Limitations of Current Tools & Agent-Native UX 14:11 Modularity, Customization, and the Lego Analogy 16:24 Convergence of CICD and Agentic Systems 17:41 Ephemeral Apps, Resource Constraints, and Local Execution 21:01 Adoption, Ecosystem, and the Role of Open Source 23:30 Dagger’s Modular Approach & Integration Philosophy 25:38 Looking Ahead: Workshops, Keynotes, and the Future of Agentic Infrastructure
-
[AIEWF Preview] CloudChef: Your Robot Chef - Michelin-Star food at $12/hr (w/ Kitchen tour!)
From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-05-31 01:06
One of the new tracks at next week’s AI Engineer conference in SF is a focus on LLMs + Robotics, ft. household names like Waymo and Physical Intelligence. However, there are many other companies applying LLMs and VLMs in the real world! CloudChef, the first industrial-scale kitchen robotics company with one-shot demonstration learning and an incredibly simple business model, will be serving tasty treats all day with Zippy (https://www.cloudchef.co/zippy), their AI Chef platform. This is a lightning pod with CEO Nikhil Abraham to preview what Zippy is capable of! https://www.cloudchef.co/platform See a real chef comparison: https://www.youtube.com/watch?v=INDhZ7LwSeo&t=64s See it at the AI Engineer Expo in SF next week: https://ai.engineer Chapters 00:00 Welcome and Introductions 00:58 What is Cloud Chef? 01:36 How the Robots Work: Culinary Intelligence 05:57 Commercial Applications and Early Success 07:02 The Software-First Approach 10:09 Business Model and Pricing 13:10 Demonstration Learning: Training the Robots 16:03 Call to Action and Engineering Opportunities 18:45 Final Thoughts and Technical Details
-
The AI Coding Factory
From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-05-29 17:37
We are joined by Eno Reyes and Matan Grinberg, the co-founders of Factory.ai. They are building droids for autonomous software engineering, handling everything from code generation to incident response for production outages. After raising a $15M Series A from Sequoia, they just released their product in GA! https://factory.ai/ https://x.com/latentspacepod Chapters 00:00:00 Introductions 00:00:35 Meeting at Langchain Hackathon 00:04:02 Building Factory despite early model limitations 00:06:56 What is Factory AI? 00:08:55 Delegation vs Collaboration in AI Development Tools 00:10:06 Naming Origins of 'Factory' and 'Droids' 00:12:17 Defining Droids: Agent vs Workflow 00:14:34 Live Demo 00:17:37 Enterprise Context and Tool Integration in Droids 00:20:26 Prompting, Clarification, and Agent Communication 00:22:28 Project Understanding and Proactive Context Gathering 00:24:10 Why SWE-Bench Is Dead 00:28:47 Model Fine-tuning and Generalization Challenges 00:31:07 Why Factory is Browser-Based, Not IDE-Based 00:33:51 Test-Driven Development and Agent Verification 00:36:17 Retrieval vs Large Context Windows for Cost Efficiency 00:38:02 Enterprise Metrics: Code Churn and ROI 00:40:48 Executing Large Refactors and Migrations with Droids 00:45:25 Model Speed, Parallelism, and Delegation Bottlenecks 00:50:11 Observability Challenges and Semantic Telemetry 00:53:44 Hiring 00:55:19 Factory's design and branding approach 00:58:34 Closing Thoughts and Future of AI-Native Development
-
[AIEWF Preview] Multi-Turn RL for Multi-Hour Agents — with Will Brown, Prime Intellect
From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-05-23 05:01
In an otherwise heavy week packed with Microsoft Build, Google I/O, and OpenAI io, the worst kept secret in biglab land was the launch of Claude 4, particularly the triumphant return of Opus, which many had been clamoring for. We will leave the specific Claude 4 recap to AINews; however, we think that both Gemini’s progress on Deep Think this week and Claude 4 represent the next frontier of progress on inference-time compute/reasoning (at least until GPT-5 ships this summer). Will Brown’s talk at AIE NYC and open source work on verifiers have made him one of the most prominent voices able to publicly discuss (aka without the vagueposting LoRA they put on you when you join a biglab) the current state of the art in reasoning models and where current SOTA research directions lead. We discussed his latest paper on Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment, and he has previewed his AIEWF talk on Agentic RL for those with the temerity to power thru bad meetup audio. Chapters 00:00 Introduction and Episode Overview 02:01 Discussion on Claude 4 and its Features 04:31 Reasoning and Tool Use in AI Models 07:01 Extended Thinking in Claude and Model Differences 09:31 Speculation on Claude's Extended Thinking 11:01 Challenges and Controversies in AI Model Training 13:31 Technical Highlights and Code Trustworthiness 16:01 Token Costs and Incentives in AI Models 18:31 Thinking Budgets and AI Effort 21:01 Safety and Ethics in AI Model Development 23:31 Anthropic's Approach to AI Safety 26:01 LLM Arena and Evaluation Challenges 28:31 Developing Taste and Direction in AI Research 31:01 Recent Research and Multi-Turn RL 33:31 Tools and Incentives in AI Model Development 36:01 Challenges in Evaluating AI Model Outputs 38:31 Model-Based Rewards and Future Directions 41:01 Wrap-up and Future Plans
-
ChatGPT Codex: The Missing Manual
From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-05-16 23:35
ChatGPT Codex is here - the first cloud hosted Autonomous Software Engineer (A-SWE) from OpenAI. We sat down for a quick pod with two core devs on the ChatGPT Codex team: Josh Ma and Alexander Embiricos to get the inside scoop on the origin story of Codex, from WHAM to its future roadmap. Follow them: https://github.com/joshma and https://x.com/embirico Chapters - 00:00 Introduction to the Latent Space Podcast - 00:59 The Launch of ChatGPT Codex - 03:08 Personal Journeys into AI Development - 05:50 The Evolution of Codex and AI Agents - 08:55 Understanding the Form Factor of Codex - 11:48 Building a Software Engineering Agent - 14:53 Best Practices for Using AI Agents - 17:55 The Importance of Code Structure for AI - 21:10 Navigating Human and AI Collaboration - 23:58 Future of AI in Software Development - 28:18 Planning and Decision-Making in AI Development - 31:37 User, Developer, and Model Dynamics - 35:28 Building for the Future: Long-Term Vision - 39:31 Best Practices for Using AI Tools - 42:32 Understanding the Compute Platform - 48:01 Iterative Deployment and Future Improvements
-
Claude Code: Anthropic's CLI Agent
From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-05-07 21:59
More info: https://docs.anthropic.com/en/docs/claude-code/overview The AI coding wars have now split across four battlegrounds: 1. AI IDEs: with two leading startups in Windsurf ($3B acq. by OpenAI) and Cursor ($9B valuation) and a sea of competition behind them (like Cline, GitHub Copilot, etc). 2. Vibe coding platforms: Bolt.new, Lovable, v0, etc., all experiencing fast growth and getting to tens of millions in revenue within months. 3. The teammate agents: Devin, Cosine, etc. Simply give them a task, and they will get back to you with a full PR (with mixed results). 4. The CLI-based agents: after Aider’s initial success, we are now seeing many other alternatives, including two from the main labs: OpenAI Codex and Claude Code. The main draw is that 1) they are composable and 2) they are pay-as-you-go based on tokens used. Since we covered all three of the first categories, today’s guests are Boris and Cat, the lead engineer and PM for Claude Code. If you only take one thing away from this episode, it’s this piece from Boris: Claude Code is not a product so much as it’s a Unix utility. This fits very well with Anthropic’s product principle: “do the simple thing first.” Whether it’s the memory implementation (a markdown file that gets auto-loaded) or the approach to prompt summarization (just ask Claude to summarize), they always pick the smallest building blocks that are useful, understandable, and extensible. Even major features like planning (“/think”) and memory (#tags in markdown) fit the same idea of having text I/O as the core interface. This is very similar to the original UNIX design philosophy. Claude Code is also the most direct way to consume Sonnet for coding, rather than going through all the hidden prompting and optimization that the other products do. You will feel that right away, as the average spend per user is $6/day on Claude Code, compared to $20/mo for Cursor, for example. Apparently, there are some engineers inside of Anthropic that have spent >$1,000 in one day! If you’re building AI developer tools, there’s also a lot of alpha in how to design a CLI tool, interactive vs non-interactive modes, and how to balance feature creation. Enjoy! Timestamps [00:00:00] Intro [00:01:59] Origins of Claude Code [00:04:32] Anthropic’s Product Philosophy [00:07:38] What should go into Claude Code? [00:09:26] Claude.md and Memory Simplification [00:10:07] Claude Code vs Aider [00:11:23] Parallel Workflows and Unix Utility Philosophy [00:12:51] Cost considerations and pricing model [00:14:51] Key Features Shipped Since Launch [00:16:28] Claude Code writes 80% of Claude Code [00:18:01] Custom Slash Commands and MCP Integration [00:21:08] Terminal UX and Technical Stack [00:27:11] Code Review and Semantic Linting [00:28:33] Non-Interactive Mode and Automation [00:36:09] Engineering Productivity Metrics [00:37:47] Balancing Feature Creation and Maintenance [00:41:59] Memory and the Future of Context [00:50:10] Sandboxing, Branching, and Agent Planning [01:01:43] Future roadmap [01:11:00] Why Anthropic Excels at Developer Tools
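The “markdown file that gets auto-loaded” memory design is simple enough to sketch. A minimal illustration of the pattern, not Claude Code's actual implementation (the function name and prompt format here are invented):

```python
from pathlib import Path

# Hypothetical sketch of the memory-as-a-markdown-file pattern:
# if a CLAUDE.md exists in the project root, load it and prepend
# it to the context sent to the model.
def build_context(project_root: str, base_prompt: str) -> str:
    memory_file = Path(project_root) / "CLAUDE.md"
    if memory_file.exists():
        memory = memory_file.read_text()
        # Prepend the project's persisted notes before the task prompt.
        return f"# Project memory\n{memory}\n\n{base_prompt}"
    return base_prompt

# With no CLAUDE.md present, the prompt passes through unchanged.
print(build_context("/tmp/nonexistent-project", "Fix the failing test."))
```

The point of the Unix-utility framing is that the whole feature is just file I/O plus string concatenation: easy to inspect, version-control, and extend.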
-
⚡️The Rise and Fall of the Vector DB Category
From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-05-01 16:34
Note from your hosts: we were off this week for ICLR and RSA! This week we’re bringing you one of the top episodes from our lightning podcast series, the shorter-format, YouTube-only side podcast we do for breaking news and faster turnaround. Please support our work on YouTube! https://www.youtube.com/playlist?list=PLWEAb1SXhjlc5qgVK4NgehdCzMYCwZtiB The explosion of embedding-based applications created a new challenge: efficiently storing, indexing, and searching these high-dimensional vectors at scale. This gap gave rise to the vector database category, with companies like Pinecone leading the charge in 2022-2023 by defining specialized infrastructure for vector operations. The category saw explosive growth following ChatGPT's launch in late 2022, as developers rushed to build AI applications using Retrieval-Augmented Generation (RAG). This surge was partly driven by a widespread misconception that embedding-based similarity search was the only viable method for retrieving context for LLMs! The resulting "vector database gold rush" saw massive investment and attention directed toward vector search infrastructure, even though traditional information retrieval techniques remained equally valuable for many RAG applications. https://x.com/jobergum/status/1872923872007217309 Chapters 00:00 Introduction to Trondheim and Background 03:03 The Rise and Fall of Vector Databases 06:08 Convergence of Search Technologies 09:04 Embeddings and Their Importance 12:03 Building Effective Search Systems 15:00 RAG Applications and Recommendations 17:55 The Role of Knowledge Graphs 20:49 Future of Embedding Models and Innovations
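To see why similarity search was never the only retrieval option, it helps to look at how small the core operation actually is. A toy sketch of embedding-based retrieval (the 4-dimensional vectors are invented for illustration; real systems use learned embedding models with hundreds or thousands of dimensions):

```python
import math

# Made-up "embeddings" for three documents and a query.
docs = {
    "doc_a": [0.9, 0.1, 0.0, 0.2],
    "doc_b": [0.1, 0.8, 0.3, 0.0],
    "doc_c": [0.85, 0.2, 0.1, 0.1],
}
query = [1.0, 0.0, 0.0, 0.1]

def cosine(a, b):
    # Cosine similarity: dot product normalized by vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Rank documents by similarity to the query, most similar first.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked)  # ['doc_a', 'doc_c', 'doc_b']
```

Vector databases exist to run this ranking over millions of vectors with approximate nearest-neighbor indexes; classic lexical scoring like BM25 fills the same retrieval slot in a RAG pipeline without any embeddings at all, which is the episode's point.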
-
Why Every Agent needs Open Source Cloud Sandboxes
From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-04-24 01:57
Vasek Mlejnsky from E2B joins us today to talk about sandboxes for AI agents. In the last 2 years, E2B has grown from a handful of developers building on it to being used by ~50% of the Fortune 500 and generating millions of sandboxes each week for their customers. As the “death of chat completions” approaches, LLM workflows and agents are relying more and more on tool usage and multi-modality. The most common use cases for their sandboxes: - Run data analysis and charting (like Perplexity) - Execute arbitrary code generated by the model (like Manus does) - Running evals on code generation (see LMArena Web) - Doing reinforcement learning for code capabilities (like HuggingFace) Timestamps: 00:00:00 Introductions 00:00:37 Origin of DevBook -> E2B 00:02:35 Early Experiments with GPT-3.5 and Building AI Agents 00:05:19 Building an Agent Cloud 00:07:27 Challenges of Building with Early LLMs 00:10:35 E2B Use Cases 00:13:52 E2B Growth vs Models Capabilities 00:15:03 The LLM Operating System (LLMOS) Landscape 00:20:12 Breakdown of JavaScript vs Python Usage on E2B 00:21:50 AI VMs vs Traditional Cloud 00:26:28 Technical Specifications of E2B Sandboxes 00:29:43 Usage-based billing infrastructure 00:34:08 Pricing AI on Value Delivered vs Token Usage 00:36:24 Forking, Checkpoints, and Parallel Execution in Sandboxes 00:39:18 Future Plans for Toolkit and Higher-Level Agent Frameworks 00:42:35 Limitations of Chat-Based Interfaces and the Future of Agents 00:44:00 MCPs and Remote Agent Capabilities 00:49:22 LLMs.txt, scrapers, and bad AI bots 00:53:00 Manus and Computer Use on E2B 00:55:03 E2B for RL with Hugging Face 00:56:58 E2B for Agent Evaluation on LMArena 00:58:12 Long-Term Vision: E2B as Full Lifecycle Infrastructure for LLMs 01:00:45 Future Plans for Hosting and Deployment of LLM-Generated Apps 01:01:15 Why E2B Moved to San Francisco 01:05:49 Open Roles and Hiring Plans at E2B
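The "execute arbitrary code generated by the model" use case boils down to a run-and-capture loop. A deliberately generic sketch of that loop (a plain subprocess is NOT real isolation; products like E2B run these workloads in sandboxed VMs, which is the whole point):

```python
import subprocess
import sys
import tempfile
import textwrap

# Pretend this string came back from an LLM.
generated_code = textwrap.dedent("""
    total = sum(range(10))
    print(total)
""")

# Write the code to a temp file and run it in a separate process with a
# timeout, capturing stdout/stderr instead of trusting it in-process.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(generated_code)
    path = f.name

result = subprocess.run(
    [sys.executable, path],
    capture_output=True,
    text=True,
    timeout=10,  # kill runaway model-generated code
)
print(result.stdout.strip())  # 45
```

A real agent sandbox wraps the same execute-and-capture shape in a VM boundary, so the generated code cannot touch the host's filesystem, network, or secrets.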
-
⚡️GPT 4.1: The New OpenAI Workhorse
From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-04-15 04:30
We’ll keep this brief because we’re on a tight turnaround: GPT 4.1, previously known as the Quasar and Optimus models, is now live as the natural update for 4o/4o-mini (and the research preview of GPT 4.5). Though it is a general-purpose model family, the headline features are: Coding abilities (o1-level SWEBench and SWELancer, though only OK on Aider) Instruction Following (with a very notable prompting guide) Long Context up to 1M tokens (with new MRCR and Graphwalk benchmarks) Vision (simply o1 level) Cheaper Pricing (cheaper than 4o, with greatly improved prompt caching savings) We caught up with returning guest Michelle Pokrass and Josh McGrath to get more detail on each! Chapters 00:00:00 Introduction and Guest Welcome 00:00:57 GPT 4.1 Launch Overview 00:01:54 Developer Feedback and Model Names 00:02:53 Model Naming and Starry Themes 00:03:49 Confusion Over GPT 4.1 vs 4.5 00:04:47 Distillation and Model Improvements 00:05:45 Omnimodel Architecture and Future Plans 00:06:43 Core Capabilities of GPT 4.1 00:07:40 Training Techniques and Long Context 00:08:37 Challenges in Long Context Reasoning 00:09:34 Context Utilization in Models 00:10:31 Graph Walks and Model Evaluation 00:11:31 Real Life Applications of Graph Tasks 00:12:30 Multi-Hop Reasoning Benchmarks 00:13:30 Agentic Workflows and Backtracking 00:14:28 Graph Traversals for Agent Planning 00:15:24 Context Usage in API and Memory Systems 00:16:21 Model Performance in Long Context Tasks 00:17:17 Instruction Following and Real World Data 00:18:12 Challenges in Grading Instructions 00:19:09 Instruction Following Techniques 00:20:09 Prompting Techniques and Model Responses 00:21:05 Agentic Workflows and Model Persistence 00:22:01 Balancing Persistence and User Control 00:22:56 Evaluations on Model Edits and Persistence 00:23:55 XML vs JSON in Prompting 00:24:50 Instruction Placement in Context 00:25:49 Optimizing for Prompt Caching 00:26:49 Chain of Thought and Reasoning Models 00:27:46 Choosing the Right Model for Your Task 
00:28:46 Coding Capabilities of GPT 4.1 00:29:41 Model Performance in Coding Tasks 00:30:39 Understanding Coding Model Differences 00:31:36 Using Smaller Models for Coding 00:32:33 Future of Coding in OpenAI 00:33:28 Internal Use and Success Stories 00:34:26 Vision and Multi-Modal Capabilities 00:35:25 Screen vs Embodied Vision 00:36:22 Vision Benchmarks and Model Improvements 00:37:19 Model Deprecation and GPU Usage 00:38:13 Fine-Tuning and Preference Steering 00:39:12 Upcoming Reasoning Models 00:40:10 Creative Writing and Model Humor 00:41:07 Feedback and Developer Community 00:42:03 Pricing and Blended Model Costs 00:44:02 Conclusion and Wrap-Up
-
SF Compute: Commoditizing Compute
From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-04-11 18:53
Evan Conrad, co-founder of SF Compute, joined us to talk about how they started as an AI lab that avoided bankruptcy by selling GPU clusters, why CoreWeave financials look like a real estate business, and how GPUs are turning into a commodities market. Chapters: 00:00:05 - Introductions 00:00:12 - Introduction of guest Evan Conrad from SF Compute 00:00:12 - CoreWeave Business Model Discussion 00:05:37 - CoreWeave as a Real Estate Business 00:08:59 - Interest Rate Risk and GPU Market Strategy Framework 00:16:33 - Why Together and DigitalOcean will lose money on their clusters 00:20:37 - SF Compute's AI Lab Origins 00:25:49 - Utilization Rates and Benefits of SF Compute Market Model 00:30:00 - H100 GPU Glut, Supply Chain Issues, and Future Demand Forecast 00:34:00 - P2P GPU networks 00:36:50 - Customer stories 00:38:23 - VC-Provided GPU Clusters and Credit Risk Arbitrage 00:41:58 - Market Pricing Dynamics and Preemptible GPU Pricing Model 00:48:00 - Future Plans for Financialization? 00:52:59 - Cluster auditing and quality control 00:58:00 - Futures Contracts for GPUs 01:01:20 - Branding and Aesthetic Choices Behind SF Compute 01:06:30 - Lessons from Previous Startups 01:09:07 - Hiring at SF Compute Chapters 00:00:00 Introduction and Background 00:00:58 Analysis of GPU Business Models 00:01:53 Challenges with GPU Pricing 00:02:48 Revenue and Scaling with GPUs 00:03:46 Customer Sensitivity to GPU Pricing 00:04:44 CoreWeave's Business Strategy 00:05:41 CoreWeave's Market Perception 00:06:40 Hyperscalers and GPU Market Dynamics 00:07:37 Financial Strategies for GPU Sales 00:08:35 Interest Rates and GPU Market Risks 00:09:30 Optimal GPU Contract Strategies 00:10:27 Risks in GPU Market Contracts 00:11:25 Price Sensitivity and Market Competition 00:12:21 Market Dynamics and GPU Contracts 00:13:18 Hyperscalers and GPU Market Strategies 00:14:15 Nvidia and Market Competition 00:15:12 Microsoft's Role in GPU Market 00:16:10 Challenges in GPU Market Dynamics 00:17:07 
Economic Realities of the GPU Market 00:18:03 Real Estate Model for GPU Clouds 00:18:59 Price Sensitivity and Chip Design 00:19:55 SF Compute's Beginnings and Challenges 00:20:54 Navigating the GPU Market 00:21:54 Pivoting to a GPU Cloud Provider 00:22:53 Building a GPU Market 00:23:52 SF Compute as a GPU Marketplace 00:24:49 Market Liquidity and GPU Pricing 00:25:47 Utilization Rates in GPU Markets 00:26:44 Brokerage and Market Flexibility 00:27:42 H100 Glut and Market Cycles 00:28:40 Supply Chain Challenges and GPU Glut 00:29:35 Future Predictions for the GPU Market 00:30:33 Speculations on Test Time Inference 00:31:29 Market Demand and Test Time Inference 00:32:26 Open Source vs. Closed AI Demand 00:33:24 Future of Inference Demand 00:34:24 Peer-to-Peer GPU Markets 00:35:17 Decentralized GPU Market Skepticism 00:36:15 Redesigning Architectures for New Markets 00:37:14 Supporting Grad Students and Startups 00:38:11 Successful Startups Using SF Compute 00:39:11 VCs and GPU Infrastructure 00:40:09 VCs as GPU Credit Transformators 00:41:06 Market Timing and GPU Infrastructure 00:42:02 Understanding GPU Pricing Dynamics 00:43:01 Market Pricing and Preemptible Compute 00:43:55 Price Volatility and Market Optimization 00:44:52 Customizing Compute Contracts 00:45:50 Creating Flexible Compute Guarantees 00:46:45 Financialization of GPU Markets 00:47:44 Building a Spot Market for GPUs 00:48:40 Auditing and Standardizing Clusters 00:49:40 Ensuring Cluster Reliability 00:50:36 Active Monitoring and Refunds 00:51:33 Automating Customer Refunds 00:52:33 Challenges in Cluster Maintenance 00:53:29 Remote Cluster Management 00:54:29 Standardizing Compute Contracts 00:55:28 Unified Infrastructure for Clusters 00:56:24 Creating a Commodity Market for GPUs 00:57:22 Futures Market and Risk Management 00:58:18 Reducing Risk with GPU Futures 00:59:14 Stabilizing the GPU Market 01:00:10 SF Compute's Anti-Hype Approach 01:01:07 Calm Branding and Expectations 01:02:07 Promoting San 
Francisco's Beauty 01:03:03 Design Philosophy at SF Compute 01:04:02 Artistic Influence on Branding 01:05:00 Past Projects and Burnout 01:05:59 Challenges in Building an Email Client 01:06:57 Persistence and Iteration in Startups 01:07:57 Email Market Challenges 01:08:53 SF Compute Job Opportunities 01:09:53 Hiring for Systems Engineering 01:10:50 Financial Systems Engineering Role 01:11:50 Conclusion and Farewell
-
The Creators of Model Context Protocol
From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-04-03 22:37
Today’s guests, David Soria Parra and Justin Spahr-Summers, are the creators of Anthropic’s Model Context Protocol (MCP). When we first wrote Why MCP Won, we had no idea how quickly it was about to win. In the past 4 weeks, OpenAI and now Google have announced MCP support, effectively confirming our prediction that MCP was the presumptive winner of the agent standard wars. MCP has now overtaken OpenAPI, the incumbent option and most direct alternative, in GitHub stars (3 months ahead of the conservative trendline). For protocol and history nerds, we also asked David and Justin to tell the origin story of MCP, which we leave to the reader to enjoy (you can also skim the transcripts, or the changelogs of a certain favored IDE). It’s incredible the impact that individual engineers solving their own problems can have on an entire industry. Timestamps 00:00 Introduction and Guest Welcome 00:37 What is MCP? 02:00 The Origin Story of MCP 05:18 Development Challenges and Solutions 08:06 Technical Details and Inspirations 29:45 MCP vs Open API 32:48 Building MCP Servers 40:39 Exploring Model Independence in LLMs 41:36 Building Richer Systems with MCP 43:13 Understanding Agents in MCP 45:45 Nesting and Tool Confusion in MCP 49:11 Client Control and Tool Invocation 52:08 Authorization and Trust in MCP Servers 01:01:34 Future Roadmap and Stateless Servers 01:10:07 Open Source Governance and Community Involvement 01:18:12 Wishlist and Closing Remarks
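For readers who haven't seen the protocol itself: MCP frames client-server traffic as JSON-RPC 2.0 messages over a transport such as stdio. A minimal sketch of the request framing (the method names `tools/list` and `tools/call` come from the published MCP spec; the `get_weather` tool and its arguments are hypothetical examples, and a real exchange also begins with an `initialize` handshake):

```python
import json

def rpc_request(method: str, params: dict, id_: int) -> str:
    """Frame an MCP request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": id_, "method": method, "params": params}
    )

# Ask the server which tools it exposes...
list_req = rpc_request("tools/list", {}, 1)

# ...then invoke one by name with structured arguments.
call_req = rpc_request(
    "tools/call",
    {"name": "get_weather", "arguments": {"city": "San Francisco"}},
    2,
)

print(list_req)
print(call_req)
```

The spec's insight is that this thin, transport-agnostic framing is all clients and servers need to agree on, which is part of why it spread so quickly compared to heavier alternatives.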
-
Unsupervised Learning x Latent Space Crossover Special
From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-03-29 07:00
Unsupervised Learning is a podcast that interviews the sharpest minds in AI about what’s real today, what will be real in the future, and what it means for businesses and the world - helping builders, researchers and founders deconstruct and understand the biggest breakthroughs. Top guests: Noam Shazeer, Bob McGrew, Noam Brown, Dylan Patel, Percy Liang, David Luan https://www.latent.space/p/unsupervised-learning Timestamps 00:00 Introduction and Excitement for Collaboration 00:27 Reflecting on Surprises in AI Over the Past Year 01:44 Open Source Models and Their Adoption 06:01 The Rise of GPT Wrappers 06:55 AI Builders and Low-Code Platforms 09:35 Overhyped and Underhyped AI Trends 22:17 Product Market Fit in AI 28:23 Google's Current Momentum 28:33 Customer Support and AI 29:54 AI's Impact on Cost and Growth 31:05 Voice AI and Scheduling 32:59 Emerging AI Applications 34:12 Education and AI 36:34 Defensibility in AI Applications 40:10 Infrastructure and AI 47:08 Challenges and Future of AI 52:15 Quick Fire Round and Closing Remarks Chapters 00:00:00 Introduction and Collab Excitement 00:00:58 Open Source and Model Adoption 00:01:58 Enterprise Use of Open Source Models 00:02:57 The Competitive Edge of Closed Source Models 00:03:56 DeepSeek and Open Source Model Releases 00:04:54 Market Narrative and DeepSeek Impact 00:05:53 AI Engineering and GPT Wrappers 00:06:53 AI Builders and Low-Code Platforms 00:07:50 Innovating Beyond Existing Paradigms 00:08:50 Apple and AI Product Development 00:09:48 Overhyped and Underhyped AI Trends 00:10:46 Frameworks and Protocols in AI Development 00:11:45 Emerging Opportunities in AI 00:12:44 Stateful AI and Memory Innovation 00:13:44 Challenges with Memory in AI Agents 00:14:44 The Future of Model Training Companies 00:15:44 Specialized Use Cases for AI Models 00:16:44 Vertical Models vs General Purpose Models 00:17:42 General Purpose vs Domain-Specific Models 00:18:42 Reflections on Model Companies 00:19:39 Model Companies Entering 
Product Space 00:20:38 Competition in AI Model and Product Sectors 00:21:35 Coding Agents and Market Dynamics 00:22:35 Defensibility in AI Applications 00:23:35 Investing in Underappreciated AI Ventures 00:24:32 Analyzing Market Fit in AI 00:25:31 AI Applications with Product Market Fit 00:26:31 OpenAI's Impact on the Market 00:27:31 Google and OpenAI Competition 00:28:31 Exploring Google's Advancements 00:29:29 Customer Support and AI Applications 00:30:27 The Future of AI in Customer Support 00:31:26 Cost-Cutting vs Growth in AI 00:32:23 Voice AI and Real-World Applications 00:33:23 Scaling AI Applications for Demand 00:34:22 Summarization and Conversational AI 00:35:20 Future AI Use Cases and Market Fit 00:36:20 AI Education and Model Capabilities 00:37:17 Reforming Education with AI 00:38:15 Defensibility in AI Apps 00:39:13 Network Effects and AI 00:40:12 AI Brand and Market Positioning 00:41:11 AI Application Defensibility 00:42:09 LLM OS and AI Infrastructure 00:43:06 Security and AI Application 00:44:06 OpenAI's Role in AI Infrastructure 00:45:02 The Balance of AI Applications and Infrastructure 00:46:02 Capital Efficiency in AI Infrastructure 00:47:01 Challenges in AI DevOps and Infrastructure 00:47:59 AI SRE and Monitoring 00:48:59 Scaling AI and Hardware Challenges 00:49:58 Reliability and Compute in AI 00:50:57 Nvidia's Dominance and AI Hardware 00:51:57 Emerging Competition in AI Silicon 00:52:54 Agent Authentication Challenges 00:53:53 Dream Podcast Guests 00:54:51 Favorite News Sources and Startups 00:55:50 The Value of In-Person Conversations 00:56:50 Private vs Public AI Discourse 00:57:48 Latent Space and Podcasting 00:58:46 Conclusion and Final Thoughts