Latent Space: The AI Engineer Podcast | Latest Episode Summaries

After LLMs: Spatial Intelligence and World Models — Fei-Fei Li & Justin Johnson, World Labs

From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-11-25 22:36

Fei-Fei Li and Justin Johnson are cofounders of World Labs, who have recently launched Marble (https://marble.worldlabs.ai/), a new kind of generative “world model” that can create editable 3D environments from text, images, and other spatial inputs. Marble lets creators generate persistent 3D worlds, precisely control cameras, and interactively edit scenes, making it a powerful tool for games, film, VR, robotics simulation, and more. In this episode, Fei-Fei and Justin share how their journey from ImageNet and Stanford research led to World Labs, why spatial intelligence is the next frontier after LLMs, and how world models could change how machines see, understand, and build in 3D. We discuss: The massive compute scaling from AlexNet to today and why world models and spatial data are the most compelling way to “soak up” modern GPU clusters compared to language alone. What Marble actually is: a generative model of 3D worlds that turns text and images into editable scenes using Gaussian splats, supports precise camera control and recording, and runs interactively on phones, laptops, and VR headsets. Fei-fei’s essay (https://drfeifei.substack.com/p/from-words-to-worlds-spatial-intelligence) on spatial intelligence as a distinct form of intelligence from language: from picking up a mug to inferring the 3D structure of DNA, and why language is a lossy, low-bandwidth channel for describing the rich 3D/4D world we live in. Whether current models “understand” physics or just fit patterns: the gap between predicting orbits and discovering F=ma, and how attaching physical properties to splats and distilling physics engines into neural networks could lead to genuine causal reasoning. The changing role of academia in AI, why Fei-Fei worries more about under-resourced universities than “open vs closed,” and how initiatives like national AI compute clouds and open benchmarks can rebalance the ecosystem. Why transformers are fundamentally set models, not sequence models, and how that perspective opens up new architectures for world models, especially as hardware shifts from single GPUs to massive distributed clusters. Real use cases for Marble today: previsualization and VFX, game environments, virtual production, interior and architectural design (including kitchen remodels), and generating synthetic simulation worlds for training embodied agents and robots. How spatial intelligence and language intelligence will work together in multimodal systems, and why the goal isn’t to throw away LLMs but to complement them with rich, embodied models of the world. Fei-Fei and Justin’s long-term vision for spatial intelligence: from creative tools for artists and game devs to broader applications in science, medicine, and real-world decision-making. — Fei-Fei Li X: https://x.com/drfeifei LinkedIn: https://www.linkedin.com/in/fei-fei-li-4541247 Justin Johnson X: https://x.com/jcjohnss LinkedIn: https://www.linkedin.com/in/justin-johnson-41b43664 Where to find Latent Space X: https://x.com/latentspacepod Substack: https://www.latent.space/ Chapters 00:00:00 Introduction and the Fei-Fei Li & Justin Johnson Partnership 00:02:00 From ImageNet to World Models: The Evolution of Computer Vision 00:12:42 Dense Captioning and Early Vision-Language Work 00:19:57 Spatial Intelligence: Beyond Language Models 00:28:46 Introducing Marble: World Labs' First Spatial Intelligence Model 00:33:21 Gaussian Splats and the Technical Architecture of Marble 00:22:10 Physics, Dynamics, and the Future of World Models 00:41:09 Multimodality and the Interplay of Language and Space 00:37:37 Use Cases: From Creative Industries to Robotics and Embodied AI 00:56:58 Hiring, Research Directions, and the Future of World Labs

After LLMs: Spatial Intelligence and World Models — Fei-Fei Li & Justin Johnson, World Labs

From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-11-25 18:16

Fei-Fei Li and Justin Johnson are cofounders of World Labs, who have recently launched Marble (https://marble.worldlabs.ai/), a new kind of generative “world model” that can create editable 3D environments from text, images, and other spatial inputs. Marble lets creators generate persistent 3D worlds, precisely control cameras, and interactively edit scenes, making it a powerful tool for games, film, VR, robotics simulation, and more. In this episode, Fei-Fei and Justin share how their journey from ImageNet and Stanford research led to World Labs, why spatial intelligence is the next frontier after LLMs, and how world models could change how machines see, understand, and build in 3D. We discuss: The massive compute scaling from AlexNet to today and why world models and spatial data are the most compelling way to “soak up” modern GPU clusters compared to language alone. What Marble actually is: a generative model of 3D worlds that turns text and images into editable scenes using Gaussian splats, supports precise camera control and recording, and runs interactively on phones, laptops, and VR headsets. Fei-fei’s essay (https://drfeifei.substack.com/p/from-words-to-worlds-spatial-intelligence) on spatial intelligence as a distinct form of intelligence from language: from picking up a mug to inferring the 3D structure of DNA, and why language is a lossy, low-bandwidth channel for describing the rich 3D/4D world we live in. Whether current models “understand” physics or just fit patterns: the gap between predicting orbits and discovering F=ma, and how attaching physical properties to splats and distilling physics engines into neural networks could lead to genuine causal reasoning. The changing role of academia in AI, why Fei-Fei worries more about under-resourced universities than “open vs closed,” and how initiatives like national AI compute clouds and open benchmarks can rebalance the ecosystem. Why transformers are fundamentally set models, not sequence models, and how that perspective opens up new architectures for world models, especially as hardware shifts from single GPUs to massive distributed clusters. Real use cases for Marble today: previsualization and VFX, game environments, virtual production, interior and architectural design (including kitchen remodels), and generating synthetic simulation worlds for training embodied agents and robots. How spatial intelligence and language intelligence will work together in multimodal systems, and why the goal isn’t to throw away LLMs but to complement them with rich, embodied models of the world. Fei-Fei and Justin’s long-term vision for spatial intelligence: from creative tools for artists and game devs to broader applications in science, medicine, and real-world decision-making. — Fei-Fei Li X: https://x.com/drfeifei LinkedIn: https://www.linkedin.com/in/fei-fei-li-4541247 Justin Johnson X: https://x.com/jcjohnss LinkedIn: https://www.linkedin.com/in/justin-johnson-41b43664 Where to find Latent Space X: https://x.com/latentspacepod Substack: https://www.latent.space/ Chapters 00:00:00 Introduction and the Fei-Fei Li & Justin Johnson Partnership 00:02:00 From ImageNet to World Models: The Evolution of Computer Vision 00:12:33 Dense Captioning and Early Vision-Language Work 00:19:39 Spatial Intelligence: Beyond Language Models 00:28:49 Introducing Marble: World Labs' First Spatial Intelligence Model 00:33:24 Gaussian Splats and the Technical Architecture of Marble 00:35:50 Physics, Dynamics, and the Future of World Models 00:44:00 Multimodality and the Interplay of Language and Space 00:37:37 Use Cases: From Creative Industries to Robotics and Embodied AI 00:57:03 Hiring, Research Directions, and the Future of World Labs

⚡️ 10x AI Engineers with $1m Salaries — Alex Lieberman & Arman Hezarkhani, Tenex

From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-11-19 17:23

Alex Lieberman and Arman Hezarkani, co-founders of Tenex, reveal how they're revolutionizing software consulting by compensating AI engineers for output rather than hours—enabling some engineers to earn over $1 million annually while delivering 10x productivity gains. Their company represents a fundamental rethinking of knowledge work compensation in the age of AI agents, where traditional hourly billing models perversely incentivize slower work even as AI tools enable unprecedented speed. The Genesis: From 90% Downsizing to 10x Output The story behind 10X begins with Arman's previous company, Parthian, where he was forced to downsize his engineering team by 90%. Rather than collapse, Arman re-architected the entire product and engineering process to be AI-first—and discovered that production-ready software output increased 10x despite the massive headcount reduction. This counterintuitive result exposed a fundamental misalignment: engineers compensated by the hour are disincentivized from leveraging AI to work faster, even when the technology enables dramatic productivity gains. Alex, who had invested in Parthian, initially didn't believe the numbers until Arman walked him through why LLMs have made such a profound impact specifically on engineering as knowledge work. The Economic Model: Story Points Over Hours 10X's core innovation is compensating engineers based on story points—units of completed, quality output—rather than hours worked. This creates direct economic incentives for engineers to adopt every new AI tool, optimize their workflows, and maximize throughput. The company expects multiple engineers to earn over $1 million in cash compensation next year purely from story point earnings. To prevent gaming the system, they hire for two profiles: engineers who are "long-term selfish" (understanding that inflating story points will destroy client relationships) and those who genuinely love writing code and working with smart people. They also employ technical strategists incentivized on client retention (NRR) who serve as the final quality gate before any engineering plan reaches a client. Impressive Builds: From Retail AI to App Store Hits The results speak for themselves. In one project, 10X built a computer vision system for retail cameras that provides heat maps, queue detection, shelf stocking analysis, and theft detection—creating early prototypes in just two weeks for work that previously took quarters. They built Snapback Sports' mobile trivia app in one month, which hit 20th globally on the App Store. In a sales context, an engineer spent four hours building a working prototype of a fitness influencer's AI health coach app after the prospect initially said no—immediately moving 10X to the top of their vendor list. These examples demonstrate how AI-enabled speed fundamentally changes sales motions and product development timelines. The Interview Process: Unreasonably Difficult Take-Homes Despite concerns that AI would make take-home assessments obsolete, 10X still uses them—but makes them "unreasonably difficult." About 50% of candidates don't even respond, but those who complete the challenge demonstrate the caliber needed. The interview process is remarkably short: two calls before the take-home, review, then one or two final meetings—completable in as little as a week. A signature question: "If you had infinite resources to build an AI that could replace either of us on this call, what would be the first major bottleneck?" The sophisticated answer isn't just "model intelligence" or "context length"—it's controlling entropy, the accumulating error rate that derails autonomous agents over time. The Limiting Factor: Human Capital, Not Technology Despite being an AI-first company, 10X's primary constraint is human capital—finding and hiring enough exceptional engineers fast enough, then matching them with the right processes to maintain delivery quality as they scale. The company has ambitions beyond consulting to build their own technology, but for the foreseeable future, recruiting remains the bottleneck. This reveals an important insight about the AI era: even as technology enables unprecedented leverage, the constraint shifts to finding people who can harness that leverage effectively. Chapters 00:00:00 Introduction and Meeting the 10X Co-founders 00:01:29 The 10X Moment: From Hourly Billing to Output-Based Compensation 00:04:44 The Economic Model Behind 10X 00:05:42 Story Points and Measuring Engineering Output 00:08:41 Impressive Client Projects and Rapid Prototyping 00:12:22 The 10X Tech Stack: TypeScript and High Structure 00:13:21 AI Coding Tools: The Daily Evolution 00:15:05 Human Capital as the Limiting Factor 00:16:02 The Unreasonably Difficult Interview Process 00:17:14 Entropy and Context Engineering: The Future of AI Agents 00:23:28 The MCP Debate and AI Industry Sociology 00:26:01 Consulting, Digital Transformation, and Conference Insights

Anthropic, Glean & OpenRouter: How AI Moats Are Built with Deedy Das of Menlo Ventures

From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-11-14 22:54

Deedy Das, Partner at Menlo Ventures, returns to Latent Space to discuss his journey from Glean to venture capital, the explosive rise of Anthropic, and how AI is reshaping enterprise software and coding. From investing in Anthropic early on when they had no revenue to managing the $100M Ontology Fund, Das shares insider perspectives on the fastest-growing software company in history and what's next for AI infrastructure, research investing, and the future of engineering. We cover Glean’s rise from “boring” enterprise search to a $7B AI-native company, Anthropic's meteoric rise, the strategic decisions behind products like Claude Code, and why market share in enterprise AI is shifting dramatically. Das explains his investment thesis on research companies like Goodfire, Prime Intellect, and OpenRouter and how the Anthology Fund is quietly seeding the next wave of AI infra, research, and devtools. Chapters 00:00:00 Introduction and Deedy's Return to Latent Space 00:01:20 Glean's Journey: From Boring Enterprise Search to $7B Valuation 00:15:37 Anthropic's Meteoric Rise and Market Share Dynamics 00:17:50 Claude Artifacts and Product Innovation 00:41:20 The Anthology Fund: Investing in the Anthropic Ecosystem 00:48:01 Goodfire and Mechanistic Interpretability 00:51:25 Prime Intellect and Distributed AI Training 00:53:40 OpenRouter: Building the AI Model Gateway 01:13:36 The Stargate Project and Infrastructure Arms Race 01:18:14 The Future of Software Engineering and AI Coding

⚡ Inside GitHub’s AI Revolution: Jared Palmer Reveals Agent HQ & The Future of Coding Agents

From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-11-10 23:44

Jared Palmer, SVP at GitHub and VP of CoreAI at Microsoft, joins Latent Space for an in-depth look at the evolution of coding agents and modern developer tools. Recently joining after leading AI initiatives at Vercel, Palmer shares firsthand insights from behind the scenes at GitHub Universe, including the launch of Agent HQ which is a new collaboration hub for coding agents and developers. This episode traces Palmer’s journey from building Copilot inspired tools to pioneering the focused Next.js coding agent, v0, and explores how platform constraints fostered rapid experimentation and a breakout success in AI-powered frontend development. Palmer explains the unique advantages of GitHub’s massive developer network, the challenges of scaling agent-based workflows, and why integrating seamless AI into developer experiences is now a top priority for both Microsoft and GitHub.

⚡ [AIE CODE Preview] Inside Google Labs: Building The Gemini Coding Agent — Jed Borovik, Jules

From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-11-10 00:10

Jed Borovik, Product Lead at Google Labs, joins Latent Space to unpack how Google is building the future of AI-powered software development with Jules. From his journey discovering GenAI through Stable Diffusion to leading one of the most ambitious coding agent projects in tech, Borovik shares behind-the-scenes insights into how Google Labs operates at the intersection of DeepMind's model development and product innovation. We explore Jules' approach to autonomous coding agents and why they run on their own infrastructure, how Google simplified their agent scaffolding as models improved, and why embeddings-based RAG is giving way to attention-based search. Borovik reveals how developers are using Jules for hours or even days at a time, the challenges of managing context windows that push 2 million tokens, and why coding agents represent both the most important AI application and the clearest path to AGI. This conversation reveals Google's positioning in the coding agent race, the evolution from internal tools to public products, and what founders, developers, and AI engineers should understand about building for a future where AI becomes the new brush for software engineering. Chapters 00:00:00 Introduction and GitHub Universe Recap 00:00:57 New York Tech Scene and East Coast Hackathons 00:02:19 From Google Search to AI Coding: Jed's Journey 00:04:19 Google Labs Mission and DeepMind Collaboration 00:06:41 Jules: Autonomous Coding Agents Explained 00:09:39 The Evolution of Agent Scaffolding and Model Quality 00:11:30 RAG vs Attention: The Shift in Code Understanding 00:13:49 Jules' Journey from Preview to Production 00:15:05 AI Engineer Summit: Community Building and Networking 00:25:06 Context Management in Long-Running Agents 00:29:02 The Future of Software Engineering with AI 00:36:26 Beyond Vibe Coding: Spec Development and Verification 00:40:20 Multimodal Input and Computer Use for Coding Agents

Priscilla Chan and Mark Zuckerberg: Frontier AI + Virtual Biology To Solve All Diseases

From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-11-06 15:00

Today’s guests are Priscilla Chan and Mark Zuckerberg, co-founders of Biohub (fka Chan Zuckerberg Initiative). They are one of the leading institutes for AI x Bio and open science research with projects like CELLxGENE, rbio1, VariantFormer, and many more. We talked about the evolution from a broad philanthropic institute to specializing in frontier AI + bio, why they are building 12ft tall microscopes to gather better data, and how building a virtual cell model + virtual immune system could potentially help us cure all diseases. Chapters 00:00:00 Introduction and CZI's 10-Year Anniversary 00:00:56 Learning from Bill Gates 00:04:05 Science vs Translation 00:10:45 The Power of Physical Proximity in Science 00:13:55 Building the Virtual Cell: From Data to Models 00:15:51 Microscopes, Imaging, and Converting Atoms to Bits 00:23:18 AI Meets Biology: The Frontier Lab Concept 00:27:25 How Models Can Enable More Ambitious Research 00:30:15 Precision Medicine and Clinical Impact 00:45:17 The Virtual Immune System and Cellular Engineering 00:48:27 Accelerating the Timeline: What It Takes to Cure All Disease 00:28:45 Joining Forces with Evolutionary Scale

How Zyphra went all-in on AMD + Why Devs feel faster with AI but are slower — with Quentin Anthony

From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-11-03 16:00

OpenAI recently made waves by being the first big model lab to commit to a hyperscale AMD cluster (together with their own Titan XPUs), giving AMD the first biglab silicon win outside of Nvidia/Google. Returning guest Quentin Anthony, Head of Model Training at Zyphra and advisor at EleutherAI, has recently done this same transition. In part 1 of this pod, Quentin describes his journey from working on Oak Ridge National Lab's Frontier supercomputer to leading Zyphra's ambitious move to AMD MI300X GPUs, where they're achieving performance that beats NVIDIA H100s on certain workloads while dramatically reducing costs. The discussion dives deep into the technical challenges of kernel development, with Quentin explaining why he often bypasses high-level frameworks like Triton to write directly in ROCm or even GPU assembly when necessary. He reveals how Zyphra's hybrid transformer-Mamba models like Zamba 2 can match Llama 3 8B performance at 7B parameters, optimized specifically for edge deployment across a spectrum from 1.2B models for phones to 7B for desktops. In Part 2, Quentin then candidly discusses his experience in the controversial METR software engineering productivity study, which found that developers felt 20% faster while using AI coding tools, but were in fact 20% slower. Quentin was one of the few developers who showed measurable speedup from AI tools. He shares practical insights on avoiding the "slot machine effect" of endlessly prompting models, the importance of context rot awareness, and why he prefers direct API access over tools like Cursor to maintain complete control over model context. The conversation also covers the state of open source AI research, with Quentin arguing that siloed, focused teams with guaranteed funding produce better results than grand collaborative efforts. He explains why kernel datasets alone won't solve the GPU programming problem, the challenges of evaluating kernel quality, and why companies should invest more in ecosystem development rather than traditional marketing. https://www.linkedin.com/in/quentin-anthony/ https://www.zyphra.com/post/zamba2-7b Key Topics: • AMD MI300X advantages: 192GB VRAM, superior memory bandwidth • Writing kernels from PTX/AMD GCN assembly up through CUDA/ROCm • Hybrid attention-Mamba architectures and optimal sparsity ratios • The Menlo productivity study: achieving positive AI speedup • Context rot and why shorter conversations beat long threads • Why physicists make great ML engineers ("embryonic stem cells") • Edge deployment strategies from phones to local clusters • The future of on-device vs cloud inference routing • EleutherAI's focus on interpretability with fully open pipelines • Building velocity-focused teams over position-based hiring

⚡️ Ship AI recap: Agents, Workflows, and Python — w/ Vercel CTO Malte Ubl

From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-10-31 23:48

In this conversation with Malte Ubl, CTO of Vercel (http://x.com/cramforce), we explore how the company is pioneering the infrastructure for AI-powered development through their comprehensive suite of tools including workflows, AI SDK, and the newly announced agent ecosystem. Malte shares insights into Vercel's philosophy of "dogfooding" - never shipping abstractions they haven't battle-tested themselves - which led to extracting their AI SDK from v0 and building production agents that handle everything from anomaly detection to lead qualification. The discussion dives deep into Vercel's new Workflow Development Kit, which brings durable execution patterns to serverless functions, allowing developers to write code that can pause, resume, and wait indefinitely without cost. Malte explains how this enables complex agent orchestration with human-in-the-loop approvals through simple webhook patterns, making it dramatically easier to build reliable AI applications. We explore Vercel's strategic approach to AI agents, including their DevOps agent that automatically investigates production anomalies by querying observability data and analyzing logs - solving the recall-precision problem that plagues traditional alerting systems. Malte candidly discusses where agents excel today (meeting notes, UI changes, lead qualification) versus where they fall short, emphasizing the importance of finding the "sweet spot" by asking employees what they hate most about their jobs. The conversation also covers Vercel's significant investment in Python support, bringing zero-config deployment to Flask and FastAPI applications, and their vision for security in an AI-coded world where developers "cannot be trusted." Malte shares his perspective on how CTOs must transform their companies for the AI era while staying true to their core competencies, and why maintaining strong IC (individual contributor) career paths is crucial as AI changes the nature of software development. What was launched at Ship AI 2025: AI SDK 6.0 & Agent Architecture Agent Abstraction Philosophy: AI SDK 6 introduces an agent abstraction where you can "define once, deploy everywhere". How does this differ from existing agent frameworks like LangChain or AutoGPT? What specific pain points did you observe in production that led to this design? Human-in-the-Loop at Scale: The tool approval system with needsApproval: true gates actions until human confirmation. How do you envision this working at scale for companies with thousands of agent executions? What's the queue management and escalation strategy? Type Safety Across Models: AI SDK 6 promises "end-to-end type safety across models and UI". Given that different LLMs have varying capabilities and output formats, how do you maintain type guarantees when swapping between providers like OpenAI, Anthropic, or Mistral? Workflow Development Kit (WDK) Durability as Code: The use workflow primitive makes any TypeScript function durable with automatic retries, progress persistence, and observability. What's happening under the hood? Are you using event sourcing, checkpoint/restart, or a different pattern? Infrastructure Provisioning: Vercel automatically detects when a function is durable and dynamically provisions infrastructure in real-time. What signals are you detecting in the code, and how do you determine the optimal infrastructure configuration (queue sizes, retry policies, timeout values)? Vercel Agent (beta) Code Review Validation: The Agent reviews code and proposes "validated patches". What does "validated" mean in this context? Are you running automated tests, static analysis, or something more sophisticated? AI Investigations: Vercel Agent automatically opens AI investigations when it detects performance or error spikes using real production data. What data sources does it have access to? How does it distinguish between normal variance and actual anomalies? Python Support (For the first time, Vercel now supports Python backends natively.) Marketplace & Agent Ecosystem Agent Network Effects: The Marketplace now offers agents like CodeRabbit, Corridor, Sourcery, and integrations with Autonoma, Braintrust, Browser Use. How do you ensure these third-party agents can't access sensitive customer data? What's the security model? "An Agent on Every Desk" Program Vercel launched a new program to help companies identify high-value use cases and build their first production AI agents. It provides consultations, reference templates, and hands-on support to go from idea to deployed agent

The Agents Economy Backbone - with Emily Glassberg Sands, Head of Data & AI at Stripe

From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-10-30 20:51

Emily Glassberg Sands is the Head of Data & AI at Stripe where she leads the organization’s efforts to build financial infrastructure for the internet & leverage AI to power Stripe’s products. Stripe processes about $1.4 trillion in payments annually (~1.3% of global GDP), making it an exciting opportunity to apply AI & ML at scale. In this episode, Emily shares insights into how Stripe is using AI to solve complex problems like fraud detection, optimizing checkout experiences, & enabling new business models for AI companies. Emily also shares her economist perspective on market efficiency & how Stripe’s focus on building economic infrastructure for AI is driving growth across the ecosystem. We discuss: Stripe’s domain-specific foundation model and “payments embeddings” that run inline on the charge path to detect sophisticated card-testing at scale (improved detection rates at large users from ~59% to ~97%). The launch of the Agentic Commerce Protocol (ACP) with OpenAI, creating a shared standard for how businesses can expose products to AI agents which is used by Walmart and Sam’s Club. How Stripe is helping AI companies manage new fraud vectors, such as free trial and refund abuse, and the importance of real-time, outcome-based billing The impact of AI on Stripe’s internal operations, including the use of LLMs for code generation, merchant understanding, and internal tooling Why many AI companies are going global day-one how Stripe’s Link network (200M+ consumers) concentrates AI demand. Whether we're in an AI bubble, why GDP hasn't reflected AI productivity gains yet, and how agentic commerce could expand consumption by removing time constraints for high-income consumers Emily’s perspective on the changing social contract around AI, the importance of deep thinking, and the role of brand and design in AI-driven products — Where to find Emily Sands X: https://x.com/emilygsands LinkedIn: https://www.linkedin.com/in/egsands/ Where to find Shawn Wang X: https://x.com/swyx LinkedIn: https://www.linkedin.com/in/shawnswyxwang/ Where to find Alessio Fanelli X: https://x.com/FanaHOVA LinkedIn: https://www.linkedin.com/in/fanahova/ Where to find Latent Space X: https://x.com/latentspacepod Substack: https://www.latent.space/ Chapters 00:00:00 Introduction and Emily's Role at Stripe 00:09:55 AI Business Models and Fraud Challenges 00:13:49 Extending Radar for AI Economy 00:16:42 Payment Innovation: Token Billing and Stablecoins 00:23:09 Agentic Commerce Protocol Launch 00:29:40 Good Bots vs Bad Bots in AI 00:40:31 Designing the Agents Commerce Protocol 00:49:32 Internal AI Adoption at Stripe 01:04:53 Data Discovery and Text-to-SQL Challenges 01:21:00 AI Economy Analysis: Bubble or Boom?

Why RL Won — Kyle Corbitt, OpenPipe (acq. CoreWeave)

From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-10-16 17:38

In this deep dive with Kyle Corbitt, co-founder and CEO of OpenPipe (recently acquired by CoreWeave), we explore the evolution of fine-tuning in the age of AI agents and the critical shift from supervised fine-tuning to reinforcement learning. Kyle shares his journey from leading YC's Startup School to building OpenPipe, initially focused on distilling expensive GPT-4 workflows into smaller, cheaper models before pivoting to RL-based agent training as frontier model prices plummeted. The conversation reveals why 90% of AI projects remain stuck in proof-of-concept purgatory - not due to capability limitations, but reliability issues that Kyle believes can be solved through continuous learning from real-world experience. He discusses the breakthrough of RULER (Relative Universal Reinforcement Learning Elicited Rewards), which uses LLMs as judges to rank agent behaviors relatively rather than absolutely, making RL training accessible without complex reward engineering. Kyle candidly assesses the challenges of building realistic training environments for agents, explaining why GRPO (despite its advantages) may be a dead end due to its requirement for perfectly reproducible parallel rollouts. He shares insights on why LoRAs remain underrated for production deployments, why GEPA and prompt optimization haven't lived up to the hype in his testing, and why the hardest part of deploying agents isn't the AI - it's sandboxing real-world systems with all their bugs and edge cases intact. The discussion also covers OpenPipe's acquisition by CoreWeave, the launch of their serverless reinforcement learning platform, and Kyle's vision for a future where every deployed agent continuously learns from production experience. He predicts that solving the reliability problem through continuous RL could unlock 10x more AI inference demand from projects currently stuck in development, fundamentally changing how we think about agent deployment and maintenance. Key Topics: The rise and fall of fine-tuning as a business model Why 90% of AI projects never reach production RULER: Making RL accessible through relative ranking The environment problem: Why sandboxing is harder than training GRPO vs PPO and the future of RL algorithms LoRAs: The underrated deployment optimization Why GEPA and prompt optimization disappointed in practice Building world models as synthetic training environments The $500B Stargate bet and OpenAI's potential crypto play Continuous learning as the path to reliable agents References https://www.linkedin.com/in/kcorbitt/ Aug 2023 https://openpipe.ai/blog/from-prompts-to-models DEC 2023 https://openpipe.ai/blog/mistral-7b-fine-tune-optimized JAN 2024 https://openpipe.ai/blog/s-lora MAY 2024 https://openpipe.ai/blog/the-ten-commandments-of-fine-tuning-in-prod https://www.youtube.com/watch?v=-hYqt8M9u_M Oct 2024 https://openpipe.ai/blog/announcing-dpo-support AIE NYC 2025 Finetuning 500m agents https://www.youtube.com/watch?v=zM9RYqCcioM&t=919s AIEWF 2025 How to train your agent (ART-E) https://www.youtube.com/watch?v=gEDl9C8s_-4&t=216s SEPT 2025 ACQUISTION https://openpipe.ai/blog/openpipe-coreweave W&B Serverless RL https://openpipe.ai/blog/serverless-rl?refresh=1760042248153

DevDay 2025: Apps SDK, Agent Kit, MCP, Codex and why Prompting is More Important than Ever

From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-10-07 16:35

At OpenAI DevDay, we sit down with Sherwin Wu and Christina Cai from the OpenAI Platform Team to discuss the launch of AgentKit - a comprehensive suite of tools for building, deploying, and optimizing AI agents. Christina walks us through the live demo she performed on stage, building a customer support agent in just 8 minutes using the visual Agent Builder, while Sherwin shares insights on how OpenAI is inverting the traditional website-chatbot paradigm by embedding apps directly within ChatGPT through the new Apps SDK. The conversation explores how OpenAI is tackling the challenges developers face when taking agents to production - from writing and optimizing prompts to building evaluation pipelines. They discuss the decision to adopt Anthropic's MCP protocol for tool connectivity, the importance of visual workflows for complex agent systems, and how features like human-in-the-loop approvals and automated prompt optimization are making agent development more accessible to a broader range of developers. Sherwin and Christina also reveal how OpenAI is dogfooding these tools internally, with their own customer support at openai.com already powered by AgentKit, and share candid insights about the evolution from plugins to GPTs to this new agent platform. They discuss the surprising persistence of prompting as a critical skill (contrary to predictions from two years ago), the challenges of serving custom fine-tuned models at scale, and why they believe visual agent builders are essential as workflows grow to span dozens of nodes. Guests: Sherwin Wu: Head of Engineering, OpenAI Platform https://www.linkedin.com/in/sherwinwu1/ https://x.com/sherwinwu?lang=en Christina Huang: Platform Experience, OpenAI https://x.com/christinaahuang https://www.linkedin.com/in/christinaahuang/ Thanks very much to Lindsay and Shaokyi for helping us set up this great deepdive into the new DevDay launches! Key Topics: • AgentKit launch: Agent SDK, Builder, Evals, and deployment tools • Apps SDK and the inversion of the app-chatbot paradigm • Adopting MCP protocol for universal tool connectivity • Visual agent building vs code-first approaches • Human-in-the-loop workflows and approval systems • Automated prompt optimization and "zero-gradient fine-tuning" • Service Health Dashboard and achieving five nines reliability • ChatKit as an embeddable, evergreen chat interface • The evolution from plugins to GPTs to agent platforms • Internal dogfooding with Codex and agent-powered support

Taste is your Moat (Dylan Field of Figma)

From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-10-02 15:26

Dylan Field (CEO Figma) on how they are letting designers build with Figma Make, how Figma can be the context repository for aesthetic in the age of vibe coding, and why design is your only differentiator now. Full show notes: https://www.latent.space/p/figma 00:00 Figma’s Mission: Bridging Imagination and Reality 00:56 Becoming AI-Pilled 07:44 Figma Make 08:57 Language as the Interface for Design 13:37 Source of truth between design and code 18:15 Figma as a Context Repository 21:30 Understanding and Representing Design Diffs through AI 24:20 Figma’s Role in Shaping Visual Aesthetics 31:56 Fast Fashion in Software 36:04 Limitations of Prompt-Based Software Creation 39:43 Interfaces Beyond Chat 42:12 Lessons from the Thiel Fellowship 44:58 Using X for Product Feedback 48:10 Early-Stage Recruiting at Figma 53:11 Positioning Figma Make in the Prompt-to-App Landscape 55:19 Digital Scarcity & AI

Amp: The Emperor Has No Clothes

From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-09-25 23:08

Quinn Slack (CEO) and Thorsten Ball (Amp Dictator) from SourceGraph join the show to talk about Amp Code, how they ship 15x/day with no code reviews, and why subagents and prompt optimizers aren’t a promising direction for coding agents. Amp Code: https://ampcode.com/ Latent Space: https://latent.space/ 00:00 Introduction 00:41 Transition from Cody to Amp 03:18 The Importance of Building the Best Coding Agent 06:43 Adapting to a Rapidly Evolving AI Tooling Landscape 09:36 Dogfooding at Sourcegraph 12:35 CLI vs. VS Code Extension 21:08 Positioning Amp in Coding Agent Market 24:10 The Diminishing Importance of Model Selectors 32:39 Tooling vs. Harness 37:19 Common Failure Modes of Coding Agents 47:33 Agent-Friendly Logging and Tooling 52:31 Are Subagents Real? 56:52 New Frameworks and Agent-Integrated Developer Tools 1:00:25 How Agents Are Encouraging Codebase and Workflow Changes 1:03:13 Evolving Outer Loop Tasks 1:07:09 Version Control and Merge Conflicts in an AI-First World 1:10:36 Rise of User-Generated Enterprise Software 1:14:39 Empowering Technical Leaders with AI 1:17:11 Evaluating Product Without Traditional Evals 1:20:58 Hiring

Context Engineering for Agents - Lance Martin, LangChain

From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-09-11 16:30

Lance: https://www.linkedin.com/in/lance-martin-64a33b5/ How Context Fails: https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html How New Buzzwords Get Created: https://www.dbreunig.com/2025/07/24/why-the-term-context-engineering-matters.html Content Engineering: https://x.com/RLanceMartin/status/1948441848978309358 https://rlancemartin.github.io/2025/06/23/context_engineering/ https://docs.google.com/presentation/d/16aaXLu40GugY-kOpqDU4e-S0hD1FmHcNyF0rRRnb1OU/edit?usp=sharing Manus Post: https://manus.im/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus Cognition Post: https://cognition.ai/blog/dont-build-multi-agents Multi-Agent Researcher: https://www.anthropic.com/engineering/multi-agent-research-system Human-in-the-loop + Memory: https://github.com/langchain-ai/agents-from-scratch - Bitter Lesson in AI Engineering - Hyung Won Chung on the Bitter Lesson in AI Research: https://www.youtube.com/watch?v=orDKvo8h71o Bitter Lesson w/ Claude Code: https://www.youtube.com/watch?v=Lue8K2jqfKk&t=1s Learning the Bitter Lesson in AI Engineering: https://rlancemartin.github.io/2025/07/30/bitter_lesson/ Open Deep Research: https://github.com/langchain-ai/open_deep_research https://academy.langchain.com/courses/deep-research-with-langgraph Scaling and building things that "don't yet work": https://www.youtube.com/watch?v=p8Jx4qvDoSo - Frameworks - Roast framework at Shopify / standardization of orchestration tools: https://www.youtube.com/watch?v=0NHCyq8bBcM MCP adoption within Anthropic / standardization of protocols: https://www.youtube.com/watch?v=xlEQ6Y3WNNI How to think about frameworks: https://blog.langchain.com/how-to-think-about-agent-frameworks/ RAG benchmarking: https://rlancemartin.github.io/2025/04/03/vibe-code/ Simon's talk with memory-gone-wrong: https://simonwillison.net/2025/Jun/6/six-months-in-llms/

A Technical History of Generative Media

From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-09-05 15:52

Today we are joined by Gorkem and Batuhan from Fal.ai, the fastest growing generative media inference provider. They recently raised a $125M Series C and crossed $100M ARR. We covered how they pivoted from dbt pipelines to diffusion models inference, what were the models that really changed the trajectory of image generation, and the future of AI videos. Enjoy! 00:00 - Introductions 04:58 - History of Major AI Models and Their Impact on Fal.ai 07:06 - Pivoting to Generative Media and Strategic Business Decisions 10:46 - Technical discussion on CUDA optimization and kernel development 12:42 - Inference Engine Architecture and Kernel Reusability 14:59 - Performance Gains and Latency Trade-offs 15:50 - Discussion of model latency importance and performance optimization 17:56 - Importance of Latency and User Engagement 18:46 - Impact of Open Source Model Releases and Competitive Advantage 19:00 - Partnerships with closed source model developers 20:06 - Collaborations with Closed-Source Model Providers 21:28 - Serving Audio Models and Infrastructure Scalability 22:29 - Serverless GPU infrastructure and technical stack 23:52 - GPU Prioritization: H100s and Blackwell Optimization 25:00 - Discussion on ASICs vs. General Purpose GPUs 26:10 - Architectural Trends: MMDiTs and Model Innovation 27:35 - Rise and Decline of Distillation and Consistency Models 28:15 - Draft Mode and Streaming in Image Generation Workflows 29:46 - Generative Video Models and the Role of Latency 30:14 - Auto-Regressive Image Models and Industry Reactions 31:35 - Discussion of OpenAI's Sora and competition in video generation 34:44 - World Models and Creative Applications in Games and Movies 35:27 - Video Models’ Revenue Share and Open-Source Contributions 36:40 - Rise of Chinese Labs and Partnerships 38:03 - Top Trending Models on Hugging Face and ByteDance's Role 39:29 - Monetization Strategies for Open Models 40:48 - Usage Distribution and Model Turnover on FAL 42:11 - Revenue Share vs. Open Model Usage Optimization 42:47 - Moderation and NSFW Content on the Platform 44:03 - Advertising as a key use case for generative media 45:37 - Generative Video in Startup Marketing and Virality 46:56 - LoRA Usage and Fine-Tuning Popularity 47:17 - LoRA ecosystem and fine-tuning discussion 49:25 - Post-Training of Video Models and Future of Fine-Tuning 50:21 - ComfyUI Pipelines and Workflow Complexity 52:31 - Requests for startups and future opportunities in the space 53:33 - Data Collection and RedPajama-Style Initiatives for Media Models 53:46 - RL for Image and Video Models: Unknown Potential 55:11 - Requests for Models: Editing and Conversational Video Models 57:12 - VO3 Capabilities: Lip Sync, TTS, and Timing 58:23 - Bitter Lesson and the Future of Model Workflows 58:44 - FAL's hiring approach and team structure 59:29 - Team Structure and Scaling Applied ML and Performance Teams 1:01:41 - Developer Experience Tools and Low-Code/No-Code Integration 1:03:04 - Improving Hiring Process with Public Challenges and Benchmarks 1:04:02 - Closing Remarks and Culture at FAL

Better Data is All You Need — Ari Morcos, Datology

From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-08-29 18:19

Our chat with Ari shows that data curation is the most impactful and underinvested area in AI. He argues that the prevailing focus on model architecture and compute scaling overlooks the "bitter lesson" that "models are what they eat." Effective data curation—a sophisticated process involving filtering, rebalancing, sequencing (curriculum), and synthetic data generation—allows for training models that are simultaneously faster, better, and smaller. Morcos recounts his personal journey from focusing on model-centric inductive biases to realizing that data quality is the primary lever for breaking the diminishing returns of naive scaling laws. Datology's mission is to automate this complex curation process, making state-of-the-art data accessible to any organization and enabling a new paradigm of AI development where data efficiency, not just raw scale, drives progress. Timestamps 00:00 Introduction 00:46 What is Datology? The mission to train models faster, better, and smaller through data curation. 01:59 Ari's background: From neuroscience to realizing the "Bitter Lesson" of AI. 05:30 Key Insight: Inductive biases from architecture become less important and even harmful as data scale increases. 08:08 Thesis: Data is the most underinvested area of AI research relative to its impact. 10:15 Why data work is culturally undervalued in research and industry. 12:19 How self-supervised learning changed everything, moving from a data-scarce to a data-abundant regime. 17:05 Why automated curation is superior to human-in-the-loop, citing the DCLM study. 19:22 The "Elephants vs. Dogs" analogy for managing data redundancy and complexity. 22:46 A brief history and commentary on key datasets (Common Crawl, GitHub, Books3). 26:24 Breaking naive scaling laws by improving data quality to maintain high marginal information gain. 29:07 Datology's demonstrated impact: Achieving baseline performance 12x faster. 34:19 The business of data: Datology's moat and its relationship with open-source datasets. 39:12 Synthetic Data Explain ed: The difference between risky "net-new" creation and powerful "rephrasing." 49:02 The Resurgence of Curriculum Learning: Why ordering data matters in the underfitting regime. 52:55 The Future of Training: Optimizing pre-training data to make post-training more effective. 54:49 Who is training their own models and why (Sovereign AI, large enterprises). 57:24 "Train Smaller": Why inference cost makes smaller, specialized models the ultimate goal for enterprises. 01:00:19 The problem with model pruning and why data-side solutions are complementary. 01:03:03 On finding the smallest possible model for a given capability. 01:06:49 Key learnings from the RC foundation model collaboration, proving that data curation "stacks." 01:09:46 Lightning Round: What data everyone wants & who should work at Datology. 01:14:24 Commentary on Meta's superintelligence efforts and Yann LeCun's role.

Long Live Context Engineering - with Jeff Huber of Chroma

From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-08-19 20:54

Jeff Huber of Chroma joins us to talk about what actually matters in vector databases in 2025, why “modern search for AI” is different, and how to ship systems that don’t rot as context grows. Full show notes: https://www.latent.space/p/chroma 00:00 Introductions 00:48 Why Build Chroma 02:55 Information Retrieval vs. Search 04:29 Staying Focused in a Competitive AI Market 08:08 Building Chroma Cloud 12:15 Context Engineering and the Problems with RAG 16:11 Context Rot 21:49 Prioritizing Context Quality 27:02 Code Indexing and Retrieval Strategies 32:04 Chunk Rewriting and Query Optimization for Code 34:07 Transformer Architecture Evolution and Retrieval Systems 38:06 Memory as a Benefit of Context Engineering 40:13 Structuring AI Memory and Offline Compaction 45:46 Lessons from Previous Startups and Building with Purpose 47:32 Religion and Values in Silicon Valley 50:18 Company Culture, Design, and Brand Consistency 52:36 Hiring at Chroma: Designers, Researchers, and Engineers

Greg Brockman on OpenAI's Road to AGI

From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-08-15 16:12

Greg Brockman, co-founder and president of OpenAI, joins us to talk about GPT-5 and GPT-OSS, the future of software engineering, why reinforcement learning is still scaling, and how OpenAI is planning to get to AGI. 00:00 Introductions 01:04 The Evolution of Reasoning at OpenAI 04:01 Online vs Offline Learning in Language Models 06:44 Sample Efficiency and Human Curation in Reinforcement Learning 08:16 Scaling Compute and Supercritical Learning 13:21 Wall clock time limitations in RL and real-world interactions 16:34 Experience with ARC Institute and DNA neural networks 19:33 Defining the GPT-5 Era 22:46 Evaluating Model Intelligence and Task Difficulty 25:06 Practical Advice for Developers Using GPT-5 31:48 Model Specs 37:21 Challenges in RL Preferences (e.g., try/catch) 39:13 Model Routing and Hybrid Architectures in GPT-5 43:58 GPT-5 pricing and compute efficiency improvements 46:04 Self-Improving Coding Agents and Tool Usage 49:11 On-Device Models and Local vs Remote Agent Systems 51:34 Engineering at OpenAI and Leveraging LLMs 54:16 Structuring Codebases and Teams for AI Optimization 55:27 The Value of Engineers in the Age of AGI 58:42 Current state of AI research and lab diversity 01:01:11 OpenAI’s Prioritization and Focus Areas 01:03:05 Advice for Founders: It's Not Too Late 01:04:20 Future outlook and closing thoughts 01:04:33 Time Capsule to 2045: Future of Compute and Abundance 01:07:07 Time Capsule to 2005: More Problems Will Emerge

The RLVR Revolution — with Nathan Lambert (AI2, Interconnects.ai)

From 🇺🇸 Latent Space: The AI Engineer Podcast, published at 2025-07-31 15:30

Chapters 00:00:00 Welcome and Guest Introduction 00:01:18 Tulu, OVR, and the RLVR Journey 00:03:40 Industry Approaches to Post-Training and Preference Data 00:06:08 Understanding RLVR and Its Impact 00:06:18 Agents, Tool Use, and Training Environments 00:10:34 Open Data, Human Feedback, and Benchmarking 00:12:44 Chatbot Arena, Sycophancy, and Evaluation Platforms 00:15:42 RLHF vs RLVR: Books, Algorithms, and Future Directions 00:17:54 Frontier Models: Reasoning, Hybrid Models, and Data 00:22:11 Search, Retrieval, and Emerging Model Capabilities 00:29:23 Tool Use, Curriculum, and Model Training Challenges 00:38:06 Skills, Planning, and Abstraction in Agent Models 00:46:50 Parallelism, Verifiers, and Scaling Approaches 00:54:33 Overoptimization and Reward Design in RL 01:02:27 Open Models, Personalization, and the Model Spec 01:06:50 Open Model Ecosystem and Infrastructure 01:13:05 Meta, Hardware, and the Future of AI Competition 01:15:42 Building an Open DeepSeek and Closing Thoughts We first had Nathan on to give us his RLHF deep dive when he was joining AI2, and now he’s back to help us catch up on the evolution to RLVR (Reinforcement Learning with Verifiable Rewards), first proposed in his Tulu 3 paper. While RLHF remains foundational, RLVR has emerged as a powerful approach for training models on tasks with clear success criteria and using verifiable, objective functions as reward signals—particularly useful in domains like math, code correctness, and instruction-following. Instead of relying solely on subjective human feedback, RLVR leverages deterministic signals to guide optimization, making it more scalable and potentially more reliable across many domains. However, he notes that RLVR is still rapidly evolving, especially regarding how it handles tool use and multi-step reasoning. We also discussed the Tulu model series, a family of instruction-tuned open models developed at AI2. Tulu is designed to be a reproducible, state-of-the-art post-training recipe for the open community. Unlike frontier labs like OpenAI or Anthropic, which rely on vast and often proprietary datasets, Tulu aims to distill and democratize best practices for instruction and preference tuning. We are impressed with how small eval suites, careful task selection, and transparent methodology can rival even the best proprietary models on specific benchmarks. One of the most fascinating threads is the challenge of incorporating tool use into RL frameworks. Lambert highlights that while you can prompt a model to use tools like search or code execution, getting the model to reliably learn when and how to use them through RL is much harder. This is compounded by the difficulty of designing reward functions that avoid overoptimization—where models learn to “game” the reward signal rather than solve the underlying task. This is particularly problematic in code generation, where models might reward hack unit tests by inserting pass statements instead of correct logic. As models become more agentic and are expected to plan, retrieve, and act across multiple tools, reward design becomes a critical bottleneck. Other topics covered: - The evolution from RLHF (Reinforcement Learning from Human Feedback) to RLVR (Reinforcement Learning from Verifiable Rewards) - The goals and technical architecture of the Tulu models, including the motivation to open-source post-training recipes - Challenges of tool use in RL: verifiability, reward design, and scaling across domains - Evaluation frameworks and the role of platforms like Chatbot Arena and emerging “arena”-style benchmarks - The strategic tension between hybrid reasoning models and unified reasoning models at the frontier - Planning, abstraction, and calibration in reasoning agents and why these concepts matter - The future of open-source AI models, including DeepSeek, OLMo, and the potential for an “American DeepSeek” - The importance of model personality, character tuning, and the model spec paradigm - Overoptimization in RL settings and how it manifests in different domains (control tasks, code, math) - Industry trends in inference-time scaling and model parallelism Finally, the episode closes with a vision for the future of open-source AI. Nathan has now written up his ambition to build an “American DeepSeek”—a fully open, end-to-end reasoning-capable model with transparent training data, tools, and infrastructure. He emphasizes that open-source AI is not just about weights; it’s about releasing recipes, evaluations, and methods that lower the barrier for everyone to build and understand cutting-edge systems. It would seem the

Latest episodes from "Latent Space: The AI Engineer Podcast"