🇺🇸 United States Episodes

14736 episodes from United States

If You Want to Destroy My Sweater, Hold This Thread as I Walk Away

From This American Life

The tiny thing that unravels your world. Visit thisamericanlife.org/lifepartners to sign up for our premium subscription. Prologue: Ira talks to Chris Benderev, whose high school years were completely upended by an impromptu thing his teacher said. (8 minutes) Act One: For Producer Lilly Sullivan, there’s one story about her parents that defines how she sees them, their family, and their history. She finds out it might be wrong. (27 minutes) Act Two: For years, Mike Comite has replayed in his head the moment when he and his bandmate blew their shot of making it as musicians. He sets out to uncover how it all went awry. (13 minutes) Act Three: Six million Syrians fled the country after the start of its civil war. A few weeks ago, one woman watched from afar as everything in her home country changed forever – again. (9 minutes) Transcripts are available at thisamericanlife.org. This American Life privacy policy. Learn more about sponsor message choices.

Venezuela’s recent economic history (Update)

From Planet Money

We’ve been checking in on the economic conditions in Venezuela for about a decade now. In response to the U.S. strike and the capture of Venezuelan president Nicolás Maduro this weekend, we’re re-surfacing this episode with an update. The original version ran in 2016, with an update in 2024. Back in 2016, things were pretty bad in Venezuela. Grocery stores didn’t have enough food. Hospitals didn’t have basic supplies, like gauze. Child mortality was spiking. Businesses were shuttering. It was one of the epic economic collapses of our time. And it was totally avoidable. Venezuela used to be a relatively rich country. It has just about all the economic advantages a country could ask for: beautiful beaches and mountains ready for tourism, fertile land good for farming, an educated population, and oil, lots and lots of oil. But during the boom years, the Venezuelan government made some choices that added up to an economic time bomb. Today on the show, we run through the decisions that foreshadowed the collapse, and we hear from people in Venezuela in 2016 at a particularly low point for the economy, then again in 2024 after a bounce back and a stabilization, in part due to the unlikely impact of the U.S. dollar. Pre-order the Planet Money book and get a free gift. / Subscribe to Planet Money+. Listen free: Apple Podcasts, Spotify, the NPR app or anywhere you get podcasts. Facebook / Instagram / TikTok / Our weekly Newsletter. The original episode was hosted by Robert Smith and Noel King. It was produced by Nick Fountain and Sally Helm. Our update in 2024 was hosted by Amanda Aronczyk, produced by Sean Saldana, fact checked by Sierra Juarez, and engineered by Neal Rauch. Today's episode was hosted by Kenny Malone and produced by James Sneed. Alex Goldmark is our Executive Producer.
For sponsor-free episodes of The Indicator and Planet Money, subscribe to Planet Money+ via Apple Podcasts or at plus.npr.org. Learn more about sponsor message choices: podcastchoices.com/adchoices. NPR Privacy Policy

Sunday Pick: Building atomic habits with James Clear | from ReThinking with Adam Grant

From TED Talks Daily

As a blogger and executive coach, James Clear spent years studying how to form and change habits. His research culminated in the book “Atomic Habits,” which has sold more than 15 million copies and been translated into over 50 languages. James speaks with Adam about changing our systems for achieving goals, building habits around identities as well as actions, and accumulating small wins that add up to big change. Transcripts for ReThinking are available at go.ted.com/RWAGscripts Hosted on Acast. See acast.com/privacy for more information.

55. What Changes Will Stick When the Pandemic Is Gone?

From No Stupid Questions

Also: would you take a confirmation-bias vaccine? This episode originally aired on June 6th, 2021.

Interview: What happens in your brain when you pay attention? with Dr. Sasha Hamdani | from TED Health

From TED Talks Daily

Attention isn't just about what we focus on -- it's also about what our brains filter out. By investigating patterns in the brain as people try to focus, computational neuroscientist Mehdi Ordikhani-Seyedlar hopes to build computer models that can be used to treat ADHD and help those who have lost the ability to communicate. Hear more about this exciting science in this brief, fascinating talk. After the talk, Shoshana speaks with psychiatrist and ADHD specialist Dr. Sasha Hamdani on transforming healthcare for patients and families with ADHD. Hosted on Acast. See acast.com/privacy for more information.

#1041 - Dr Debra Lieberman - Why Don’t You Have Sex With Your Sister?

From Modern Wisdom

Dr. Debra Lieberman is an evolutionary psychologist, professor, and researcher. Why don’t we feel sexual attraction toward our siblings or close family? Evolution seems to have hard-wired the brain to prevent inbreeding, a pattern shared with many other animals. So how does this mechanism work, and what are the moral or ethical arguments surrounding incest? Expect to learn why evolution has designed you to not want sex with your sister, how animals actually detect who their relatives are, what the high-level explanation is for why humans don’t want sex with their kin, the moral argument over whether it is okay for two adult siblings to have consensual sex, how big the actual genetic risk is for first cousins, what crying and tears actually communicate from an evolutionary perspective, and much more… Sponsors: See discounts for all the products I use and recommend: https://chriswillx.com/deals New pricing since recording: Function is now just $365, plus get $25 off at https://functionhealth.com/modernwisdom Get 35% off your first subscription on the best supplements from Momentous at https://livemomentous.com/modernwisdom Get a free bottle of D3K2, an AG1 Welcome Kit, and more when you first subscribe at https://ag1.info/modernwisdom Get a Free Sample Pack of LMNT’s most popular flavours with your first purchase at https://drinklmnt.com/modernwisdom Extra Stuff: Get my free reading list of 100 books to read before you die: https://chriswillx.com/books Try my productivity energy drink Neutonic: https://neutonic.com/modernwisdom Episodes You Might Enjoy: #577 - David Goggins - This Is How To Master Your Life: https://tinyurl.com/43hv6y59 #712 - Dr Jordan Peterson - How To Destroy Your Negative Beliefs: https://tinyurl.com/2rtz7avf #700 - Dr Andrew Huberman - The Secret Tools To Hack Your Brain: https://tinyurl.com/3ccn5vkp - Get In Touch: Instagram: https://www.instagram.com/chriswillx Twitter: https://www.twitter.com/chriswillx YouTube: https://www.youtube.com/modernwisdompodcast Email: https://chriswillx.com/contact - Learn more about your ad choices. Visit megaphone.fm/adchoices

Ian Carroll on America’s Deadliest Mass Shooting and Unanswered Questions They Don’t Want You to Ask

From The Tucker Carlson Show

The 2017 Las Vegas massacre was by far the deadliest mass shooting in American history. The official explanation for it makes no sense. Ian Carroll explains what we know for sure. (00:00) What Was the Las Vegas Shooting? (10:43) The Active Shooter at McCarran Airport (16:40) The Suspicious Deaths of Witnesses (25:30) What Was Stephen Paddock's Motive? (34:37) What Happened to Jose Campos? (1:05:18) How Did America Change After the Shooting? Paid partnerships with: Masa Chips: Get 25% off with code TUCKER at https://masachips.com/tucker Black Rifle Coffee: Promo code "Tucker" for 30% off at https://blackriflecoffee.com TCN: Watch our new outdoor series at https://tuckercarlson.com/americangame Learn more about your ad choices. Visit megaphone.fm/adchoices

[NeurIPS Best Paper] 1000 Layer Networks for Self-Supervised RL — Kevin Wang et al, Princeton

From Latent Space: The AI Engineer Podcast

From undergraduate research seminars at Princeton to winning the Best Paper award at NeurIPS 2025, Kevin Wang, Ishaan Javali, Michał Bortkiewicz, Tomasz Trzcinski, and Benjamin Eysenbach defied conventional wisdom by scaling reinforcement learning networks to 1,000 layers deep, unlocking performance gains that the RL community thought impossible. We caught up with the team live at NeurIPS to dig into the story behind RL1000: why deep networks have worked in language and vision but failed in RL for over a decade (spoiler: it's not just about depth, it's about the objective), how they discovered that self-supervised RL (learning representations of states, actions, and future states via contrastive learning) scales where value-based methods collapse, the critical architectural tricks that made it work (residual connections, layer normalization, and a shift from regression to classification), why scaling depth is more parameter-efficient than scaling width (linear vs. quadratic growth), how JAX and GPU-accelerated environments let them collect hundreds of millions of transitions in hours (the data abundance that unlocked scaling in the first place), the "critical depth" phenomenon where performance doesn't just improve but multiplies once you cross 15M+ transitions and add the right architectural components, why this isn't just "make networks bigger" but a fundamental shift in RL objectives (their code doesn't have a line saying "maximize rewards"; it's pure self-supervised representation learning), how deep teacher, shallow student distillation could unlock deployment at scale (train frontier capabilities with 1000 layers, distill down to efficient inference models), the robotics implications (goal-conditioned RL without human supervision or demonstrations, scaling architecture instead of scaling manual data collection), and their thesis that RL is finally ready to scale like language and vision, not by throwing compute at value functions, but by borrowing the self-supervised, representation-learning paradigms that made the rest of deep learning work.
We discuss:
The self-supervised RL objective: instead of learning value functions (noisy, biased, spurious), they learn representations where states along the same trajectory are pushed together and states along different trajectories are pushed apart, turning RL into a classification problem
Why naive scaling failed: doubling depth degraded performance, doubling again with residual connections and layer norm suddenly skyrocketed performance in one environment, unlocking the "critical depth" phenomenon
Scaling depth vs. width: depth grows parameters linearly, width grows quadratically, so depth is more parameter-efficient and sample-efficient for the same performance
The JAX + GPU-accelerated environments unlock: collecting thousands of trajectories in parallel meant data wasn't the bottleneck, and crossing 15M+ transitions was when deep networks really paid off
The blurring of RL and self-supervised learning: their code doesn't maximize rewards directly, it's an actor-critic goal-conditioned RL algorithm, but the learning burden shifts to classification (cross-entropy loss, representation learning) instead of TD error regression
Why scaling batch size unlocks at depth: traditional RL doesn't benefit from larger batches because networks are too small to exploit the signal, but once you scale depth, batch size becomes another effective scaling dimension
RL1000 Team (Princeton)
1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities: https://openreview.net/forum?id=s0JVsx3bx1
Chapters
00:00:00 Introduction: Best Paper Award and NeurIPS Poster Experience
00:01:11 Team Introductions and Princeton Research Origins
00:03:35 The Deep Learning Anomaly: Why RL Stayed Shallow
00:04:35 Self-Supervised RL: A Different Approach to Scaling
00:05:13 The Breakthrough Moment: Residual Connections and Critical Depth
00:07:15 Architectural Choices: Borrowing from ResNets and Avoiding Vanishing Gradients
00:07:50 Clarifying the Paper: Not Just Big Networks, But Different Objectives
00:08:46 Blurring the Lines: RL Meets Self-Supervised Learning
00:09:44 From TD Errors to Classification: Why This Objective Scales
00:11:06 Architecture Details: Building on Braw and SymbaFowl
00:12:05 Robotics Applications: Goal-Conditioned RL Without Human Supervision
00:13:15 Efficiency Trade-offs: Depth vs Width and Parameter Scaling
00:15:48 JAX and GPU-Accelerated Environments: The Data Infrastructure
00:18:05 World Models and Next State Classification
00:22:37 Unlocking Batch Size Scaling Through Network Capacity
00:24:10 Compute Requirements: State-of-the-Art on a Single GPU
00:21:02 Future Directions: Distillation, VLMs, and Hierarchical Planning
00:27:15 Closing Thoughts: Challenging Conventional Wisdom in RL Scaling
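
The depth-versus-width point in this description (parameters growing linearly with depth but quadratically with width) can be checked with simple arithmetic. The sketch below counts weights and biases for a plain fully connected network. The function name `mlp_param_count` and the layer sizes are illustrative assumptions and are not taken from the RL1000 paper or its code.

```python
def mlp_param_count(input_dim: int, width: int, depth: int, output_dim: int) -> int:
    """Count weights and biases in a plain MLP with `depth` hidden layers of size `width`."""
    dims = [input_dim] + [width] * depth + [output_dim]
    # Each dense layer contributes d_in * d_out weights plus d_out biases.
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))


# Hypothetical sizes, chosen only for illustration.
base = mlp_param_count(input_dim=32, width=256, depth=4, output_dim=8)
deeper = mlp_param_count(input_dim=32, width=256, depth=8, output_dim=8)  # 2x depth
wider = mlp_param_count(input_dim=32, width=512, depth=4, output_dim=8)   # 2x width

print(f"base:   {base:>9,d}")
print(f"deeper: {deeper:>9,d}  ({deeper / base:.2f}x base)")
print(f"wider:  {wider:>9,d}  ({wider / base:.2f}x base)")
```

With these sizes, doubling depth roughly doubles the parameter count, while doubling width roughly quadruples it, because the hidden-to-hidden weight matrices scale with the square of the width.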

Wartime vs Peacetime: Ben Horowitz on Leadership

From a16z Podcast

In this exclusive conversation from a16z’s Bio and Health BUILD Summit, founding partner Ben Horowitz sits down with general partner Jorge Conde. Originally released in August 2023, the episode covers everything from the inspiration behind Ben’s book The Hard Thing About Hard Things and how the open internet was secured, to the difference between wartime and peacetime CEOs, what it really means to scale culture, and how bio and healthcare innovation differs from other forms of technology. Ben’s Book: https://www.amazon.com/Hard-Thing-About-Things-Building/dp/0062273205

JRE MMA Show #172 with Gable Steveson

From Joe Rogan Experience

Joe sits down with Gable Steveson, a mixed martial artist, wrestler, boxer, and Olympic gold medalist.  www.gablesteveson.com Perplexity: Download the app or ask Perplexity anything at https://pplx.ai/rogan. Get a free welcome kit with your first subscription of AG1 at https://drinkag1.com/joerogan Athletic Brewing Co. Non-alcoholic Beer. Fit For All Times. Athletic Brewing Company LLC. Milford, CT and San Diego, CA. Near Beer <0.5% alc/vol. Learn more about your ad choices. Visit podcastchoices.com/adchoices

A 3-step guide to believing in yourself | Sheryl Lee Ralph (re-release)

From TED Talks Daily

Sheryl Lee Ralph is a force, delivering iconic performances both on stage and screen. But she didn't always know if she'd make it big. In a lively talk sparkling with actionable advice, she shares how her struggles taught her what it takes to believe in herself -- and how we can all find the self-confidence to keep moving forward. Hosted on Acast. See acast.com/privacy for more information.

The Techno-Optimist Manifesto with Marc Andreessen and Ben Horowitz

From a16z Podcast

Originally aired in October 2023, this episode centers on Marc Andreessen’s essay The Techno-Optimist Manifesto, which lays out his vision for the future of technology. The piece sparked widespread discussion across traditional and social media by challenging the prevailing pessimistic narrative around technology and arguing instead that it can be a force for growth, progress, and abundance. In this one-on-one conversation, based on listener questions from X (formerly Twitter), a16z cofounder Ben Horowitz and Marc discuss how technological advances can improve quality of life, support marginalized communities, and shape how we think about humanity’s long-term future. Read the full manifesto: https://a16z.com/the-techno-optimist-manifesto/

The science of fresh starts

From Masters of Scale

To kick off 2026, we’re revisiting our conversation with behavioral scientist Katy Milkman. The Wharton professor and bestselling author of “How to Change” sits down with host Jeff Berman to share proven ways to create positive, lasting changes in our lives and our organizations. Subscribe to the Masters of Scale weekly newsletter: https://mastersofscale.com/newsletter/ See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.

Essentials: Micronutrients for Health & Longevity | Dr. Rhonda Patrick

From Huberman Lab

In this Huberman Lab Essentials episode, my guest is Dr. Rhonda Patrick, PhD, a biomedical scientist and a leading health educator focused on nutrition, aging and general health. We discuss four key micronutrients that influence cellular stress responses, inflammation, detoxification and longevity, and how to increase your intake of each through diet or supplementation. We also cover deliberate cold and heat exposure, along with exercise, and how these tools support metabolic, cardiovascular and cognitive health as we age. Read the episode show notes at hubermanlab.com. Thank you to our sponsors AGZ by AG1: https://drinkagz.com/huberman LMNT: https://drinklmnt.com/huberman Function: https://functionhealth.com/huberman Timestamps (00:00:00) Rhonda Patrick (00:00:20) Physical Challenges, Stress Response Pathways, Hormesis, Temperature (00:03:43) Tool: Sulforaphane & Detoxification, Cruciferous Vegetables, Moringa (00:06:19) Sponsor: LMNT (00:07:51) Tool: Marine Omega-3 Fatty Acids, Fish Oil Supplements (00:09:48) Benefits of Fish Oil Supplementation, Longevity, Tool: Omega-3 Index (00:12:06) Omega-3s & Inflammation (00:14:46) Sponsor: AGZ by AG1 (00:16:16) Vitamin D; Health Benefits (00:18:46) Tool: Vitamin D Supplementation, Bloodwork (00:22:11) Tool: Magnesium, Dark Leafy Greens, Supplementation (00:24:25) Sponsor: Function (00:26:05) Deliberate Cold Exposure, Mood & Dopamine (00:26:58) Cold Exposure to Enhance Mitochondria, Shivering, Browning Effect (00:31:22) Tool: High-Intensity Interval Training, Tabata Workout, Sauna, Memory (00:33:18) Sauna, Cardiovascular & Cognitive Health; Tool: Sauna Duration & Frequency (00:38:52) Tool: Hot Bath; Acknowledgements Disclaimer & Disclosures Learn more about your ad choices. Visit megaphone.fm/adchoices

#1040 - 4M Subscriber Q&A

From Modern Wisdom

I hit 4 million subscribers on YouTube!! To celebrate, I asked for questions from YouTube, X, and Instagram, so here’s another 90ish minutes of me trying to answer as many as possible. Expect to learn what’s new with my new haircut, how much longer until the new studio is built, if or when an Andrew Tate episode will be released, the most recurring thoughts I have when I feel sad and/or disappointed, and why I think this occurs, my favourite thing about myself, and much more… Sponsors: See discounts for all the products I use and recommend: https://chriswillx.com/deals Extra Stuff: Get my free reading list of 100 books to read before you die: https://chriswillx.com/books Try my productivity energy drink Neutonic: https://neutonic.com/modernwisdom Episodes You Might Enjoy: #577 - David Goggins - This Is How To Master Your Life: https://tinyurl.com/43hv6y59 #712 - Dr Jordan Peterson - How To Destroy Your Negative Beliefs: https://tinyurl.com/2rtz7avf #700 - Dr Andrew Huberman - The Secret Tools To Hack Your Brain: https://tinyurl.com/3ccn5vkp - Get In Touch: Instagram: https://www.instagram.com/chriswillx Twitter: https://www.twitter.com/chriswillx YouTube: https://www.youtube.com/modernwisdompodcast Email: https://chriswillx.com/contact - Learn more about your ad choices. Visit megaphone.fm/adchoices

Massive Somali Fraud in Minnesota with Nick Shirley, California Asset Seizure, $20B Groq-Nvidia Deal

From All-In with Chamath, Jason, Sacks & Friedberg

(0:00) Bestie intros! Nick Shirley joins the show to discuss his recent investigation on potential daycare fraud in Minnesota (3:32) Nick's background, how he got into investigative reporting and YouTube, independence, finding this story (16:36) Why this fraud story is resonating, why the national press initially avoided it (30:08) Future plans, California, possible Al-Shabaab connection, how high up does Minnesota's fraud go? (49:15) What the scale of fraud means for America, Minnesota's future, potential patronage scheme (1:09:06) CA's wealth tax: normalizing the seizure of private property (1:33:56) Chamath breaks down the $20B Groq-Nvidia deal Follow Nick Shirley: https://x.com/nickshirleyy Follow the besties: https://x.com/chamath https://x.com/Jason https://x.com/DavidSacks https://x.com/friedberg Follow on X: https://x.com/theallinpod Follow on Instagram: https://www.instagram.com/theallinpod Follow on TikTok: https://www.tiktok.com/@theallinpod Follow on LinkedIn: https://www.linkedin.com/company/allinpod Intro Music Credit: https://rb.gy/tppkzl https://x.com/yung_spielburg Intro Video Credit: https://x.com/TheZachEffect Referenced in the show: https://x.com/nickshirleyy/status/2004642794862961123 https://www.startribune.com/prosecutors-charge-5-people-in-a-minnesota-housing-fraud-scheme/601548944 https://www.nytimes.com/2025/11/29/us/fraud-minnesota-somali.html https://www.fox9.com/news/fraud-minnesota-detailing-nearly-1-billion-schemes https://x.com/EricLDaugh/status/2005410646603473256 https://x.com/kevinkileyca/status/2006053056660541840 https://x.com/chamath/status/2006087862492582084 https://x.com/C_3C_3/status/2005722313795440956 https://x.com/OliLondonTV/status/2005988021946999166 https://x.com/tomhennessey69/status/2005556784228909441 https://x.com/WallStreetApes/status/2005849513676923358 https://x.com/MarioNawfal/status/2005179409465299219 https://dcyf.mn.gov/programs-directory/child-care-assistance-program 
https://x.com/susancrabtree/status/2006079778873565541 https://x.com/chamath/status/2005386348169953607 https://x.com/aaronburnett/status/2003874734661161064 https://newsletter.amuseonx.com/p/the-somali-patronage-system-has-taken https://x.com/realdailywire/status/2006122428196442388 https://x.com/rightanglenews/status/2006375449404866720 https://www.auditor.ca.gov/reports/2025-601/

#488 – Infinity, Paradoxes that Broke Mathematics, Gödel Incompleteness & the Multiverse – Joel David Hamkins

From Lex Fridman Podcast

Joel David Hamkins is a mathematician and philosopher specializing in set theory, the foundations of mathematics, and the nature of infinity, and he’s the #1 highest-rated user on MathOverflow. He is also the author of several books, including Proof and the Art of Mathematics and Lectures on the Philosophy of Mathematics. And he has a great blog called Infinitely More. Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep488-sc See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc. Transcript: https://lexfridman.com/joel-david-hamkins-transcript CONTACT LEX: Feedback – give feedback to Lex: https://lexfridman.com/survey AMA – submit questions, videos or call-in: https://lexfridman.com/ama Hiring – join our team: https://lexfridman.com/hiring Other – other ways to get in touch: https://lexfridman.com/contact EPISODE LINKS: Joel’s X: https://x.com/JDHamkins Joel’s Website: https://jdh.hamkins.org Joel’s Substack: https://www.infinitelymore.xyz Joel’s MathOverflow: https://mathoverflow.net/users/1946/joel-david-hamkins Joel’s Papers: https://jdh.hamkins.org/publications Joel’s Books: Lectures on the Philosophy of Mathematics: https://amzn.to/3MThaAt Proof and the Art of Mathematics: https://amzn.to/3YACc9A SPONSORS: To support this podcast, check out our sponsors & get discounts: Perplexity: AI-powered answer engine. Go to https://www.perplexity.ai/ Fin: AI agent for customer service. Go to https://fin.ai/lex Miro: Online collaborative whiteboard platform. Go to https://miro.com/ CodeRabbit: AI-powered code reviews. Go to https://coderabbit.ai/lex Chevron: Reliable energy for data centers. Go to https://chevron.com/power Shopify: Sell stuff online. Go to https://shopify.com/lex LMNT: Zero-sugar electrolyte drink mix. Go to https://drinkLMNT.com/lex MasterClass: Online classes from world-class experts. 
Go to https://masterclass.com/lexpod OUTLINE: (00:00) – Introduction (01:58) – Sponsors, Comments, and Reflections (15:40) – Infinity & paradoxes (1:02:50) – Russell’s paradox (1:15:57) – Gödel’s incompleteness theorems (1:33:28) – Truth vs proof (1:44:52) – The Halting Problem (2:00:45) – Does infinity exist? (2:18:19) – MathOverflow (2:22:12) – The Continuum Hypothesis (2:31:58) – Hardest problems in mathematics (2:41:25) – Mathematical multiverse (3:00:18) – Surreal numbers (3:10:55) – Conway’s Game of Life (3:13:11) – Computability theory (3:23:04) – P vs NP (3:26:21) – Greatest mathematicians in history (3:40:05) – Infinite chess (3:58:24) – Most beautiful idea in mathematics

[State of Code Evals] After SWE-bench, Code Clash & SOTA Coding Benchmarks recap — John Yang

From Latent Space: The AI Engineer Podcast

From creating SWE-bench in a Princeton basement to shipping CodeClash, SWE-bench Multimodal, and SWE-bench Multilingual, John Yang has spent the last year and a half watching his benchmark become the de facto standard for evaluating AI coding agents, trusted by Cognition (Devin), OpenAI, Anthropic, and every major lab racing to solve software engineering at scale. We caught up with John live at NeurIPS 2025 to dig into the state of code evals heading into 2026: why SWE-bench went from ignored (October 2023) to the industry standard after Devin's launch (and how Walden emailed him two weeks before the big reveal), how the benchmark evolved from Django-heavy to nine languages across 40 repos (JavaScript, Rust, Java, C, Ruby), why unit tests as verification are limiting and long-running agent tournaments might be the future (CodeClash: agents maintain codebases, compete in arenas, and iterate over multiple rounds), the proliferation of SWE-bench variants (SWE-bench Pro, SWE-bench Live, SWE-Efficiency, AlgoTune, SciCode) and how benchmark authors are now justifying their splits with curation techniques instead of just "more repos," why Tau-bench's "impossible tasks" controversy is actually a feature not a bug (intentionally including impossible tasks flags cheating), the tension between long autonomy (5-hour runs) vs. interactivity (Cognition's emphasis on fast back-and-forth), how Terminal-bench unlocked creativity by letting PhD students and non-coders design environments beyond GitHub issues and PRs, the academic data problem (companies like Cognition and Cursor have rich user interaction data, academics need user simulators or compelling products like LMArena to get similar signal), and his vision for CodeClash as a testbed for human-AI collaboration: freeze model capability, vary the collaboration setup (solo agent, multi-agent, human+agent), and measure how interaction patterns change as models climb the ladder from code completion to full codebase reasoning.
We discuss:
John's path: Princeton → SWE-bench (October 2023) → Stanford PhD with Diyi Yang and the Iris Group, focusing on code evals, human-AI collaboration, and long-running agent benchmarks
The SWE-bench origin story: released October 2023, mostly ignored until Cognition's Devin launch kicked off the arms race (Walden emailed John two weeks before: "we have a good number")
SWE-bench Verified: the curated, high-quality split that became the standard for serious evals
SWE-bench Multimodal and Multilingual: nine languages (JavaScript, Rust, Java, C, Ruby) across 40 repos, moving beyond the Django-heavy original distribution
The SWE-bench Pro controversy: independent authors used the "SWE-bench" name without John's blessing, but he's okay with it ("congrats to them, it's a great benchmark")
CodeClash: John's new benchmark for long-horizon development, in which agents maintain their own codebases, edit and improve them each round, then compete in arenas (programming games like Halite, economic tasks like GDP optimization)
SWE-Efficiency (Jeffrey Maugh, John's high school classmate): optimize code for speed without changing behavior (parallelization, SIMD operations)
AlgoTune, SciCode, Terminal-bench, Tau-bench, SecBench, SRE-bench: the Cambrian explosion of code evals, each diving into different domains (security, SRE, science, user simulation)
The Tau-bench "impossible tasks" debate: some tasks are underspecified or impossible, but John thinks that's actually a feature (flags cheating if you score above 75%)
Cognition's research focus: codebase understanding (retrieval++), helping humans understand their own codebases, and automatic context engineering for LLMs (research sub-agents)
The vision: CodeClash as a testbed for human-AI collaboration: vary the setup (solo agent, multi-agent, human+agent), freeze model capability, and measure how interaction changes as models improve
John Yang
SWE-bench: https://www.swebench.com
X: https://x.com/jyangballin
Chapters
00:00:00 Introduction: John Yang on SWE-bench and Code Evaluations
00:00:31 SWE-bench Origins and Devin's Impact on the Coding Agent Arms Race
00:01:09 SWE-bench Ecosystem: Verified, Pro, Multimodal, and Multilingual Variants
00:02:17 Moving Beyond Django: Diversifying Code Evaluation Repositories
00:03:08 Code Clash: Long-Horizon Development Through Programming Tournaments
00:04:41 From Halite to Economic Value: Designing Competitive Coding Arenas
00:06:04 Ofir's Lab: SWE-ficiency, AlgoTune, and SciCode for Scientific Computing
00:07:52 The Benchmark Landscape: TAU-bench, Terminal-bench, and User Simulation
00:09:20 The Impossible Task Debate: Refusals, Ambiguity, and Benchmark Integrity
00:12:32 The Future of Code Evals: Long Autonomy vs Human-AI Collaboration
00:14:37 Call to Action: User Interaction Data and Codebase Understanding Research
