-
129. An IPO interview with the world's first large model stock, chatting with Zhipu CEO Zhang Peng: Where does the path lie? From 🇨🇳 张小珺Jùn|商业访谈录, published at 2026-01-08 01:09
On the eve of the IPO, Zhipu CEO Zhang Peng broke his right leg during a business trip. **When he arrived at the interview location, he was on crutches.**

At the time, Zhipu's IPO date was not yet clear. Its competition with MiniMax for the title of "China's first large model stock" was still ongoing, with no settled outcome. Zhang Peng mentioned a Western idiom: **"Break a leg," which usually means good luck.**

A few days later, the result was revealed.

Zhipu confirmed its listing on the Hong Kong Stock Exchange on January 8, 2026, becoming China's first listed large model company, **which also makes it the "world's first large model stock."**

At this meaningful moment, I had a three-hour conversation with Zhang Peng.

If Zhipu appears in the history books of artificial intelligence a hundred years from now, how do you hope it will be written about? "A pioneer of AGI," Zhang Peng said after a moment's thought. **"A trailblazer."**

This is the first episode of "Zhang Xiaojun's Business Interviews" for 2026. Happy New Year, everyone! We look forward to progressing together with AI in 2026!

02:22 The First to Eat Crabs (Pioneer)
20:29 Exploring from Perceptual Intelligence to Cognitive Intelligence
29:25 GPT-3 Arrives!
40:46 ChatGPT Strikes Again: An Anxious Yet Exciting 2023
01:09:06 2024: The New Protagonists
01:17:58 The Paradigm Evolution of Scaling Law
01:29:56 2025: Learning from DeepSeek
01:38:40 Open Source vs. Closed Source
01:48:33 IPO: The World's First Large Model Stock
02:01:40 The Trailblazer

[More Information]
This episode is jointly presented by Language is World Studio and Weibo Finance.
Disclaimer: This content does not constitute investment advice.
Original title: 129. 全球大模型第一股的上市访谈,和智谱CEO张鹏聊:敢问路在何方?
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>上市前夕,智谱CEO张鹏在一次出差途中摔断伤了右腿。<strong>到达访谈地点时,他拄着一副拐杖。</strong></p><p>当时,智谱的上市日期尚未明朗。它与MiniMax之间围绕“中国大模型第一股”的竞争仍在继续,结果并未尘埃落定。张鹏提到一句西方俚语:<strong>Break a leg(摔断一条腿),它通常意味着祝你好运。</strong></p><p>几天之后,结果揭晓。</p><p>智谱确认于2026年1月8日登陆港交所,成为中国首家上市的大模型公司,<strong>这也意味着它将是“全球大模型第一股”。</strong></p><p>在这个意味深长的时刻,我与张鹏进行了一场3小时长谈。</p><p>如果一百年后,智谱出现在人工智能的历史书中,你希望它以怎样的方式被写下?</p><p>“AGI的先行者。”张鹏想了想,说,<strong>“一个开路的人。”</strong></p><p>这是《张小珺商业访谈录》与大家在2026年见面的第一集节目——祝大家新年快乐!期待在2026年,我们与AI共同进步!</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FsDgR_9P3S2gds74aRINaowopRVi.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><p>02:22 吃螃蟹的人</p><p>20:29 从感知智能到认知智能的摸索</p><p>29:25 GPT-3来了!</p><p>40:46 ChatGPT又来了:既焦虑又兴奋的2023年</p><p>01:09:06 2024年:新的主角</p><p>01:17:58 Scaling Law的范式演变</p><p>01:29:56 2025年:向DeepSeek学习</p><p>01:38:40 开源 vs 闭源</p><p>01:48:33 IPO:全球大模型第一股</p><p>02:01:40 开路的人</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><figure><img src="https://image.xyzcdn.net/FirtILBnNOPkNKOx28EeqzXPjEcv.png" /></figure><p>【更多信息】</p><p>本集由语言即世界工作室与微博财经联合呈现。</p><p>免责声明:本内容不作为投资建议。</p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
128. Manus's final interview before deciding to sell: Ah, this fantastical 2025 drift... From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-12-30 02:41
Today's episode is very special.

Our recording date was December 1, 2025, and the guest is Ji Yichao (Peak), co-founder and chief scientist of Manus. **In the early hours of this morning, Meta announced its full acquisition of Manus.** At the time of recording, however, the acquisition had not yet happened.

**In the end, this episode became Manus's final interview.**

00:56 The Wild Adventures of an Upright Youth
30:27 Oh, We Collectively Made a Wrong Decision!
01:07:10 Manus: From 0 to $100 Million ARR
02:32:10 Artificial Intelligence Is More Like Manufacturing
02:59:51 I'm Afraid Manus Will Become Complicated

《95. A 3-hour Interview with Manus Founder Xiao Hong: The World Is Not a Linear Extrapolation; Be an Important Variable in the Game》
Original title: 128. Manus决定出售前最后的访谈:啊,这奇幻的2025年漂流啊…
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>今天这集节目很特殊。</p><p>我们的录制时间是2025年12月1日,嘉宾是Manus联合创始人兼首席科学家季逸超(Peak)。<strong>就在刚过去的凌晨,Meta宣布全资收购Manus。</strong>而在节目录制的彼时,收购事件尚未发生。</p><p><strong>最终,这期节目成为了Manus最后的访谈。</strong></p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FsSQEdu7GJeGiACzskebSmM1TUHi.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><p>00:56 正道少年的荒蛮历险记</p><p>30:27 哦,我们集体做了一个错误决定!</p><p>01:07:10 Manus:从0到1亿美金ARR</p><p>02:32:10 人工智能更像制造业</p><p>02:59:51 我很害怕Manus变得复杂</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p><a href="https://www.xiaoyuzhoufm.com/episodes/67c3d80fb0167b8db9e3ec0f">《95. 对Manus创始人肖弘的3小时访谈:世界不是线性外推,做博弈中的重要变量》</a></p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
127. Large Model Quarterly Report Year-End Dialogue: With Guangmi, predicting an AI War, two major alliances, and the third paradigm, Online Learning. From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-12-24 23:00
This is the 8th episode of the "Global Large Model Quarterly Report," and the third year of my New Year's Eve conversation with Guangmi!

At a moment when sentiment around AGI is increasingly returning to realism, this episode shows you: what teams, factions, and alliances have formed in the global AI War? What new paradigms are the frontier labs exploring? And what new types of research labs have emerged in Silicon Valley?

If you have more thoughts or suggestions for the Global Large Model Quarterly Report, please leave a comment; we read them all.

As 2025 draws to a close, we look forward to making progress together with AI!

**AI War: A Competition the Global Giants Cannot Afford to Lose**
02:00 The Global Large Model Quarterly Report has reached its 8th episode with you
03:19 Let's start, as is customary, with the AI Bubble
07:38 Breaking down OpenAI's revenue: visible income and invisible income
13:10 Some companies have "pawn value for the giants"
13:32 The question of OpenAI's commercialization speed
15:04 The big picture: the main drivers and factions in this AI War, Nvidia GPU vs. Google TPU
17:16 The stronger Google gets, the more anti-Google alliances will form; the stronger OpenAI gets, the more anti-OpenAI alliances will form

**Alternating Leadership Is the New Normal for Top Models**
17:48 Among the world's three leading models, GPT/Claude/Gemini, alternating leadership is the competitive norm
25:40 A lazy judgment: foundation models = comprehensive e-commerce; scaling SKUs = scaling data
27:40 With Gemini's rise, people wonder what OpenAI will do. How should we view the competition between the two?
31:20 Another judgment: in the end, ChatGPT will absorb traditional Search and eventually capture a share of traditional Search advertising
35:08 People no longer see Google as an AI loser like Nokia, but Google's crisis has not truly been resolved

**The Third Paradigm After Pre-training and RL: Online Learning**
36:01 Pre-training scaling is indeed nearing its end, but Online Learning is just beginning
38:49 OpenAI remains very strong even after splintering 3-4 times: Anthropic was OpenAI's earliest scaling team, Ilya led the pre-training team, Thinking Machines was the original ChatGPT and post-training team
40:01 A bold claim: many of the robotics, world-model, and multimodal problems people raise may be false problems; Online Learning may be the only truly important one
41:01 Pre-training is oil, a fossil fuel; RL expert data is renewable energy, useful but limited in total; Online Learning is nuclear fusion: not yet achieved, but once it breaks through it will be unbeatable, and humanity enters the silicon-based era

**Is AGI a Marathon or Autonomous Driving? A War of Attrition + a Cash-Flow Battle**
43:05 If a model's data distribution doesn't contain a class of data, those tasks simply don't work; only data that has been compressed works. Today's models are still giant compressors
44:33 "Model is product, data is model"
44:45 A rumor I heard: Sam said internally to forget about AGI for now?
45:04 Local L3/L4, but overall L4 is hard: more realistically, among knowledge workers the experience of local L3/L4 is already visible, e.g. ChatGPT for long-tail information retrieval, Coding Agents, Office/PPT/Excel Agents, and finance investment-research Agents

**Current Thoughts on Investment (Not Investment Advice)**
47:11 The last podcast said 40% OpenAI + 40% ByteDance + 10% Google + 10% Anthropic. Now it's 25% OpenAI + 25% ByteDance + 10% Google + 10% Anthropic + 10% Nvidia + 10% TSMC, a bit of each. Also, today we should be betting on the paradigm and the winner three years out; Neo Labs like Thinking Machines and SSI deserve serious consideration too

**2026: Important Trends and Signals in the Bay Area**
50:57 Investment themes to look forward to in 2026
52:53 Model is product, data is model
54:48 Horizontal and vertical: horizontally, distill human expert knowledge and expand into more industry domains; vertically, the next-generation technology paradigm, Online Learning, which creates higher economic value
56:45 A map of the Neo Labs newly emerging in Silicon Valley
59:43 The latest developments and company landscape in robotics
01:05:55 ARR growth at top Silicon Valley companies: the more prominent the company, the cheaper it is and the less of a bubble it carries
01:08:02 Domestic large model and application companies
01:09:39 What is the next decisive move for models?

**Chinese Entrepreneurs, Funds, and "China's Silicon Valley"**
01:10:16 Differences between the Chinese and American AI narratives
01:12:15 What to say to Chinese entrepreneurs
01:14:20 Why do we say we hope to foster a Silicon Valley in China?
01:16:45 In 3-5 years, will the world's leading AI companies be Chinese teams?

Year-End Dialogue [Standing Beyond 2025]:
《122. Zhu Xiaohu's Third Installment of Realist Stories: The AI Feast and Bubble》
《124. Chatting with Dai Yusen about 2026 Expectations, The Year of R, Corrections, and How We Bet》
《125. Chatting with Altimeter Partner Freda: Betting on OpenAI, Robinhood's Past, America's Capital Bad Boys, Abacuses and Bubbles》
《126. Chatting with Sequoia's Zheng Qingsheng: The Traffic Revolution in Economic History, the Unpredictability of Human Behavior Patterns, and Founder Personalities》

[More Information]
Disclaimer: This content is not intended as investment advice.
Original title: 127. 大模型季报跨年对谈:和广密预言一场AI War、两大联盟和第三个范式Online Learning
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>这里是《全球大模型季报》的第8集,也是我和广密跨年对谈的第三年了!</p><p>这集节目将带你看到,在对AGI开始充斥着现实主义情绪回归的当下,<strong>全球AI War形成了怎样的战队、阵营和联盟?各个前沿实验室在探索哪些新范式?硅谷又涌现出了哪些新型的研究实验室?</strong></p><p>如果你对全球大模型季报有更多的想法或者建议,欢迎大家在评论区留言,我们都会看到。</p><p>2025的最后,期待我们和AI共同进步!</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FgSVniff58-vFXBYZhL7N6PMYMsi.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><blockquote><p><strong>AI War:一场全球巨头都输不起的竞争</strong></p></blockquote><p>02:00 全球大模型季报陪伴大家到第8集了</p><p>03:19 一开始就不免俗地聊聊AI Bubble吧</p><p>07:38 OpenAI收入构成算账:看得清的收入和看不清的收入</p><p>13:10 有的公司是“巨头的棋子价值”</p><p>13:32 OpenAI做商业化的速度问题</p><p>15:04 纵观全局,这场AI War的主要推动方和阵营:英伟达GPU vs 谷歌TPU</p><p>17:16 Google越强,越会形成反Google联盟,OpenAI越强也会形成反OpenAI联盟</p><blockquote><p><strong>交替领先是顶尖模型的新常态</strong></p></blockquote><p>17:48 全球最领先的3个模型GPT/Claude/Gemini,交替领先是竞争常态</p><p>25:40 这里有个偷懒的判断,基础模型=综合电商,scale SKU=scale data</p><p>27:40 Gemini崛起,大家会担心OpenAI会怎么办?怎么看待这两家的竞争?</p><p>31:20 另一个判断是:最终的最终,ChatGPT会融合传统Search,最终也会吃掉传统Search广告的份额</p><p>35:08 大家不把Google当做AI loser诺基亚了,但Google危机没有真正解除</p><blockquote><p><strong>Pre-training和RL之后的第三个范式:Online learning</strong></p></blockquote><p>36:01 Pre-training scaling确实快结束了,但Online learning刚开始</p><p>38:49 OpenAI即便分崩离析3-4次了也依然很强:Anthropic是OpenAI最早的Scaling team,Ilya是Pre-training team,Thinking Machines是原班ChatGPT和Post-training team</p><p>40:01 说一个暴论:大家提的机器人、世界模型、多模态,很多可能是假问题,Online learning可能才是唯一重要的真问题</p><p>41:01 Pre-training预训练是石油,化石燃料;RL专家数据是新能源,有用但总量少;Online Learning是核聚变,还没突破,突破了无敌,人类进入硅基时代</p><blockquote><p><strong>AGI像马拉松 or 自动驾驶?持久战+现金流之战</strong></p></blockquote><p>43:05 如果模型数据分布里面没有这类数据,这类任务就是不work,只有压缩过这类数据,才work——今天的模型还是巨大的压缩器</p><p>44:33 “模型即产品,数据即模型”</p><p>44:45 听过一个rumor:Sam在内部说先忘掉AGI?</p><p>45:04 局部L3/L4,很难整体L4:现实一点的是,在知识工作者群体,局部L3/L4的体验是能看到的,比如ChatGPT做长尾信息获取, 
Coding Agent, Office/PPT/Excel Agent, Finance金融投研Agent</p><blockquote><p><strong>现阶段对于投资的思考(不作为投资建议)</strong></p></blockquote><p>47:11 上一期播客说的是40%OpenAI+40%字节+10%Google+10%Anthropic</p><p>现在是:25%OpenAI+25%Bytedance+10%Google+10%Anthropic+10%Nvidia+10%TSMC,每家都放一点</p><p>另外是今天要bet 3年后的范式和winner了,Thinking Machines和SSI这种Neo Labs也应该好好考虑下</p><blockquote><p><strong>2026年,湾区的重要趋势和信号</strong></p></blockquote><p>50:57 2026年期待投资的主题</p><p>52:53 模型即产品,数据即模型</p><p>54:48 一横一纵:横向蒸馏人类专家知识,横向扩宽更多的行业领域;纵向就是下一代技术范式,Online learning,创造更高的经济价值</p><p>56:45 硅谷新冒出的Neo Labs的分布图</p><p>59:43 Robotics的最新进展和公司分布</p><p>01:05:55 硅谷头部公司的ARR增长状况:越头部的公司越便宜,越头部的公司越没有Bubble</p><figure><img src="https://image.xyzcdn.net/Fhvhd5hEGDhmdy1zwZ9mDJ8MCV01.png" /></figure><p>01:08:02 国内的大模型和应用公司</p><p>01:09:39 模型的下一个胜负手是什么?</p><blockquote><p><strong>华人创业者、基金和“中国的硅谷”</strong></p></blockquote><p>01:10:16 中美的AI叙事差异</p><p>01:12:15 对华人创业者想说啥</p><p>01:14:20 为什么说希望推动中国有个硅谷?</p><p>01:16:45 3-5年之后全球最领先的AI公司会是华人团队吗?</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p>年终对话【站在2025年之外】</p><figure><img src="https://image.xyzcdn.net/lhLVtttWwcKAdf0OIokZ5NqvC1yc.png" /></figure><p><a href="https://www.xiaoyuzhoufm.com/episodes/693834013fec3166cf262bd0" rel="noopener noreferrer nofollow" target="_blank">《122. 朱啸虎现实主义故事的第三次连载:人工智能的盛筵与泡泡》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/693d7c172a383da167ecfcde" rel="noopener noreferrer nofollow" target="_blank">《124. 和戴雨森聊2026年预期、The Year of R、回调、我们如何下注》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/694180874c65abaff3576bc4" rel="noopener noreferrer nofollow" target="_blank">《125. 与Altimeter合伙人Freda聊:下注OpenAI、Robinhood往事,美国资本坏小孩、算盘与泡沫》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/694795da9f70e5d6b371d207" rel="noopener noreferrer nofollow" target="_blank">《126. 
和红杉郑庆生聊:经济史的流量革命、人类行为模式的不可预期,与创始人性格》</a></p><p>【更多信息】</p><p>免责声明:本内容不作为投资建议。</p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
126. Talking with Sequoia's Zheng Qingsheng about: The traffic revolution in economic history, the unpredictability of human behavior patterns, and founder personality. From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-12-21 23:00
In the previous episodes of our year-end review series [Standing Beyond 2025], we featured the voices of Jinshajiang's Zhu Xiaohu, ZhenFund's Dai Yusen, and Altimeter Capital's Freda Duan. Today we continue the series with Zheng Qingsheng, a partner at Sequoia China.

**Zheng Qingsheng offers a grander perspective: he stretches the time horizon to 20 years, looking back at China's three waves of technological change, from the internet and mobile internet to AI. He then places this view within the broader sweep of economic history, using the "traffic revolution" to look for clues to the next generation of to-C traffic nodes.**

Coincidentally, he entered venture capital in 2005, making this his 20th year as an investor. Beyond his day job as an investor, he is also a new-product experience officer and an economic-history enthusiast.

**Guest Profile**
02:00 Starting from learning to program in 1984
17:17 After becoming an investor in 2005: from Shanda's corporate VC to Trustbridge Partners and then Sequoia Capital

**New Human Behavior Patterns Are Unpredictable**
19:09 The impact of Douban and Dianping on my investment career: I was deeply interested in them as explorations of frontier human behavior patterns
22:19 My impression of Ah Bei (Douban's founder): "He and Douban are one"
23:21 My impression of Zhang Tao (Dianping's founder): "A more mature entrepreneur with keen product insight"
23:55 Looking back, Web 2.0 was a wave of online content product innovation after humanity's first digitalization; later, the sharing economy brought another large-scale innovation combining online and offline
24:31 Twenty years of Chinese venture capital through the eyes of an economic-history enthusiast: before 2005, after 2010, after 2015 (Pinterest's paradigm directly influenced later content platforms)
29:13 In history's rearview mirror: the birth, prosperity, and decline of the content platforms
"Text is a high-level form of knowledge product"
"Rich media (images and text) tends to supersede text"
"Short video is a fundamental way for humans to understand the world"
"Ultimately, short video challenges text itself"
34:10 Humans naturally evolve toward ways of understanding the world that require no learning and no long-term investment of time. Is AI also returning to this point?
35:06 A personal take on Douyin, Xiaohongshu, and Bilibili as products, and on their founders
"Bilibili's founder belongs to his own product, like Ah Bei"
"Xiaohongshu has the most open product structure I have ever seen"
39:56 In summary: "New human behavior patterns are, on the whole, unpredictable"
42:26 My personal investment aesthetic and reflections
48:23 Why do apps like Xiaoyuzhou, or podcasts, emerge just when we feel C-side traffic has been fully captured?
"Hearing is the only sense that can be used while multitasking"

**The Traffic Revolution in Economic History**
50:51 Mobile internet C-side traffic culminated in short video; for many years after 2018-2019 there was little major innovation, and to-C investment entered a dormant period
53:27 Traffic is a fulcrum of human economic history: highways > railways > canals > electricity > wired telephones > television > the internet
57:21 "You can think of all of today's excellent internet to-C products as one giant town"
57:47 Artificial intelligence shows us the potential for new to-C traffic entry points
01:00:19 The differences: the networks formed in the AI era are not natural monopolies, their marginal costs do not approach zero, and they are more results-oriented
01:04:34 AI has triggered deep digitalization, which I believe will bring new hardware opportunities; this could be another new traffic node beyond large models
01:09:53 Why haven't AI-era products formed two-sided network effects?
01:12:20 The commercialization of AI products is outperforming that of internet and mobile-internet products
01:13:00 Having invested in Kimi, MiniMax, and Manus, do you think the ultimate value will settle in model companies or application companies?

**AI Has Bubbles? So Does the Ocean**
01:18:24 Sequoia's systematic investment strategy in the AI era
01:19:01 Sequoia's evolving taste in founders
01:19:45 I think "track coverage" is a misreading of Sequoia
01:22:10 Agent startups vs. App startups: now it's born-global
01:23:50 Changes and pace on the AI startup side over the past three years
01:24:40 Outlook and expectations for 2026
01:26:21 AI Bubble: "Just as the ocean has bubbles"
01:28:39 Witnessing three traffic revolutions in human history

**Imagined Communities, Abstract Life, and Personified Representatives**
01:29:17 Observations on entrepreneurs going from 0 to 1, 1 to 10, 10 to 100, and on those who failed
"The CEO must become the personified symbol of the organization and its systems"
"Even if you can't achieve it, you must play the part"
"It is rare to have both talents: product sensitivity and embodying the persona of the organization and its troops"
01:32:05 CEOs and MBTI
01:35:20 Closing rapid-fire Q&A

Year-end dialogue [Standing Beyond 2025]:
《122. Zhu Xiaohu's Third Installment of Realist Stories: The AI Feast and Bubble》
《124. Chatting with Dai Yusen about 2026 Expectations, The Year of R, Corrections, and How We Bet》
《125. Chatting with Altimeter Partner Freda: Betting on OpenAI, Robinhood's Past, America's Capital Bad Boys, Abacuses and Bubbles》

[More Information]
Disclaimer: This content is not investment advice.
Original title: 126. 和红杉郑庆生聊:经济史的流量革命、人类行为模式的不可预期,与创始人性格
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>在前几集节目,我们的年终回顾系列【站在2025年之外】,收录了金沙江朱啸虎、真格戴雨森、Altimeter Capital Freda Duan的声音。</p><p>今天我们将延续这个系列节目,嘉宾是红杉中国合伙人郑庆生。</p><p><strong>郑庆生提供了一个更宏大的视角——他把时间尺度拉长到20年,回看中国从互联网、移动互联网到AI的三轮技术浪潮;又进一步,把视角放进了更宏观的经济史中,用“流量革命”来试图寻迹下一代to C流量节点的端倪。</strong></p><p>很巧的是,他于2005年入行风险投资业,今年是他做投资人的20年。</p><p>在投资人的本职工作外,他也是一名新产品体验官、一位经济史爱好者。</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FjTDsaGBV5M7mBWnyT5FwXTER80g.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><blockquote><p><strong>嘉宾小传</strong></p></blockquote><p>02:00 从1984年开始学习编程讲起</p><p>17:17 2005年成为投资人以后:从盛大战投到挚信资本再到红杉资本</p><blockquote><p><strong>人类新的行为模式是不可预期的</strong></p></blockquote><p>19:09 豆瓣和大众点评对我投资生涯的影响:我对此充满了兴趣,是对人类前沿行为模式的探讨</p><p>22:19 我对阿北(豆瓣创始人)的印象:“他和豆瓣是合一的”</p><p>23:21 我对张涛(大众点评创始人)的印象:“更成熟的有敏感产品洞察力的企业家”</p><p>23:55 现在回头看,Web2.0是在人类社会第一次数字化之后做了一波线上内容的产品创新,再往后,共享经济是又做了一次线上、线下结合的大规模创新</p><p>24:31 一位经济史爱好者眼中的中国创投20年:05年以前、10年以后、15年以后(Pinterest的范式直接影响了后来的内容平台)</p><p>29:13 站在历史的后视镜看,各个内容平台的诞生、繁荣、陨落</p><ul><li><p>“文字是高级形态的知识产品”</p></li><li><p>“图文混排倾向于覆盖文字”</p></li><li><p>“短视频是人类认识世界的基础方式”</p></li><li><p>“最终,短视频挑战的是文字本身”</p></li></ul><p>34:10 人类天然会进化到和自己本来不需要学习、不需要长时间成本投入,就能认知这个世界的方式,AI是不是也回到这点?</p><p>35:06 个人视角聊聊抖音、小红书、哔哩哔哩产品和他们的founders</p><ul><li><p>“B站的founder更属于自己的产品,跟阿北一样”</p></li><li><p>“小红书是我到目前见过的最开放的产品结构”</p></li></ul><p>39:56 总结:“人类新的行为模式总体是不可预期的”</p><p>42:26 我对于个人投资的审美和反思</p><p>48:23 为什么当我们觉得C端流量攫取殆尽时,小宇宙或播客会涌现?</p><ul><li><p>“听觉是可以唯一多线程并用的感官”</p></li></ul><blockquote><p><strong>经济史中的流量革命</strong></p></blockquote><p>50:51 移动互联网C端流量终结于短视频,18年、19年以后的很多年都缺乏大的创新,to C投资进入蛰伏期</p><p>53:27 流量是人类经济史的支点:公路〉铁路〉运河〉电力〉有线电话〉电视〉互联网</p><p>57:21 “你可以认为现在所有的优秀互联网to C产品都是一个巨大的城镇”</p><p>57:47 人工智能让我们看到了新的to C流量入口的潜力</p><p>01:00:19 
不同点:AI时代所形成的网络不是带有自然垄断性质的网络,它的边际成本不趋近于0;更结果导向</p><p>01:04:34 人工智能引发了深层次的数字化,我认为会带来新的硬件机会,它可能是大模型之外另一个新的流量节点</p><p>01:09:53 为什么AI时代的产品没有形成双边网络效应?</p><p>01:12:20 AI产品的商业化比互联网、移动互联网要做的好</p><p>01:13:00 投资了Kimi、MiniMax,也投资了Manus,你觉得最终的价值会沉淀在模型公司还是应用公司?</p><blockquote><p><strong>AI有泡沫?就跟大海里有泡沫一样</strong></p></blockquote><p>01:18:24 AI时代,红杉的系统性投资策略</p><p>01:19:01 红杉对创始人的审美变化</p><p>01:19:45 我觉得“赛道覆盖”是对红杉的误解</p><p>01:22:10 Agent创业 vs App创业:现在是天生全球</p><p>01:23:50 过去三年在AI创业端的变化和节奏</p><p>01:24:40 对2026年的展望与预期</p><p>01:26:21 AI Bubble:“就跟大海里有泡沫一样”</p><p>01:28:39 见证了人类历史的三个流量革命</p><blockquote><p><strong>想象的共同体、抽象的生命和人格化代表</strong></p></blockquote><p>01:29:17 对从0到1、从1到10、从10到100和失败的创业者观察</p><ul><li><p>“CEO要成为组织和制度人格化的象征”</p></li><li><p>“哪怕你做不到都得扮演”</p></li><li><p>“同时有两种天赋是很难得的,又有产品的敏感力, 又扮演组织和部队的人格”</p></li></ul><p>01:32:05 CEO和MBTI</p><p>01:35:20 最后的快问快答</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p>年终对话【站在2025年之外】:</p><p><a href="https://www.xiaoyuzhoufm.com/episodes/693834013fec3166cf262bd0" rel="noopener noreferrer nofollow" target="_blank">《122. 朱啸虎现实主义故事的第三次连载:人工智能的盛筵与泡泡》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/693d7c172a383da167ecfcde" rel="noopener noreferrer nofollow" target="_blank">《124. 和戴雨森聊2026年预期、The Year of R、回调、我们如何下注》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/694180874c65abaff3576bc4" rel="noopener noreferrer nofollow" target="_blank">《125. 与Altimeter合伙人Freda聊:下注OpenAI、Robinhood往事,美国资本坏小孩、算盘与泡沫》</a></p><p>【更多信息】</p><p>免责声明:本内容不作为投资建议。</p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
125. Chatting with Altimeter Partner Freda: Betting on OpenAI, Robinhood's Past, America's Capital Bad Boys, Abacuses and Bubbles. From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-12-16 23:00
In the first two episodes of the year-end dialogue series [Standing Beyond 2025]:

Zhu Xiaohu argued that "there will be no bubble within three years" and that "the bubble talk is pure nonsense";

Dai Yusen, by contrast, predicted that 2026 will be "The Year of R," a year of return to reality.

**Today we release the third episode of the series, with a guest offering a frontline Silicon Valley perspective.**

A little over a month ago, in early November 2025, Sam Altman appeared on a podcast hosted by the founder of the American fund Altimeter Capital. When the host repeatedly pressed him on how OpenAI would pay for its $1.4 trillion in computing and infrastructure commitments, Sam said: "If you want to sell your shares, I'll find you a buyer. Enough." Afterward, the AI sector as a whole wobbled, and the debate over whether AI is in a bubble intensified further.

Today's guest, Freda Duan, is a partner at that fund, Altimeter Capital.

Altimeter is a Silicon Valley tech fund that invests across the primary and secondary markets. Its primary-market investments include OpenAI, Anthropic, and ByteDance; its secondary-market investments include NVIDIA, Snowflake, and Robinhood.

**In this episode, Freda analyzes these star American companies in depth, running the numbers on their massive investments; she also discusses, from the perspective of a frontline Silicon Valley investor, the new order of American capital, their view of "bad kids," rebels, "hedgehog-type" and "Nezha-type" founders, and the bubble.**

In 2025, let's advance together with AI!

(Recorded in November 2025)

03:30 Freda's self-introduction
04:41 Silicon Valley's keyword for each year, 2020-2025
08:12 Three main themes in US equity investing today: AI + re-industrialization + digitization of finance (financial-industry innovation). The three are very interesting because of the many connections among them
10:20 How do American investors view the Chinese market?
10:59 Investing in OpenAI
12:14 Running the numbers on OpenAI's business model (compared with Netflix)
16:45 OpenAI's four revenue pillars
20:49 OpenAI's competition
23:32 Changes at Google
26:27 OpenAI's investment returns and IPO
28:25 Investing in Anthropic
31:25 Neo Labs
32:31 Investing in Robinhood
40:29 Does Silicon Valley capital prefer "good kids" or "bad kids"?
44:26 Discovering new species (market prediction)
46:07 Autonomous driving and robotics
55:25 "The primary market relies on consensus; the secondary market relies on non-consensus"
57:13 Different US funds' tastes in people: hedgehog-type, rebel, and Nezha-type founders
58:22 Overall changes among US funds: more concentrated positions and heavier bets
01:03:43 Reviewing Silicon Valley's key directions of 2025
01:09:51 Whose pockets does the massive revenue of these AI companies come from?
01:14:11 The input-output ratio of massive AI investments
01:15:04 Are we in an AI bubble?
01:16:31 Looking ahead to 2026

Year-end dialogue [Standing Beyond 2025]:
《122. Zhu Xiaohu's Third Installment of Realist Stories: The AI Feast and Bubble》
《124. Chatting with Dai Yusen about 2026 Expectations, The Year of R, Corrections, and How We Bet》

[More Information]
Disclaimer: This content is not investment advice.
Original title: 125. 与Altimeter合伙人Freda聊:下注OpenAI、Robinhood往事,美国资本坏小孩、算盘与泡沫
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>在年终对话系列【站在2025年之外】的前两集节目中:</p><p>朱啸虎提出“三年不会有泡沫”,“泡沫论调纯属无稽之谈”;</p><p>戴雨森则预测,2026年是“Year of R”,将会是一个现实回归之年。</p><p><strong>今天推出的是系列第三集节目,嘉宾来自一线的硅谷视角。</strong></p><p>1个多月前,在2025年11月初,Sam Altman上了一档由美国基金Altimeter Capital创始人主持的播客节目,在主持人连续追问OpenAI如何为1.4万亿美元级别算力与基础设施承诺买单时,Sam称:“If you want to sell your shares, I’ll find you a buyer. Enough.”(“如果你想卖掉你的股份,我可以帮你找到买家。够了。”)——随后,AI板块整体出现波动,关于AI是否存在泡沫的讨论进一步升温。</p><p>我们今天的嘉宾Freda Duan就来自这个名叫Altimeter Capital的基金,她担任合伙人。</p><p>Altimeter是一个硅谷科技基金,横跨一二级。在一级市场投资案例有OpenAI、Anthropic、字节跳动等,在二级市场投资案例有NVIDIA、Snowflake、Robinhood等。</p><p><strong>这集节目,Freda将深入分析美国这些明星公司,给他们的巨额投入算算账;她也从一线硅谷投资人的视角聊聊,美国资本的新秩序,他们眼中的坏小孩、反叛者、刺猬型和哪吒型创始人,以及泡沫。</strong></p><p>2025年,让我们和AI共同进步!</p><p>(录制于2025年11月)</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/Fg0moXlpj8LxUJ-N-E6roEWpxKWn.png" /></figure><p>03:30 Freda的自我介绍<br />04:41 2020-2025每一年的硅谷关键词<br />08:12 今天美股投资三条主线:AI + Re-industrilization(再工业化) + Digitization of Finance(金融产业创新),三条主线非常有意思,因为中间有很多联系<br />10:20 美国投资人怎么看待中国市场?<br />10:59 投资OpenAI<br />12:14 给OpenAI的商业模式算算账(对比Netflix)<br />16:45 OpenAI的收入四个支柱<br />20:49 OpenAI的竞争<br />23:32 Google的变化<br />26:27 OpenAI的投资回报和IPO<br />28:25 投资Anthropic<br />31:25 Neo labs<br />32:31 投资Robinhood<br />40:29 硅谷资本喜欢乖小孩还是坏小孩?<br />44:26 发现新物种(market prediction)<br />46:07 自动驾驶和机器人<br />55:25 “一级靠共识,二级靠非共识”<br />57:13 美国不同基金看人的taste:刺猬型、反叛者、哪吒型创始人<br />58:22 美国基金整体变化:更集中仓位下重注<br />01:03:43 复盘硅谷2025年最重点方向<br />01:09:51 这些AI公司的巨额收入从谁的口袋里来?<br />01:14:11 巨额AI投资的投入产出比<br />01:15:04 我们在AI bubble中吗?<br />01:16:31 展望2026年</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p>年终对话【站在2025年之外】:</p><p><a href="https://www.xiaoyuzhoufm.com/episodes/693834013fec3166cf262bd0" rel="noopener noreferrer nofollow" target="_blank">《122. 
朱啸虎现实主义故事的第三次连载:人工智能的盛筵与泡泡》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/693d7c172a383da167ecfcde" rel="noopener noreferrer nofollow" target="_blank">《124. 和戴雨森聊2026年预期、The Year of R、回调、我们如何下注》</a></p><p>【更多信息】</p><p>免责声明:本内容不作为投资建议。</p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
124. Year-End Review [Standing Beyond 2025]: A conversation with Dai Yusen about 2026 Expectations, The Year of R, Corrections, and How We Bet. From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-12-13 23:00
Before we know it, we have reached the last month of 2025. Amid Beijing's first snow, we hope to reflect and look ahead with everyone through a series: [Standing Beyond 2025]. Today's guest is Dai Yusen, Managing Partner at ZhenFund.

In episode 122, **Zhu Xiaohu claimed there would be no bubble within three years, called the bubble theory pure nonsense, and urged entrepreneurs to advance at full speed in 2026.**

**Yusen brings a different perspective today. In his view, the keyword for 2026 is "The Year of R": Return and Research will once again become important. In a sense, 2026 will be a year of reality and recalibration.**

**02:00 Review of 2025**

02:00 Progress from the model perspective: thinking-time scaling, represented by o1, has driven a major improvement in model capability. The flagship models from OpenAI, Anthropic, and Google are in a close race, each with its own character, so expectations and narratives rotate among them. Chinese model companies have dominated the open-source ecosystem over the past year.

28:13 Progress from the application perspective: "the first year of Agents" means the problem won't be solved within a year; it is still an early market that needs to cross the chasm. Agent comes from agency: the key is autonomy, saving human time, and being able to complete novel tasks and solve unseen problems.

52:31 How many projects did ZhenFund invest in in 2025? Around 20. Comparing the valuations of Chinese and US AI companies, Chinese companies hold high option value for the global market: Thinking Machines' angel-round valuation, with no product, already equals the total valuation of China's AI companies. Model companies: Mistral 14bn, Kimi 4bn; Mistral itself no longer focuses much on pre-training, so the benchmark is essentially against Kimi. Application companies: in the US, a company like Manus, reaching $100m ARR within a few months with gross margins of tens of percentage points and 20% month-over-month growth, should be valued at $3-5bn.
**01:03:15 Forecast for 2026: The Year of R** **The Year of R: Return, Research, Remember, Multimodal Reasoning** **01:03:15 Return:** Why is Return important? ROI: the past three years traded on investment, because everyone was attracted by potentially large returns. But as I (investment) grows larger, attention is shifting to the realization of R (return), because only R can drive future I. Why do we believe attention to return will increase in 2026? Models: capability improvement is the most essential driving force of this wave of the AI revolution, but it is slowing down; the leading US labs are investing far more (capex, labor, etc.), yet that cannot stop Chinese models from following at low cost. The Scaling Law cannot be read simply as "huge investment works miracles." Applications: application narratives have converged from an all-powerful AGI to three main business models; the dreams are shrinking back toward reality. Subscriptions: raising prices on ordinary users is hard; the low-hanging fruit has mostly been picked, and competition is intensifying. Advertising + e-commerce: a large part is redistribution of the existing pie; for new application forms such as chatbots, monetization will not come quickly. Usage-based products like AI coding and image generation: token usage will keep growing, but token prices will keep falling. Non-frontier intelligence will quickly become flat-rate; only the most SOTA tasks can charge by usage. Previously valuable tasks will become less valuable; replacing many programmers does not mean earning those programmers' salaries. Enterprise services: this segment is still an early market of limited size, with many early adopters but uncertain retention. Microsoft Copilot continues to underperform expectations; large companies do not adopt new technologies that quickly. Conclusion: we need the accelerated GDP growth Satya talks about; expanding the pie is what true AGI means, e.g., AI creating new drugs, discovering new knowledge, etc. 
Investment: US infrastructure construction is slow, computing power depreciates fast, and labor costs are high, so the enormous outlays need to show returns quickly. Market setup: at the end of last year, expectations were modest, yet ChatGPT was growing rapidly and the certainty of Coding and Agentic gains created application opportunities. Now expectations are very high and investments are huge, but no revolutionary new capability is visible on the model side in the short term, and new paradigms are still brewing. Implications for entrepreneurs? The era of negative-margin growth is passing; quality growth is needed. The very loose financing environment (in the US) may tighten. **01:16:13 Research:** **new paradigm:** AI history has always advanced in step functions; a new paradigm is needed to bring another large jump in AI capability. Ilya: scaling and research alternate; it is research's turn again. Online learning, world models, etc., are currently important research directions. **neo labs:** from Thinking Machines, SSI, and Reflection to the recent Humans&, Periodic, Isara, etc. Because engineering and product work differ greatly from research, a relaxed environment and a culture of free exploration, without time or KPI constraints, are needed. Everyone hopes the neo labs can explore new paths that differentiate them from today's leading model companies. **new benchmark:** current benchmarks can no longer effectively reflect differences in AI capability, nor do they serve well as training targets. How do you measure a model that exceeds human performance in most areas? Yao Shunyu has pointed out that the second half has arrived; new benchmarks are needed. Implications for entrepreneurs: watch the progress of frontier research; breakthroughs may unlock new application opportunities. 
**01:21:00 Remember (Memory):** Memory is a key differentiator for AI applications; current memory capabilities have already lifted ChatGPT's retention significantly. Today's memory is still basically retrieval-based and does not achieve true understanding; this is also fiercely contested research ground, and doing it well will bring further gains. Proactive Agent: memory and context are needed to unlock the Proactive Agent opportunity, and Proactive Agents matter greatly, because humans' active use of AI carries limited intent; AI proactively serving people can create 10x scenario opportunities. **01:24:06 Multimodality:** Visual Reasoning may see major breakthroughs; humans are essentially pixel machines, understanding the world through visual input. Watch for score improvements on ZeroBench, a Visual Reasoning benchmark; leading models still mostly score below 10 points. Nano Banana means image generation has entered a usable era, the way Sonnet 3.5 was for coding. So what will be the Cursor of image-gen? GPT-3.5 unlocked ChatGPT, Sonnet 3.5 unlocked Cursor, Sonnet 3.7 unlocked Manus. What applications will Nano Banana/Veo unlock? Using image-gen/video-gen inside ChatGPT is clearly not a comfortable experience. Voice is a very important opportunity: better, more natural interaction that understands the user's context. Plaud, Granola, Wispr Flow/Typeless, Suno? **01:30:29 AI Bubble** From the secondary-market perspective, a major correction may come next year, possibly in the second half. The book "Boom: Bubbles and the End of Stagnation" describes two kinds of bubbles: good ones and bad ones. If a correction is expected, how will the investment strategy change next year? How will the secondary market transmit to the primary market? How to view Zhu Xiaohu's statements: "No bubble for at least three years," "Their arguments are pure nonsense"? "Personally, I am completely out of the market right now." 
The valuation gap between China and the US is expected to narrow. **01:47:38 Changes and Advice for Entrepreneurs** Based on the "Year of R" theory, what advice for entrepreneurs? How to judge founders in the AI era, and what is the biggest difference from the internet era? Entrepreneurship is like F1 racing. Any projects missed in the past two years? Which directions have seen incremental growth because of AI? What are good interaction forms besides chatbots? This year I personally talked with 150 projects and invested in only 2. **02:18:31 Also Talking About Life** Reflections on personal life: this year's reading, thinking, and living. Reflections on VC: young investors need to differentiate themselves. Reflections for ordinary people: learn to live in a world of abundant intelligence. **02:29:50 Final Quickfire Q&A** **Last question: You proposed the Year of R, and you've cleared out your secondary-market stocks. So will you short them?** **02:36:10 At the end of this episode, I've included a casual chat with Yusen from before the recording. We commented on some frequently discussed AI companies. If you find it interesting, keep listening.** **02:36:30** OpenAI 02:46:38 Google (I don't think Gemini can stop ChatGPT's growth, nor do I think Google is out of danger.) 03:06:36 Anthropic 03:11:05 Manus 03:19:47 Thinking Machines Lab, Safe Superintelligence Inc. Year-end Review ["Beyond 2025"]: 《122. Zhu Xiaohu's Third Installment of Realist Stories: The AI Feast and Bubble》 [More Information] Disclaimer: This content does not constitute investment advice.
Original title: 124. 年终对话【站在2025年之外】和戴雨森聊2026年预期、The Year of R、回调、我们如何下注
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>不知不觉,我们来到了2025年的最后一个月,在北京的初雪之中,我们希望和大家一起做一个回顾与展望系列:【站在2025年之外】。</p><p>今天的嘉宾是真格基金管理合伙人戴雨森。</p><p>在122集节目中,<strong>朱啸虎声称,三年之内不会有泡沫,泡沫论调纯属无稽之谈,创业者2026年当全速前进。</strong></p><p><strong>雨森今天带来全新的看法。在他看来,2026年的关键词是“The Year of R”——回报与研究会再次变得重要。某种意义上,2026年将是一个现实与回调之年。</strong></p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/Fjs5F3qS2Xi9Vb47C4K5FACQOs0K.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><blockquote><p><strong>02:00 复盘2025年</strong></p></blockquote><p>02:00 从模型侧看进展:</p><p>o1为代表的Thinking Time Scaling带来模型能力大幅提升</p><p>OpenAI、Anthropic、Google三家的旗舰模型追赶很紧,又各有特点,预期和叙事轮动</p><p>中国模型公司一年下来dominate开源生态</p><p>28:13 从应用侧看进展:</p><p>模型能力带来应用大爆发</p><p>应用是有护城河的,开始看到复杂应用在context、environment等层面产生壁垒</p><p>模型公司不能没有产品,大家都下场做最重要的第一方应用</p><p>中国AI应用出海表现不错</p><p>52:31 2025年真格出手了多少项目?20个左右</p><p>对比中美AI公司估值,中国公司对于全球来说有很高期权价值:</p><p>Thinking Machines天使轮估值在没有产品的情况下已是中国AI公司估值总和</p><p>模型公司:Mistral 14b,Kimi 4b,Mistral自己都不怎么做Pre-train了,benchmark也就是和Kimi对标</p><p>应用公司:在美国Manus这样一家几个月做到100m ARR,几十个点gross margin,MoM20%增长的公司应该是3-5bn</p><blockquote><p><strong>01:03:15 预测2026年:The Year of R</strong></p></blockquote><p><strong>The Year of R:Return、Research、Remember、多模态Reasoning</strong></p><p><strong>01:03:15 Return:</strong></p><p>为什么Return很重要?</p><p>ROI,过去3年交易的是investment,因为大家被潜在的大return吸引,但现在随着I越来越大,大家对R的落地越来越关注,因为有R才能推动未来的I</p><p>为什么我们认为2026年大家会加大对return的关注?</p><p>模型:模型能力进步是这一波AI革命最本质的驱动力,但模型的能力进步正在放缓;美国头部labs的投入(Capex,人工等)大了很多,但无法阻止中国模型低成本跟进,Scaling Law不能简单理解成为投入大力出奇迹</p><p>应用:AI应用的叙事从无所不能威胁人类的AGI收敛到现在的三种主要商业模式,是从梦想回归现实的过程</p><p>订阅制是OpenAI现在的核心商业模式:超过5亿DAU后,全球知识工作者低垂的果实已摘得差不多了,面临Gemini等的激烈竞争,针对普通用户再提价会比较难</p><p>被寄予厚望的广告 + 电商:首先其中大部分是分Meta、Google、字节的存量蛋糕,对于Chatbot这样新形态的应用,探索广告和电商变现的速度不会很快广告 + 电商:首先大量是存量分蛋糕,然后对于新形态的应用,速度没那么快</p><p>AI 
Coding/图片视频生成等“基于用量付费”的生产力产品:Token用量会持续增长,但Token价格也在持续下降,用户只会为SOTA的智能按用量付费;原来值钱的任务会很快变得不值钱,所以AI替代了很多程序员,并不意味着AI能长期赚到这些程序员的工资</p><p>AI+行业的企业服务:这部分首先还在早期市场,规模有限,尝鲜的企业多,长期留存未必好,一个例子是微软Copilot的发展持续低于预期,大公司有数据安全、权限、隐私、工作流再造等一系列阻碍,使用新技术的速度比小公司和个人要慢不少</p><p>结论: 需要实现Satya说的GDP加速增长,把蛋糕做大才是真正的AGI,比如说AI创造新的药物,发现新的知识,真正解放人类注意力等</p><p>投入:现在美国基础设施建设慢,算力贬值快,人员工资高,巨额投入需要尽快看到回报</p><p>2025年底二级市场的预期也和2024年底完全不一样:去年底是市场预期不高,但我们看到ChatGPT增速很快,Coding、Agentic模型提升的确定性带来应用机会;现在是投入很大预期很高,但短期模型端看不到革命性的新能力,新的范式变化还在萌芽期</p><p>对创业者的启示?</p><p>负毛利烧钱一味追求增长的逻辑正在过去,需要有增长和毛利率并重的高质量增长。尤其是在美国非常宽松的融资环境可能会放缓,中美创投市场的价格鸿沟将会缩短</p><p><strong>01:16:13 Research:</strong></p><p><strong>new paradigm:</strong>AI历史上都是阶跃提高,需要有新的paradigm从新带来AI 能力的大增长,Ilya:scaling和research是交替的,现在又到了research的时候</p><p>目前看Online Learning、世界模型等都是重要的研究方向</p><p><strong>neo labs:</strong>Thinking machines, SSI, Reflection, 到近期的Humans&,Periodic,Isara等)</p><p>因为做工程和产品和做研究是很不一样的,需要有宽松的环境,自由探索的文化,不设时间和KPI限制,大家希望neo labs能够探索和现在头部模型公司有差异化的新路径</p><p><strong>new benchmark:</strong>现在的benchmark已经不能很好体现AI能力的区别,也不利于作为模型训练的目标,如何衡量一个在大多数领域超过人类表现的模型?姚顺雨指出的下半场已到,需要新的benchmark</p><p>对创业者的启示:要关注前沿研究的进展,研究的突破可能会解锁新的应用机会</p><p><strong>01:21:00 Remember(Memory):</strong></p><p>Memory是AI应用关键的差异化,现在的Memory能力已经对ChatGPT留存产生了很大的提高</p><p>现在的Memory基本上还是基于retrieval的,没有做到真正的理解,这部分也是研究的兵家必争之地,如果做好会带来进一步的提高</p><p>Proactive Agent:有memory和context才能解锁Proactive Agent的机会,而Proactive Agent非常重要,因为人主动去用AI意图有限,AI主动为人服务才能有10x的场景机会</p><p><strong>01:24:06 多模态:</strong></p><p>Visual Reasoning可能会有大的突破,人本质上是Pixel Machine,通过视觉输入理解世界,可以关注Zerobench这个Visual Reasoning Benchmark的表现提升,现在头部模型基本上还是不到10分</p><p>Nano Nanana意味着图片生成进入到Sonnet 3.5这样的可用时代,那么Cursor of Image-gen会是什么?</p><p>GPT-3.5解锁了ChatGPT,Sonnet 3.5解锁了Cursor,Sonnet 3.7解锁了Manus,Nano Nanana/Veo会解锁什么应用的机会?在ChatGPT里面用Imagegen/Videogen显然不是很舒服的体验</p><p>语音是很重要的机会,更好更自然的交互,理解用户的Context,Plaud,Granola,Wispr flow/Typeless,Suno?</p><blockquote><p><strong>01:30:29 AI Bubble</strong></p></blockquote><p>从二级市场来讲,明年有可能出现大的回调,时间点可能是下半年</p><p>《Boom: Bubbles 
and the End of Stagnation》书中提到了两种泡沫:好的泡沫和坏的泡沫</p><p>如果预期是回调,明年的投资策略变化是什么?</p><p>二级会如何传导到一级?</p><p>怎么看朱啸虎说:“至少三年内看不到泡沫”、“他们的论点是无稽之谈”?</p><p>“我个人现在是全部空仓的”</p><p>中美的估值差距预期会缩短</p><blockquote><p><strong>01:47:38 创业端变化和建议</strong></p></blockquote><p>基于Year of R的理论,对创业者的建议?</p><p>AI时代怎么判断创始人?和互联网时代最大不同是什么?</p><p>创业像F1赛车</p><p>这两年miss什么项目没?</p><p>有哪些方向是因为AI出现带来增量的?</p><p>Chatbot之外不错的交互是什么?</p><p>今年个人聊了150个项目,只投了2个</p><blockquote><p><strong>02:18:31 也谈谈人生</strong></p></blockquote><p>对个人的思考:今年的读书、思考与人生</p><p>对VC的思考:年轻的投资人要差异化</p><p>对普通人的思考:学会在一个智能充沛的世界里生活</p><blockquote><p><strong>02:29:50 最后的快问快答</strong></p></blockquote><blockquote><p><strong>最后一个问题:你提出Year of R,你也清空了二级市场股票,那么你会做空吗?</strong></p></blockquote><blockquote><p><strong>02:36:10 在这集节目的结束,我又放了一段和雨森在录节目之前的一场闲谈,比较随意。我们点评了一下那些时常会被议论起的AI公司。如果你觉得有意思,也可以继续听下去</strong></p></blockquote><p><strong>02:36:30 </strong>OpenAI</p><p>02:46:38 Google(我并不觉得Genimi能阻止ChatGPT的增长,不觉得Google已经脱离危险)</p><p>03:06:36 Anthropic</p><p>03:11:05 Manus</p><p>03:19:47 Thinking Machines Lab、Safe Superintelligence Inc.</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p>年终对话【站在2025年之外】:</p><p><a href="https://www.xiaoyuzhoufm.com/episodes/693834013fec3166cf262bd0" rel="noopener noreferrer nofollow" target="_blank">《122. 朱啸虎现实主义故事的第三次连载:人工智能的盛筵与泡泡》</a></p><p>【更多信息】</p><p>免责声明:本内容不作为投资建议。</p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
123. 3-hour interview with ONE2X founder Wang Guan: Generative systems, no middlemen earning a markup, and the power distribution of content platforms From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-12-12 08:03
Today's guest is an entrepreneur: Wang Guan, co-founder and CEO of ONE2X. Their current product is the AI video generator Medeo. Wang Guan is a product-manager-turned-entrepreneur. I've known him for a long time, since just after he left Kimi to start his own business; before that, he was Kimi's model product lead. (Oh, by the way, the previous Kimi product person who left and came on our show was Ming Chaoping.) Beyond how application companies in the AI era build products and organizations, as a content creator myself I was also very curious to discuss with him many topics concerning **new-era content platforms, generative systems, and the distribution of power among AI, creators, and platforms.** In 2025, we look forward to making progress together with AI! 02:00 Self-introduction, product manager experience, and the start of entrepreneurship 28:39 First hearing "compression is intelligence," and being greatly impressed 32:25 The first person to leave Moonshot AI to start a business 37:11 Data is the first principle of intelligence; data determines the boundaries of intelligence 42:23 Three stages of data: public-domain data > domain-specific data > product-native data 01:05:36 Why choose the video generation direction? 01:26:15 How will AI reshape the existing internet landscape? 01:30:50 Broad AGI vs. Narrow AGI 01:41:59 The boundary between application companies and model companies will blur 02:01:44 Companies in the AI era will ultimately all be generative-system companies 02:25:49 As the center of power shifts toward consumers, how will platforms and creators evolve? 02:38:11 What is the essential difference between generative systems and recommendation systems? "No middlemen earning a markup." 02:50:34 How should AI products be made? 
The North Star metric is the degree of intelligence 03:05:45 A remote-first organization 03:20:18 Future platforms will evolve from distribution platforms to production and sales platforms Our past interviews with AI application companies: "95. A 3-Hour Interview with Manus Founder Xiao Hong: The World Is Not a Linear Extrapolation; Be an Important Variable in the Game" "103. Lovart Founder Chen Mian Reviews Two Years of Application Entrepreneurship: This Moment Is So Awesome!! Hahahahaha" "101. A 3-Hour Interview with YouWare Founder Ming Chaoping: Today's Agents Are Like Gorillas Who Just Picked Up a Firewood Stick" Other episodes mentioned in this episode: "59. Talking with Yang Zhilin About a Year of Large Model Entrepreneurship: The Incremental Ideal of Humanity, Probable Non-Consensus, and Sora" "113. A Conversation with Yang Zhilin After 1 Year: K2, Agentic LLM, Brain in a Vat, and "Standing at the Beginning of Infinity"" "115. A 3-Hour Interview with OpenAI's Yao Shunyu: 6 Years of Agent Research, Humans and Systems, The Boundaries of Engulfment, A World Both Unipolar and Diverse"
Original title: 123. 对ONE2X创始人王冠3小时访谈:生成系统、没有中间商赚差价、内容平台的权力分配
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>今天的嘉宾是一位创业者,ONE2X联合创始人兼CEO王冠,他们现在的产品是AI视频生成器Medeo。</p><p>王冠是一名产品经理型的创业者,我和他认识了很长时间,那时他刚从Kimi离职出来创业,此前他是Kimi模型产品负责人。(哦对了,上一个Kimi产品离职来我们节目的是明超平。)</p><p>除了AI时代应用型公司怎么做产品、搭组织之外,由于我也是一名内容创作者,所以我也很好奇地与他讨论了许多关于<strong>新时代的内容平台,生成系统,AI、创作者与平台权力分配的话题。</strong></p><p>2025年,期待我们和AI共同进步!</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FhbwWSJZ6BsVZBmGZDATweipe106.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><p>02:00 自我介绍、产品经理的经历和创业的开端</p><p>28:39 第一次听说“压缩即智能”,大为震撼</p><p>32:25 从月之暗面第一个离职创业的人</p><p>37:11 数据是智能的第一性原理,数据决定的智能的边界</p><p>42:23 数据三个阶段:公域数据>领域数据>产品内生数据</p><p>01:05:36 为什么选择视频生成方向?</p><p>01:26:15 AI如何重塑现有互联网格局?</p><p>01:30:50 广义AGI vs 狭义AGI</p><p>01:41:59 应用公司与模型公司的边界会变得模糊</p><p>02:01:44 AI时代的公司最终都是生成系统公司</p><p>02:25:49 权力重心向消费者端渗透,平台和创作者会如何演变?</p><p>02:38:11 生成系统和推荐系统的本质区别是?“没有中间商赚差价”</p><p>02:50:34 应该怎么做AI产品?北极星指标是智慧程度</p><p>03:05:45 一个远程办公的组织</p><p>03:20:18 未来的平台会从分销平台到产销平台</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p>我们对AI应用型公司的过往访谈:</p><p><a href="https://www.xiaoyuzhoufm.com/episodes/67c3d80fb0167b8db9e3ec0f">《95. 对Manus创始人肖弘的3小时访谈:世界不是线性外推,做博弈中的重要变量》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/68455e0a6dbe9284e75c6fbf">《103. Lovart创始人陈冕复盘应用创业这两年:这一刻就是好爽啊!!哈哈哈哈哈》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/68372c9631215eb5063bcdb1">《101. 对YouWare创始人明超平3小时访谈:今天Agent像大猩猩刚拿起一根烧火棍》</a></p><p>本集中提到的其他节目:</p><p><a href="https://www.xiaoyuzhoufm.com/episodes/65e16b5b6144a933b1d968b5">《59. 和杨植麟聊大模型创业这一年:人类理想的增量、有概率的非共识和Sora》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/68ae86d18ce45d46d49c4d50">《113. 和杨植麟时隔1年的对话:K2、Agentic LLM、缸中之脑和“站在无限的开端”》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/68c29ca12c82c9dccadba127">《115. 
对OpenAI姚顺雨3小时访谈:6年Agent研究、人与系统、吞噬的边界、既单极又多元的世界》</a></p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
122. The third installment of Zhu Xiaohu's Realism Story: The Feast and Bubble of Artificial Intelligence From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-12-09 14:42
In March 2024 and February 2025, I twice updated Zhu Xiaohu's Chinese realism AIGC story. Now almost another year has passed. Standing at the tail end of 2025, **is there a bubble in the AI industry? Will the bubble burst? Are investors optimistic about 2026? Is now still a good time to invest in Nvidia and OpenAI? This is the third installment of Zhu Xiaohu's Realism Story.** In the last month of 2025, I still want to say to everyone: we look forward to progressing together with AI! 01:40 An increasingly realistic OpenAI: Just look at Sam Altman; he barely mentions AGI this year, right? 05:46 The battle for AI's super entry point: It is bound to be a battle for super entry points, a battle for daily actives and usage time. 08:34 No bubble within three years: I think all the arguments they make are nonsense. 13:24 Do you hold Nvidia or OpenAI? 16:16 Everyone underestimates DeepSeek: Without DeepSeek, humanity's AI might be controlled by a few private companies. 19:06 Deviate 15 degrees from the consensus, and the cost-effectiveness immediately stands out, right? 24:41 Three streets away from big tech, three streets away from big tech, right? 30:21 As everyone knows, whether I invest or not takes ten minutes. 38:30 Today's VC consensus is too concentrated: Every project is a club deal and each equity stake is very small, so how do you make money? GPs can't make big money, and LPs are also very unhappy. 40:50 For a mobile game like "Honor of Kings," a session of half an hour or twenty minutes is enough. "62. The Zhu Xiaohu you wanted is here" "90. Zhu Xiaohu is back: The 1-Year Installment of Chinese Realism AIGC Stories" [More Information] This episode is jointly presented by Language is World Studio and Weibo Finance. Disclaimer: This content does not constitute investment advice.
Original title: 122. 朱啸虎现实主义故事的第三次连载:人工智能的盛筵与泡泡
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>2024年3月、2025年2月,我曾两次更新朱啸虎的中国现实主义AIGC故事,现在又过去了快1年时间。</p><p>站在2025年的尾巴上,<strong>AI产业有泡沫吗?泡沫会破吗?投资人对2026年的预期乐观吗?现在还是投资英伟达、OpenAI的好时候吗?——这里是朱啸虎现实主义故事的第三次连载。</strong></p><p>2025年的最后一个月,还想和大家说那句:期待我们和AI共同进步!</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FgsAq-BQe8usSQlhrOhEZQH0LOv2.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><p>01:40 越来越现实的OpenAI:你看Sam Altman就知道了,今年几乎不太提AGI了,对吧?</p><p>05:46 AI的超级入口之争:必然的就是超级入口之争,而且是日活之争、时长之争</p><p>08:34 三年内看不到泡沫:他们讲的这些论点,我觉得都是无稽之谈</p><p>13:24 你持有英伟达、OpenAI吗?</p><p>16:16 大家低估了DeepSeek:如果没有DeepSeek,可能人类的AI是被几个私有公司控制的</p><p>19:06 和共识错开15度,那性价比一下子拉出来了,是吧?</p><p>24:41 离开大厂三条马路,离开大厂三条马路,对吧?</p><p>30:21 大家知道,我投不投都是十分钟</p><p>38:30 今天的VC共识太集中了:每个项目上都是Club Deal(俱乐部交易),每个股份比例都很小,那怎么赚钱?——GP赚不了大钱,LP也很不开心</p><p>40:50 手机游戏打个《王者荣耀》,单独抽半小时、二十分钟就够了</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p><a href="https://www.xiaoyuzhoufm.com/episodes/66090a2c1519139e4fa97f99">《62. 你们要的朱啸虎,来了》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/67a5a740d74435e4a39e38df">《90. 朱啸虎又来了:中国现实主义AIGC故事的1周年连载》</a></p><p>【更多信息】</p><p>本集由语言即世界工作室与微博财经联合呈现。</p><p>免责声明:本内容不作为投资建议。</p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
121. An interview with Tan Jie of DeepMind: Robotics, cross-embodiment, world models, Gemini Robotics 1.5, and Google From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-11-28 09:17
The guest today is **Tan Jie, Senior Research Scientist and Tech Lead on the Google DeepMind robotics team**. His research focuses on applying foundation models and deep reinforcement learning to robotics. Two narratives have long coexisted between China and the US in robotics: the market generally believes China is moving faster on hardware, while the US leads in designing robot brains. **In this episode, Tan Jie offers us a glimpse of the robotics frontier from a Silicon Valley, and especially a Google DeepMind, perspective.** Not long ago, they released their new work "Gemini Robotics 1.5 brings AI agents into the physical world," and we also discussed their latest findings. Because of the guest's work environment, there will be a certain amount of mixed Chinese and English; we ask for everyone's understanding and support. > **02:00 Robotics is doing graphics in the real world; graphics is doing robotics in simulation.** Guest's brief biography: liked playing games as a child, pursued a Ph.D. in computer graphics. The switch from graphics to robotics. My first paper at Google, "Sim-to-Real: Learning Agile Locomotion For Quadruped Robots," pioneered the application of reinforcement learning and sim-to-real to legged robots. Paradigm shifts: in the past decade, the first was reinforcement learning, the second was large language models. The impact of large language models on robotics (large language models are like the cerebrum, reinforcement learning like the cerebellum). > **13:06 Is the robotics foundation model truly an independent discipline? So far, not yet.** What stage has robotics reached today? Going from a demo to real deployment can take a decade; that is not an exaggeration. From my perspective, I have to admit that the progress of robot intelligence in recent years has mainly relied on multimodal large models. But what do multimodal models lack? 
They lack the output of robot actions. Once you truly have a generalist model, specialized models simply cannot compete with it. > **23:44 The biggest problem in robotics is data; it operates in a very complex unstructured environment where anything can happen.** The biggest problem is still data. Robotics operates in a very complex unstructured environment where anything can happen. It requires an extremely large amount of very diverse data, but such data does not currently exist. Many startups now call themselves "data factories." What does the so-called "data pyramid" include? > **27:52 Gemini Robotics 1.5: We have a method called motion transfer; it is our unique secret.** What are the most important findings of Gemini Robotics 1.5? First, we incorporated "thinking" into the VLA model. The second very important breakthrough is cross-embodiment transfer. In the Gemini Robotics 1.5 work, we split the system into fast and slow models. That should be a transitional approach, since it is currently constrained by compute and model size. When you want a unified model, it must be very large. Motion transfer? It's very secret. > **47:32 Generating a huge amount of simulated data is an important way to compensate for its shortcomings.** One point we attach great importance to is data, data, data. Teleoperation data is very hard to acquire. We will put more effort into using, for example, simulation data, human video, data from YouTube, and even model-generated data, such as data generated by Veo. Real data has no sim-to-real gap, but generalization is determined by data coverage, not by whether the data itself is real or virtual. In the near future, traditional physics-based simulation will gradually be replaced by generative-model-based simulation. I believe in scalable data. 
> **01:03:48 A world model is Vision-Language-Vision: vision and language in, generating the next frame of images.** The definition of a world model: given the previous frame and the robot's action, you can predict the next frame. From another angle, Veo is a video generation model, but Genie is more like a world model. When you can supply an input at each frame that changes the next frame, that is a world model; an already-generated, static few-second video is not. A world model is essentially Vision-Language-Vision: with vision and language as input, it generates the next frame of images. > **01:08:29 If you have a dexterous hand, haptics become very important. I previously thought haptics were unimportant only because of the hardware limits of the time.** If you have a dexterous hand, haptics become very important. The reason I previously thought haptics were unimportant is that we were limited by the hardware of the time. We are still in the gripper era. For all tasks a gripper can accomplish, I still believe vision can solve 95% of the problems. In the future, humanoid robots will not be the only form, but they will certainly be a mainstream form. If your goal is to solve AGI in the physical world, then I will focus intently on what the final form looks like; everything else may be a distraction. > **01:17:35 A person with a sense of mission will not tolerate saying "I'm on the wrong ship."** Has Google's AI or robotics research culture changed in recent years? Whether in promotion, performance review, incentives, or various structures, Google wants to create an environment where more people can work together on bigger problems. Gemini Robotics, for example, is more top-down. I found that peers in China aren't necessarily working harder than me; I may work 70 to 80 hours a week. Seriously, this era really cannot wait, or others will get there first. 
A lot of AI is mathematics, and Chinese people are generally better at mathematics. 《106. Talking with Wang He about the Academic Edge History of Embodied Intelligence and the Man-made Chaos after Capital Bombardment》 《109. Robots Encountering a Data Famine? Talking with Xie Chen: Simulation and Synthetic Data, Meta's Sky-High Acquisition, and Alexandr Wang》 [More Information] The text version of this episode has been published. Please search for our studio's official public account: 语言即世界language is world
Original title: 121. 对DeepMind谭捷的访谈:机器人、跨本体、世界模型、Gemini Robotics 1.5和Google
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>今天的嘉宾是<strong>Google DeepMind机器人团队的高级研究科学家兼技术负责人谭捷</strong>,他的研究方向是将基础模型和深度强化学习方法应用于机器人领域。</p><p>中美在机器人领域一直存在两种叙事:市场普遍认为,中国在硬件上发展更快,美国在机器人大脑设计上更领先。</p><p><strong>本期节目中,谭捷将带我们一窥硅谷视角,尤其是Google DeepMind视角下的机器人前沿叙事。</strong></p><p>前不久,他们刚发布了新工作 “Gemini Robotics 1.5 brings AI agents into the physical world”(Gemini Robotics 1.5将AI Agents带入物理世界),我们也聊了聊他们的最新发现。</p><p>由于嘉宾工作环境的原因,会出现一定程度的中英夹杂,还大家多多包容和支持。</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/Fou2bKSBSkt--i4_WxqqBjg8IpW0.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><blockquote><strong>02:00 机器人是在真实世界里做图形学,图形学是在simulation里做机器人</strong></blockquote><p>嘉宾小传:小时候喜欢打游戏,读博士读的计算机图形学</p><p>从图形学转型机器人的变轨</p><p>我在Google的第一篇论文《Sim-to-Real: Learning Agile Locomotion For Quadruped Robots》(从仿真到现实:学习四足机器人敏捷运动),开创了强化学习和seem to real在足式机器人上的应用</p><p>Paradigm Shift,过去十年第一个是强化学习,第二个是大语言模型</p><p>大语言模型对机器人的影响(大语言模型类似大脑,强化学习类似小脑)</p><blockquote><strong>13:06 机器人基座大模型到底是不是一个非常独立的学科?So far, not yet</strong></blockquote><p>今天的机器人发展到什么阶段了?</p><p>从demo到真正落地,隔十年并不是一个非常夸张的事</p><p>从我的角度来说,我不得不承认,最近几年的机器人智能发展主要还是依赖于多模态大模型</p><p>但多模态模型缺什么呢?缺少robot action的输出</p><p>当你真正有一个generalist model(通用模型)的时候,specialized model(专有模型)就完全不能与之竞争</p><blockquote><strong>23:44 Robotics最大问题是数据,它在一个非常复杂的unstructured environment里,可以发生任何事情</strong></blockquote><p>最大的问题还是数据问题</p><p>但是robotics是在一个非常复杂的unstructured environment(非结构化环境)里,可以发生任何事情</p><p>它需要极大量的、非常diverse(多元)的数据,但这些数据现在是不存在的</p><p>现在有很多startup叫data factory(数据工厂)</p><p>所谓“数据金字塔”包括哪些?</p><blockquote><strong>27:52 Gemini Robotics 1.5:我们有一个方法叫motion transfer,这是独门秘诀</strong></blockquote><p>Gemini Robotics 1.5最重要的发现是什么?</p><p>第一个是我们把“thinking”加入了VLA模型</p><p>第二个非常重要的突破是cross-embodiment transfer(跨具身迁移)</p><p>Gemini Robotics 
1.5的工作中,我们做了一个快慢模型的划分</p><p>它应该是个过渡的方式,因为现在受制于算力的限制、模型大小的限制</p><p>当你要一个unify model(统一模型)的时候,它必须非常大</p><p>Motion Transfer?It’s very secret</p><blockquote><strong>47:32 生成极大量仿真数据,是弥补它缺点的一个重要手段</strong></blockquote><p>我们比较重视的一点还是数据、数据、数据</p><p>遥操作是非常难以获取的数据</p><p>我们会花更多的精力,比如利用simulation数据,利用human video(人类视频),利用YouTube上的一些数据,甚至利用模型生成的数据,比如VEO生成的一些数据</p><p>真实数据没有sim-to-real gap(仿真到现实差距),但是泛化性是由数据的coverage(覆盖)导致的,并不是因为它本身是真实数据还是虚拟数据</p><p>在不远的将来,传统物理模拟仿真会慢慢地被生成式模型的仿真所取代</p><p>我信仰的是scalable data</p><blockquote><strong>01:03:48 世界模型就是Vision-Language-Vision,vision和language in,生成下一帧的图像</strong></blockquote><p>世界模型的定义是:如果给上前一帧,再给上机器人的动作,你可以预测下一帧</p><p>从另外一个角度,VEO它是一个视频生成模型,但是Genie它更像一个世界模型</p><p>当你在每一帧的时候,可以有一个输入来改变你的下一帧,那个感觉就是世界模型;但是如果它是一个已经生成好的、几秒钟的静态视频,那就不是</p><p>世界模型其实就是Vision-Language-Vision,vision和language in,它可以生成下一帧的图像</p><blockquote><strong>01:08:29 如果你有灵巧手,触觉就非常重要,之所以我前面觉得触觉不重要,是受限于当时的硬件</strong></blockquote><p>如果你有灵巧手,触觉就非常重要</p><p>之所以我前面觉得触觉不重要,是因为它其实受限于当时的硬件</p><p>现在还在夹爪时代</p><p>在所有夹爪能完成的任务里,我还是觉得视觉可能可以解决95%的问题</p><p>在未来,人形机器人不会成为唯一的形态,但一定是个主流的形态</p><p>如果你的目标是solve AGI in the physical world(在物理世界实现AGI),那么我会非常聚焦于最终的形态是什么样子,其他的东西可能都是distraction(干扰)</p><blockquote><strong>01:17:35 一个有使命感的人,他不会容忍说“I’m on a wrong ship”</strong></blockquote><p>这几年Google AI或者robotics的研究文化上有没有发生过变化?</p><p>不管是从promotion、performance review、incentive,还是各种各样的structure上,Google想创造一个环境,使得更多的人可以一起解决更大的事情</p><p>像Gemini Robotics,它更多是自上而下</p><p>我发觉好像国内不一定比我卷,我一周可能工作70到80个小时</p><p>真的,这个时代真的是等不起,不然别人都做出来了</p><p>AI有很多是数学,华人数学比较好</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p><a href="https://www.xiaoyuzhoufm.com/episodes/6857f2174abe6e29cb65d76e">《106. 和王鹤聊,具身智能的学术边缘史和资本轰炸后的人为乱象》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/68767e4c93fd2d72b8607c80">《109. 
机器人遭遇数据荒?与谢晨聊:仿真与合成数据、Meta天价收购和Alexandr Wang》</a></p><p>【更多信息】</p><p>本集的文字版本已发布,请搜索我们工作室的官方公众号:</p><p>语言即世界language is world</p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
120. Xpeng's Newly Appointed Liu Xianming's First Interview: Language Is Poison, Dismantling L, Simplicity Is Beauty, a Change of Leadership, Xpeng's AI Transformation. From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-11-18 09:06
Today's guest is Liu Xianming, head of Xpeng Motors' autonomous driving center. On October 9, 2025, Xpeng Motors suddenly announced that Li Liyun, the former head of the autonomous driving center, would step down, succeeded by Liu Xianming, head of the world foundation model team. This makes Liu Xianming the fourth head of Xpeng's autonomous driving, after Gu Junli, Wu Xinzhou (now head of Nvidia's autonomous driving China team), and Li Liyun. The outside world is very curious about him. **This is Liu Xianming's first exclusive interview since taking office.** The interview took place on October 30, 2025. In this episode, **we talked about the key technical decisions he made after taking office, such as dismantling the large model's Language, and the AI strategic transformation of a car company.** > **02:16 Character Sketch** Previously did machine learning and computer vision research at Meta and Cruise It just so happened that Cruise was the runner-up at the time, and the story of joining the second place and fighting back is always exciting How he came to join Xpeng Motors: a one-hour meeting with He Xiaopeng in the US office in January 2024 The technical stages of autonomous driving he has lived through > **19:00 The Large Model Dismantles Language** Our approach is simple and direct: just dismantle the VLA's Language The model is the machine and data is the fuel; once Language is mixed in, efficiency becomes extremely low So we simply dismantled all the Language: input a V-L joint corpus, directly output Action The process of "dismantling L"; "simple is beautiful" The key data questions > **33:53 Xpeng Motors' Transformation to a Physical AI Strategy** Why does a car company's autonomous driving strategy need to transform into an AI strategy? The transformation arguably began with Xpeng Motors' 10th anniversary last year. 
Autonomous driving companies care about KPIs and takeover rates, while AI companies focus on underlying technical indicators, even risky long-term ones. Liu Xianming's short-term and long-term KPIs What does AI mean to Xpeng Motors? "It's a multiplication factor" Besides dismantling Language this year, lidar, planning-and-control rules, and end-to-end have also been dismantled before. Why has the development of artificial intelligence kept going through a process of dismantling? World model Plans for L4 next year > **54:30 Behind the Change of Leadership** "Stubborn" things I've done in the past year I may look easygoing, but I have also slammed the table and lost my temper. I met great resistance while "dismantling L" because it goes against the common sense in papers. The counter-consensus of DeepSeek-OCR The cutting-edge AI directions he is watching now Responding to the view of Yu Kai, founder of Horizon Robotics ("Autonomous driving should be handed over to suppliers") Why is there still no generational gap in domestic autonomous driving? AI is a key battleground for car companies in the next stage; those who fail at it will be eliminated. How much attention He Xiaopeng gives AI, how he engages, and his three recent AI topics The DNA question for manufacturing companies versus AI companies The challenges ahead for me The No. 1 position of Xpeng's intelligent driving, each person's historical mission 《70. Chatting with He Xiaopeng: FSD, "Swimming in a Sea of Blood", Heroes and Cowards in Troubled Times》
Original title: 120. 小鹏新上任的刘先明首次访谈:Language是毒药、拆掉L、简单即美、换帅、小鹏的AI转型
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>今天的嘉宾是小鹏汽车自动驾驶中心负责人刘先明。</p><p>就在2025年10月9日,小鹏汽车突然宣布,原自动驾驶中心负责人李力耘将卸任,由世界基座模型负责人刘先明接任。</p><p>这意味着,刘先明成为小鹏在自动驾驶上,既谷俊丽、吴新宙(现英伟达自动驾驶中国团队负责人)、李力耘之后的第四任负责人。外界对他有诸多的好奇。</p><p><strong>这是刘先明上任后首次接受专访。</strong>我们访谈的时间是2025年10月30日。这集节目,<strong>我们聊了聊他上任后拆掉大模型Language等关键技术决策,以及一家车企的AI战略转型。</strong></p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FnviL6xH_VryZ3pil5QmY8VreFNF.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><blockquote><strong>02:16 人物小记</strong></blockquote><p>曾在Meta、Cruise,从事机器学习与计算机视觉研究</p><p>恰好Cruise当时是第二名,加入第二名再逆袭的故事永远是令人兴奋的</p><p>加入小鹏汽车始末:2024年1月在美国办公室与何小鹏见面1小时</p><p>所亲历过的自动驾驶的技术stage</p><blockquote><strong>19:00 大模型拆Language</strong></blockquote><p>我们的做法简单直接,把VLA的Language拆掉就完了</p><p>模型是机器,燃料是数据,一旦掺入Language会让效率变得极低</p><p>我们干脆把Language全都拆掉好了:输入V-L联合语料,直接输出Action</p><p>“拆L”的过程、“简单就是美”</p><p>关键的数据问题</p><blockquote><strong>33:53 小鹏汽车向物理AI战略的转型</strong></blockquote><p>为什么一家汽车公司的自动驾驶战略需要向AI战略转型?</p><p>转型的开端可能是去年小鹏汽车10周年</p><p>自动驾驶企业关心的是KPI、接管率,AI企业关注底层的技术指标,甚至risky的长期指标</p><p>刘先明的短期和长期KPI</p><p>AI对于小鹏汽车意味着什么?“是乘法因子”</p><p>除了今年拆Language,之前还拆了激光雷达、规控规则、端到端</p><p>人工智能发展为什么一直在经历着拆拆拆的过程?</p><p>世界模型</p><p>明年对L4的规划</p><blockquote><strong>54:30 换帅的背后</strong></blockquote><p>过去1年做过“头铁”的事情</p><p>看起来我性格很好,我也拍过桌子、发过火</p><p>“拆L”过程中遇到很大阻力,因为这很反paper里的常识</p><p>DeepSeek-OCR的反共识</p><p>现在关注的AI前沿方向</p><p>回应地平线创始人余凯的观点(“自动驾驶应该交给供应商”)</p><p>为什么国内自动驾驶还没有代际差?</p><p>AI是车企下一阶段的重要赛点,做不好会被淘汰</p><p>何小鹏对于AI的关注时间、方式和最近的3次话题</p><p>制造企业和AI企业的基因问题</p><p>接下来,对于我的挑战</p><p>小鹏智驾一号位,每个人的历史使命</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p><a href="https://www.xiaoyuzhoufm.com/episodes/6695032837236c546e4c2e0f">《70. 
和何小鹏聊,FSD、“在血海游泳”、乱世中的英雄与狗熊》</a></p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
119. Kimi Linear, Minimax M2: Excavating the History of Algorithm Variants with Yang Songlin, and Previewing Future Architecture Improvements. From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-11-03 23:00
This episode tackles a crucial topic for this moment: **AI algorithm and architecture innovation.** Our guest, returning to the show, is MIT PhD student Yang Songlin, who researches linear attention mechanisms. We start from the newly released models Kimi Linear, Minimax M2, and Qwen3-Next. Songlin took part in parts of the Kimi Linear and Qwen3-Next work and **is a co-author of the Kimi Linear paper.** Why has algorithm innovation become especially important in 2025? Because data, computing power, and algorithms are the three horses driving AI. With data hitting a wall, model companies have had to go back to "sculpting model architectures" in the hope that the magic of the Scaling Law continues. And because China's computing power is limited relative to the US, **this has pushed Chinese AI algorithm innovation to the world's forefront.** In this episode you'll hear that **the biggest architectural breakthrough of recent years is DeepSeek's MoE (Mixture of Experts), which made MoE a global consensus; the next major breakthrough may be Attention.** Chinese companies have placed different technical bets on Attention: * Among models released so far, DeepSeek is exploring Sparse Attention. * Kimi is exploring Linear Attention. * Minimax explored Linear Attention in its early M1 version but reverted to Full Attention in the just-released M2. Songlin will walk through her work on **《Kimi Linear: An Expressive, Efficient Attention Architecture》** and analyze these companies' different choices on Attention; **she will also lead us through the archaeology of AI algorithm variants and preview future algorithm and architecture improvements.** > *This episode is technical and may be challenging; listen according to your needs. The guest's working environment mixes Chinese and English.* **04:00** Personal background, research through-line, and the path of exploring linear attention. **06:27** Songlin built an open-source library: flash-linear-attention (FLA). **07:04** How to understand the "Linear" in Linear Attention in plain terms? **11:19** Her newly released work, 《Kimi Linear: An Expressive, Efficient Attention Architecture》. 
(Invited by Zhang, Yu, another FLA author) **12:20** Why did Kimi need to redesign the attention mechanism at the start of the year? The background and goals of the design. Under Linear Attention, computation and memory costs during inference drop significantly, whereas with Full Attention, long-text decoding is very expensive. **14:39** **Key walkthrough of the 《Kimi Linear》 paper: the KDA module** (Kimi Delta Attention). **18:56** Kimi has an internal Scaling Ladder: perform well at one scale and you get to scale up to the next, like clearing game levels. **20:20 Kimi Linear Attention vs DeepSeek Sparse Attention:** Kimi takes the linear attention route, DeepSeek the sparse attention route, both aiming to solve the efficiency of long-text decoding. **23:01 Minimax's architectural change from M1 to M2, reverting from Linear Attention to Full Attention:** why? **27:00** Silicon Valley's attention-mechanism choices can't be discussed freely, but OpenAI's published solutions can be touched on. **28:05** The thread of progress since Linear Attention was invented in 2020. Every wave of interest in Linear Attention comes when people hit the Context Wall. The recent comeback of long-text decoding is prompting a re-examination of this technology. **38:16** Pure Linear Attention doesn't work; hybrid attention mechanisms keep global attention layers so the floor is guaranteed. **40:30 Kimi Linear inserts one full attention layer for every three KDA layers; the three-to-one ratio is becoming a consensus.** Minimax previously used seven-to-one, but everyone is gradually converging on three-to-one, a consensus within the non-consensus of hybrid attention mechanisms. **42:32** The trade-off between expressivity and efficiency. **Minimax has also noted that hybrid linear attention / hybrid sliding-window attention has defects in "multi-hop reasoning."** For "multi-hop reasoning," the gap may narrow if we develop hardware-efficient RNNs (recurrent neural networks) with better expressivity. **46:28** The chunkwise algorithm for parallelization. 
**47:55** How to design Attention? Two mainstream routes and some non-mainstream ones. **49:36** **A future ideal solution combining Linear Attention and Sparse Attention.** Linear Attention and Sparse Attention aren't really competitors; Linear Attention's competitor is more likely Sliding-Window Attention. Industry exploration of combining Linear Attention and Sparse Attention seems not to have started yet. **My ideal solution: replace the global attention (Full Attention) in hybrid attention with Sparse Attention.** If Sparse Attention selects the right tokens, it can completely replace Full Attention; the problem today is that it can't select accurately. **55:36** A fair comparison: Linear Attention vs Sliding-Window Attention. **57:05** The Transformer → MoE → Linear/Sparse Attention algorithm evolution, driven by the goal of achieving a lower loss with the same FLOPs (floating-point operations). MoE (Mixture of Experts) is a more efficient replacement for the FFN (feedforward network). **58:26 The biggest architectural breakthrough in recent years is MoE, and the next may be Attention; the Transformer has two modules, FFN and Attention; the FFN has been sculpted into MoE, and now Attention can be sculpted too.** **01:01:28** Data, algorithms, and computing power are the three horses driving AI; when data hits a wall, algorithm innovation becomes more important. **01:02:48** The future of architecture: 1. Can we eliminate global attention? It's the main bottleneck preventing the context window from scaling up. 2. Continual Learning, letting AI learn on its own. **01:04:30** How to keep scaling up Linear Attention Transformers? **01:07:43** Chinese AI algorithm innovation is stronger than overseas because there are fewer cards (compute). US companies invest more in optimizers; China is gradually catching up. **01:10:56** Other training details: NoPE vs. RoPE. **01:12:09** DeepSeek-OCR. **01:12:55** Songlin also participated in Qwen3-Next, but not in Minimax M2. 
**01:13:39** The people who "sculpt" architectures. **01:15:16** Personal journey: "When you know exactly what you want to do, you won't run into real setbacks." Experience sharing: the PhD has gone quite smoothly, thanks to the half year of archaeology before I enrolled. **01:23:12 Speaking of archaeology, let's close by going through the history of algorithm variants starting from the Transformer.** **01:29:50** The Delta Rule algorithm, hardware affinity, and DeepSeek's strong pursuit of hardware-algorithm matching. **01:42:23** Advice for even younger people. Previous episodes with the guest: 《In-depth Explanation of DeepSeek, Kimi, MiniMax's New Attention Mechanism Papers: "Violent Aesthetics on Hardware"》 Papers mentioned: 《Kimi Linear: An Expressive, Efficient Attention Architecture》 《MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention》 《DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models》
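To make the "Linear" in Linear Attention (07:04 above) concrete, here is a toy sketch. It is not the KDA module from the Kimi Linear paper (which adds gating and a delta rule); it is only the bare, unnormalized recurrent form the episode's discussion builds on: dropping softmax lets causal attention be rewritten as a running d×d state, so per-token decoding cost is constant instead of growing with context length.

```python
# Toy contrast between the O(n^2) pairwise form of causal attention and the
# O(n) recurrent form of (unnormalized, ungated) linear attention.
# Assumption: tiny dimensions, plain Python lists, no softmax/normalization.

def linear_attention(qs, ks, vs):
    """Recurrent form: S_t = S_{t-1} + outer(v_t, k_t); o_t = S_t @ q_t."""
    d = len(qs[0])
    S = [[0.0] * d for _ in range(d)]          # running d x d state
    outs = []
    for q, k, v in zip(qs, ks, vs):
        for i in range(d):                     # S += outer(v, k)
            for j in range(d):
                S[i][j] += v[i] * k[j]
        outs.append([sum(S[i][j] * q[j] for j in range(d)) for i in range(d)])
    return outs

def quadratic_attention(qs, ks, vs):
    """Same map computed pairwise: o_t = sum_{s<=t} (q_t . k_s) * v_s."""
    outs = []
    for t, q in enumerate(qs):
        o = [0.0] * len(q)
        for s in range(t + 1):
            w = sum(q[j] * ks[s][j] for j in range(len(q)))
            for i in range(len(q)):
                o[i] += w * vs[s][i]
        outs.append(o)
    return outs
```

The two functions compute identical outputs; the point is that the recurrent version never revisits old tokens, which is why long-text decoding is where linear attention pays off, and why chunkwise variants (46:28) parallelize this recurrence during training.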
Original title: 119. Kimi Linear、Minimax M2?和杨松琳考古算法变种史,并预演未来架构改进方案
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>今天这集节目,我们将讨论一个在当下非常关键的话题:<strong>人工智能的算法与架构创新。</strong></p><p>嘉宾是我们的往期嘉宾返场,她是MIT在读博士杨松琳,研究方向是线性注意力机制。</p><p>我们将从最新发布的几个模型Kimi Linear、Minimax M2、Qwen3-Next切入。松琳参与讨论Kimi Linear和Qwen3-Next的部分工作,<strong>是Kimi Linear论文的作者之一。</strong></p><p>算法创新为什么在2025年变得尤为重要?</p><p>它的背后原因是,数据、算力和算法是驱动人工智能的三驾火车,在数据撞墙的无奈前提下,各个模型公司不得不重新开始“雕模型架构”,以期Scaling Law的魔法继续。而由于中国的算力相对美国有限,<strong>这反而让中国的AI算法创新走在了世界前沿。</strong></p><p>这集节目你将听到,<strong>近几年架构最大突破是DeepSeek的MoE(混合专家模型),它让MoE成为了全球共识;而下一个突破的重要方向可能就是Attention(注意力机制)。</strong></p><p>中国公司在Attention展开了不同技术bet(押注):</p><ul><li><p>截至目前已发布模型,DeepSeek正在探索Sparse Attention(稀疏注意力机制);</p></li><li><p>Kimi正在探索Linear Attention(线性注意力机制);</p></li><li><p>Minimax在年初的M1版本中探索Linear Attention,而在刚发布的M2版本中又回退到 Full Attention(全局注意力机制)。</p></li></ul><p>节目中,松琳将讲解她参与的这篇<strong>《Kimi Linear: An Expressive, Efficient Attention Architecture》</strong>的工作,并分析以上这些公司在Attention上的不同抉择;</p><p><strong>与此同时,她也将带领大家考古人工智能算法变种史,并预演未来算法与架构的改进方案。</strong></p><blockquote><p><em>本集比较硬核,会有一些专业难度,大家可以根据自己的实际需要收听嗷:)因为嘉宾的工作环境会出现中英夹杂,希望大家多多理解和支持。</em></p></blockquote><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FmecfeaBt1PLqDUxyYlRi5y4hxW6.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><p><strong>04:00</strong> 个人、研究主线与线性注意力机制的探索之路<br /><strong>06:27</strong> 松琳做过一个开源库:flash-linear-attention(简称FLA)<br /><strong>07:04</strong> 怎么通俗理解Linear Attention的Linear?<br /><strong>11:19</strong> 聊聊最近参与的新工作,前几天刚发布的《Kimi Linear: An Expressive, Efficient Attention Architecture》(Kimi Linear:一种具有强表达能力与高效率的注意力架构)<br />(FLA库的另一个作者Zhang, Yu邀请)<br /><strong>12:20</strong> 为什么Kimi在年初开始需要重新设计注意力机制?设计的背景和目标<br />在Linear Attention下,推理阶段的计算与显存成本都显著降低;而使用Full Attention时,长文本解码的代价会非常高昂<br /><strong>14:39</strong> <strong>《Kimi Linear》论文重点讲解:KDA模块</strong>(Kimi Delta 
Attention,增量注意力机制)<br /><strong>18:56</strong> Kimi内部有一个Scaling Ladder(规模阶梯),在一个规模下面表现好就在下一个规模下面去scale,就像通关<br /><strong>20:20 Kimi Linear Attention vs DeepSeek Sparse Attention:</strong>Kimi走线性注意力路线,DeepSeek走稀疏注意力路线,都想解决长文本decoding(长上下文生成)的效率问题<br /><strong>23:01</strong> <strong>Minimax从M1到M2的架构变化,从Linear Attention退回到Full Attention</strong>,为什么?<br /><strong>27:00</strong> 硅谷的注意力机制方案不方便说,但可以浅聊一下OpenAI有paper的方案<br /><strong>28:05</strong> Linear Attention从2020年发明出来开始后的前进线索<br />每一次大家关心Linear Attention都是因为大家撞到了Context Wall<br />最近长文本的decoding卷土重来,让人们不由自主审视这一套技术<br /><strong>38:16</strong> 纯Linear Attention是无效的,混合注意力机制还是有很多全局注意力层,这样下限有保证<br /><strong>40:30</strong> <strong>Kimi Linear每3层KDA插入1层全注意力层,三比一的比例快变成共识了</strong><br />Minimax之前用的是七比一,但现在大家逐渐回到三比一——这成为不共识的混合注意力机制中的共识了<br /><strong>42:32</strong> 权衡(Trade-off)表达能力(expressivity)与计算效率(efficiency)<br /><strong>Minimax曾经也提到,混合线性注意力/混合滑窗注意力在“多跳推理”上会有缺陷</strong><br />对于“多跳推理”,如果我们开发一些硬件高效但表达能力更好的RNN(循环神经网络),这个GAP有可能缩小<br /><strong>46:28</strong> chunkwise algorithm for parallelization(分块并行算法)<br /><strong>47:55</strong> 如何设计Attention?两条主流和一些非主流路线<br /><strong>49:36</strong> <strong>结合Linear Attention和Sparse Attention的未来理想方案</strong><br />Linear Attention和Sparse Attention没什么竞争关系,Linear Attention的竞争对手可能是Sliding-Window Attention(滑窗注意力)<br />工业界Linear Attention和Sparse Attention结合的探索似乎还没开始<br /><strong>我想象中的理想方案是:把混合注意力的全局注意力(Full Attention)换成稀疏注意力(Sparse Attention)</strong><br />只要Sparse Attention选得准,完全可以取代Full Attention,但现在的问题是它选不准<br /><strong>55:36</strong> 公平的比较:Linear Attention vs Sliding-Window Attention(滑窗注意力)<br /><strong>57:05</strong> Transformer → MoE → Linear/Sparse Attention的算法演变,背后动因是给定你相同的FLOPs(浮点运算量),利用这些FLOPs,取得更低的损失函数<br />MoE(混合专家)是更高效的FNN(前馈神经网络)的替代品<br /><strong>58:26</strong> <strong>近几年架构方面突破最大的是MoE,下一个突破可能是Attention;Transformer就两个模块,一个是FFN,一个是Attention;现在FFN已经雕成MoE,现在Attention大家也可以雕一下</strong><br /><strong>01:01:28</strong> 数据、算法、算力是驱动人工智能的三驾马车,当数据遇到数据强,算法创新变得更重要<br /><strong>01:02:48</strong> 
架构的未来:1、能不能干掉全局注意力?它是阻止context window继续scale up的主要瓶颈<br />2、Continue Learning,让AI自己学习<br /><strong>01:04:30</strong> 如何把Linear Attention的Transformer继续scale up?<br /><strong>01:07:43</strong> 中国AI的算法创新相比海外肯定是更强的——因为没有那么多卡(<br />不过美国公司更多投入优化器一点,国内在逐步重视<br /><strong>01:10:56</strong> 其他训练细节:NoPE vs. RoPE<br /><strong>01:12:09</strong> DeepSeek-OCR<br /><strong>01:12:55</strong> 松琳也参与了Qwen3-Next,没有参与Minimax M2<br /><strong>01:13:39</strong> “雕”架构的人<br /><strong>01:15:16</strong> 自己的心路:“当你很清楚你要做什么的时候,你是不会遇到什么挫折的”<br />经验分享:PhD还挺顺利的,得益于我入学之前的半年考古<br /><strong>01:23:12</strong> <strong>说到考古,我们在最后聊聊从Transformer开始的算法变种历史</strong><br /><strong>01:29:50</strong> Delta Rule算法、硬件亲和、DeepSeek非常追求硬件和算法的匹配<br /><strong>01:42:23</strong> 给更年轻的年轻人的建议</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p>嘉宾往期节目:</p><p><a href="https://www.xiaoyuzhoufm.com/episodes/67bb3696606e5c5940533ef4" rel="noopener noreferrer nofollow" target="_blank">《逐篇讲解DeepSeek、Kimi、MiniMax注意力机制新论文——“硬件上的暴力美学”》</a></p><p>谈到的论文:</p><p><a href="https://arxiv.org/pdf/2510.26692" rel="noopener noreferrer nofollow" target="_blank">《Kimi Linear: An Expressive, Efficient Attention Architecture》</a></p><h1><a href="https://arxiv.org/abs/2506.13585?utm_source=chatgpt.com" rel="noopener noreferrer nofollow" target="_blank">《MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention》</a></h1><p><a href="https://arxiv.org/abs/2401.06066?utm_source=chatgpt.com" rel="noopener noreferrer nofollow" target="_blank">《DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models》</a></p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
118. A Second 3-Hour Interview with Li Xiang: CEO Large Model, MoE, Liang Wenfeng, VLA, Energy, Memory, Fighting Human Nature, Intimate Relationships, Human Wisdom. From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-10-30 02:30
In April 2025, I recorded AI Talk Season 2 with Li Xiang, founder and CEO of Li Auto. The conversation ran long, but the broadcast version was only 1 hour; today you are seeing the full version. This episode was released later than expected. I've been too busy these past months and hesitated over whether to release it. But when I reorganized the material, I was moved by it again: **this is a "node-style thinking archive" of the artificial intelligence technology revolution.** You can watch it alongside our 3-hour conversation at the end of 2024 to feel how the thinking extends and echoes between the two dialogues. **This time, I questioned Li Xiang as if he were a "CEO large model."** Assuming he is a model with a MoE (Mixture of Experts) architecture, in the first three rounds of the conversation I called three of his "experts": the technology expert, the strategy expert, and the organization expert. As the conversation deepened in the second half, we began to discuss people, energy, intimate relationships, memory programs, and human wisdom. **"The relationship between AI and humans" is the master theme of this dialogue.** (Recorded in April 2025) > **02:35 Chapter 1: Suppose You Are a CEO Large Model** Humans do entropy reduction, AI does entropy increase Three tiers of tools: "information tools", "auxiliary tools", "production tools" The key measure of a "production tool": you are willing to pay for it Liang Wenfeng applies humanity's best practices in the most minimalist way Following best practices goes against human nature; doing whatever you want satisfies it I can only be the best version of myself; I have always built along the extension of my strengths Why does Li Auto still build a foundation large model? At the time we worried more about what Chen Wei's team (the in-house foundation model team) would think? 
That pressure is quite high > **36:18 Chapter 2: Calling the MoE's Technology Expert** Li Xiang walks you through training a VLA Reaching VLA is not a sudden change but an evolution, through three stages Let me tell you how a VLA is trained and how a VLA works on its own I don't do super-long CoT; my CoT chain is generally two to three steps There won't be a general-purpose Agent for at least 5 years, but there will be an Agent OS You have to speak along with human nature and act against it If everyone skips accumulating the first buns and only wants to eat the 10th bun, it's a lot like practicing the "Sunflower Manual" Black box, world model, and pricing logic We have cut the validation cost per 10,000 kilometers from 180,000 yuan at the start to 4,000 yuan > **01:25:36 Chapter 3: Calling the MoE's Strategy Expert** The 2025 Yanqi Lake strategy meeting In the strategy picture, the circle in the middle is scale, with three variables outside it: user needs, technology and products, and organizational capability Whatever has these four traits is a terminal of the AGI era: 360-degree perception of the physical world, cognitive decision-making, the ability to act, and the ability to reflect and give feedback In the AGI era, the capability requirements for terminals become different Looking to 2030, we hope to become a globally leading artificial intelligence terminal enterprise That is the problem we need to solve over the next 3-6 years Is Li Xiang's ideal too idealistic? 
Build a 3-7 person energy body High-dimensional organizations can accommodate low-dimensional organizations > **02:09:26 Chapter 4: Wisdom Is the Relationship Between Us and All Things** My memory program Starting a business is not easy, but there is no need to be miserable about it My eldest daughter Our family has achieved a "three-person support" structure, which has greatly lifted the family's energy People are meant to be given room to shine, not to be changed Don't build so many intimate relationships; too many of them show that a person doesn't know how to manage relationships Develop wisdom as an important human trait My first 3-hour interview with Li Xiang: "A 3-Hour Interview with Li Xiang (Podcast Version): Otaku, AI, Family, Games and the Ladder" This episode also has text and video versions: Article: official WeChat account (语言即世界language is world) Video: Bilibili (张小珺商业访谈录)
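The interview's framing of "calling" one expert per round mirrors how an actual MoE layer routes work. A minimal sketch, with invented toy experts and gate weights (nothing here is Li Auto's or any real model's code; real MoE gates use a learned softmax over expert logits): a gate scores the experts for each input and only the top-k are executed, so capacity grows without running every parameter on every token.

```python
# Illustrative top-k Mixture-of-Experts routing in plain Python.
# Assumptions: experts are arbitrary callables, the gate is one linear score
# per expert, and gate weights are normalized by their sum instead of softmax.

def gate_scores(x, gate_weights):
    """One linear gate per expert: score_e = w_e . x."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]

def moe_forward(x, experts, gate_weights, k=2):
    """Run only the k best-scoring experts; mix their outputs by gate weight."""
    scores = gate_scores(x, gate_weights)
    top = sorted(range(len(experts)), key=lambda e: scores[e], reverse=True)[:k]
    total = sum(scores[e] for e in top) or 1.0   # avoid division by zero
    out = 0.0
    for e in top:                                 # only selected experts execute
        out += (scores[e] / total) * experts[e](x)
    return out, top
```

With k=1 this is exactly the interview's device: each question activates one "expert" while the others stay dormant.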
Original title: 118. 对李想的第二次3小时访谈:CEO大模型、MoE、梁文锋、VLA、能量、记忆、对抗人性、亲密关系、人类的智慧
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>2025年4月,我与理想创始人兼CEO李想录制AI Talk第二季。那次对谈持续了很长时间,播出版仅1小时,今天你看到的是完整版。</p><p>这一集节目的发布比预期晚了些。过去几个月实在太忙了,我一度犹豫要不要继续放出。但当我重新整理这些内容时,仍然被它打动——<strong>这是一份关于人工智能技术变革的“节点式思考存档”。</strong></p><p>你可以结合2024年底我们那场3小时谈话一起观看,感受两次对话之间,思考的延展与呼应。</p><p><strong>这次,我把李想当作一个“CEO大模型”来提问。</strong></p><p>假设他是一种MoE(Mixture of Experts,专家混合)架构的模型,我在对话的前三个回合调用了他的三位“专家”:技术专家、战略专家、组织专家。而当谈话深入到后半程,我们开始讨论人、能量、亲密关系、记忆程序与人类的智慧。</p><p><strong>“AI与人的关系”,是本次对话的母题。</strong></p><p>(录制于2025年4月)</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FstqHmZUbyhG6hRts2lcBWeDn0fw.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><blockquote><strong>02:35 第一章:假若你是一个CEO大模型</strong></blockquote><p>人类做熵减,AI做熵增</p><p>工具的三个分级:“信息工具”、“辅助工具”、“生产工具”</p><p>“生产工具”重要的衡量是:你愿意为它付钱</p><p>梁文锋极简运用了人类最佳实践</p><p>按照最佳实践是反人性的,随心所欲才满足人性</p><p>我只能做最好的自己,我一直在自己的长板延长线上</p><p>理想为什么还做基座大模型?</p><p>当时我们比较担心陈伟团队(基座模型自研团队)怎么想?这个压力挺大的</p><blockquote><strong>36:18 第二章:调用MoE之技术专家</strong></blockquote><p>李想手把手教你训VLA</p><p>达到VLA不是突变的过程,是进化的过程,经历了三个阶段</p><p>我给你讲一下VLA是怎么训的,以及VLA自己怎么去工作的</p><p>我不会做超长CoT,我的CoT链条一般两步到三步</p><p>至少5年内不会有通用Agent,但会有一个Agent OS</p><p>要顺着人性去说,逆着人性去做</p><p>如果大家不想做前面包子的积累,只想吃第10个包子,很像练《葵花宝典》</p><p>黑盒、世界模型和定价逻辑</p><p>每1万公里的验证成本,我们做到从最开始18万降到4000块钱</p><blockquote><strong>01:25:36 第三章:调用MoE之战略专家</strong></blockquote><p>2025年雁栖湖战略会</p><p>如果看战略,中间的圈是规模,圈外边有三个变量:用户需求、技术产品、组织能力</p><p>具备这四个特点的,就是AGI时代的终端:360度对物理世界感知的能力、认知决策的能力、Action的能力、反思反馈能力</p><p>到了AGI时代的终端,对于能力的要求变得不一样了</p><p>如果看到2030年,我们希望能够成为全球领先的人工智能终端企业</p><p>这是我们未来的3-6年要去解的题</p><p>李想的理想会不会太过于理想?</p><p>构建3-7人能量体</p><p>高维组织兼容低维组织</p><blockquote><strong>02:09:26 
第四章:智慧是我们和万物的关系</strong></blockquote><p>我的记忆程序</p><p>创业不容易,但是没必要苦哈哈的</p><p>大女儿</p><p>我们家里实现了一个“三人支撑”,这让家里的能量大幅地提升</p><p>人是用来发挥的,人不是用来改变的</p><p>不要构建那么多亲密关系,亲密关系太多了就证明这个人不会经营关系</p><p>把智慧当成一个重要的人类特质去发展</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p>对李想的第一次3小时访谈:</p><p><a href="https://www.xiaoyuzhoufm.com/episodes/67769bd815a5fd520e8fa318">《对李想的3小时访谈(播客版):宅男、AI、家庭、游戏和天梯》</a></p><p>本集节目同步上线文字版和视频版:</p><p>文章:公众号(语言即世界language is world)</p><p>视频:Bilibili(张小珺商业访谈录)</p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
117. An Open-Sourced Paper-Reading Journey: The Complete Evolution of Model Paradigms, Infra and Data, Language, and Multimodality. From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-10-28 02:33
Today's guest is Xie Qingchi, product head of Guangnian Zhiwai (光年之外) at Meituan. A month ago, Qingchi came to me and said he had spent more than a year chewing through over 200 AI papers one by one, going from completely lost at first to gradually finding his way in, **and he wanted to open-source his paper-exploration journey for everyone.** And so we have today's special episode. From the 200-plus papers he selected 36 classics and explains them over 4 hours, taking you through the history of AI's evolution. He says **reading papers "opens a door for you," letting you "converse directly with the smartest minds in this world."** In 2025, we look forward to progressing together with AI! 01:30 How the exploration began 07:25 How to read papers? (Using AI to learn AI) 10:20 Helper tools and the road book **The main body of the paper walkthrough:** > **19:35 Part 1: Paradigm Shifts in Models** The story starts with the first GPU in 1999 Brook: Computing with GPUs (2004.08) AlexNet: The beginning of deep learning (2012.10) Modeling sequences: seq2seq and the introduction of Attention (2014.09) Distillation: Can models be learned from? (2015.03) ResNet: Deeper than deep (2015.12) The Transformer arrives! The curtain rises on an era (2017.06) AlphaGo Zero: A breakthrough in reinforcement learning (2017.10) The beginning of modern MoE (2017.01) CoT: The foundational work of Prompt Engineering (2022.01) LoRA: The thing we all use every day (2021.06) ReAct: Agents from theory to practice (2022.10) The Bitter Lesson: Lessons of the past 70 years (2018.08) > **01:52:58 Part 2: The Evolution of Infra and Data** ZeRO: Large-scale GPU parallelism (2019.10) Scaling Law & Chinchilla: God's baton (2020.01, 2022.03) LAION-5B: The heroism of the open-source community (2022.10) The RefinedWeb: Web data turns out to be enough (2023.06) MegaScale: Training on 10,000-GPU clusters (2024.02) > **02:21:29 Part 3: The Development of Language Models** Word2Vec: Vectorizing words with machine learning (2013.01) Google Translate: Large-scale online deployment of neural networks (2016.09) GPT-1, here it comes (2018.06) BERT: The former king (2018.10) GPT-2: Time to say goodbye to fine-tuning (2019.02) GPT-3: The eve of ChatGPT (2020.05) InstructGPT: Giving LLMs civilization (2022.03) Tulu 3: Open-sourcing post-training (2024.11) > **03:08:08 Part 4: The Development of Multimodal Models** DeepVideo: Deep learning enters video, a young Andrej makes his debut (2014.06) Two-Stream Networks: Karén and the academic powerhouse Oxford take the stage (2014.06) The prologue to image generation: GANs arrive (2014.06) Diffusion: Growing quietly in the shadow of GANs (2015.03) DDPM: Diffusion returns to center stage for images (2020.06) ViT: When images meet the Transformer (2020.10) CLIP: The cornerstone of text-to-image (2021.03) Stable Diffusion, here it comes (2021.12) DiT: People look forward to a unified future (2022.12) > **03:56:38 Closing Chat** Architectures have latched onto hardware Where has the frontier of today's technology reached? 
Advice for "those peering in from outside the AI world" and "those who have already worked in the field for years" [The Beauty of Technology] series: [Sentence-by-sentence walkthrough of the DeepSeek-R1, Kimi K1.5, and OpenAI o1 technical reports: "The most elegant algorithms are the cleanest"](https://www.xiaoyuzhoufm.com/episodes/67a1b697247d51713c868367) [Paper-by-paper walkthrough of DeepSeek's 9 key papers and their innovations: "A game for the brave"](https://www.xiaoyuzhoufm.com/episodes/67aacd6b247d51713cedbeda) [Paper-by-paper walkthrough of the new DeepSeek, Kimi, and MiniMax attention papers: "Violent aesthetics on hardware"](https://www.xiaoyuzhoufm.com/episodes/67bb3696606e5c5940533ef4) [Paper-by-paper walkthrough of robot foundation models and classic VLA papers: "Humans are the most intelligent VLA"](https://www.xiaoyuzhoufm.com/episodes/67f28c6e0decaeb0943fb14a) [Section-by-section walkthrough of the Kimi K2 report, compared with ChatGPT Agent, Qwen3-Coder, etc.: "The power of systems engineering"](https://www.xiaoyuzhoufm.com/episodes/6889da698e06fe8de77116a9) [More Information] The screen-cast video version of this episode has been published on Bilibili (张小珺商业访谈录): https://www.bilibili.com/video/BV1pkyqBxEdB/?spm_id_from=333.1365.list.card_archive.click&vd_source=aa7c66a3d015be4b5bfcd520784f2790 The full 50-page PPT is open-sourced here (all paper links are attached in the PPT): https://w7py8ou4dk.feishu.cn/wiki/KacewdlmSiSGC9kUOKDch9gwnKf?from=from_copylink
Original title: 117. 开源一段论文探索之旅:模型范式、Infra和数据、语言、多模态的完整变迁史
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>今天的嘉宾是谢青池,他是美团光年之外的产品负责人。</p><p>一个月前,青池找到我,说他用了一年多的时间一篇一篇地啃完了200多篇AI论文,从开始全然不得要领,到后来逐渐地入门——<strong>而他希望将他的论文探索之旅开源给大家。</strong></p><p>就这样,我们有了今天这集特别的节目。</p><p>他从200多篇论文中精选了36篇经典,4小时讲解,带你穿越AI变迁史。</p><p>他说,<strong>读论文是“给你打开一扇门”,让你能直接“与这个世界最聪明的头脑对话”。</strong></p><p>2025年,期待我们和AI共同进步!</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FlXrk-ijMq1TpE6D2peK8gcRwPwy.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><p>01:30 探索的缘起</p><p>07:25 怎么读论文?(用AI学AI)</p><p>10:20 辅助小工具和路书</p><p><strong>论文讲解的主干:</strong></p><figure><img src="https://image.xyzcdn.net/FlD9g0se44VT3UAGU8lzaGGev8Ik.png" /></figure><blockquote><p><strong>19:35 Part 1:模型的范式变迁</strong></p></blockquote><p>故事要从1999年的第一颗GPU开始讲起</p><p>Brook: 用GPU进行计算 (2004.08)</p><p>AlexNet: 深度学习的开端(2012.10)</p><p>对序列建模:seq2seq和Attention的引入(2014.09)</p><p>蒸馏:模型能被学习吗?(2015.03)</p><p>ResNet: 比深更深(2015.12)</p><p>Transformer来了!拉开一个时代的序幕(2017.06)</p><p>AlphaGo Zero: 强化学习的突破(2017.10)</p><p>现代MoE的开端(2017.01)</p><p>CoT: Prompt Engineering的奠基之作(2022.01)</p><p>LoRA: 那个我们每天都在用的东西(2021.06)</p><p>ReAct: Agent从理论到落地(2022.10)</p><p>The Bitter Lesson: 过去70年的教训(2018.08)</p><blockquote><p><strong>01:52:58 Part 2:Infra与数据的变迁</strong></p></blockquote><p>ZeRO: 大规模的GPU并行计算(2019.10)</p><p>Scaling Law & Chinchilla: 上帝的指挥棒(2020.01 2022.03)</p><p>LAION-5B: 开源社区的英雄主义(2022.10)</p><p>The RefinedWeb: 互联网的数据也很够用(2023.06)</p><p>MegaScale: 万卡GPU集群的训练(2024.02)</p><blockquote><p><strong>02:21:29 Part 3:语言模型的发展</strong></p></blockquote><p>Word2Vec: 用机器学习将单词向量化(2013.01)</p><p>Google Translate: 神经网络的大规模线上部署(2016.09)</p><p>GPT-1,它来了(2018.06)</p><p>BERT: 曾经的王(2018.10)</p><p>GPT-2: 是时候告别微调了(2019.02)</p><p>GPT-3: ChatGPT来临前夜(2020.05)</p><p>InstructGPT: 给LLM以文明(2022.03)</p><p>Tulu 3: 
后训练的开源(2024.11)</p><blockquote><p><strong>03:08:08 Part 4:多模态模型的发展</strong></p></blockquote><p>DeepVideo: 深度学习进入视频领域,Andrej 初出茅庐(2014.06)</p><p>双流网络: Karén和学术重镇牛津登场(2014.06)</p><p>图像生成的序章: GAN来了(2014.06)</p><p>Diffusion: 在GAN的阴影下,悄然成长(2015.03)</p><p>DDPM: Diffusion重回图像舞台的中央(2020.06)</p><p>ViT: 当图像遇到Transformer(2020.10)</p><p>CLIP: 文生图的奠基石(2021.03)</p><p>Stable Diffusion,它来了(2021.12)</p><p>DiT: 人们期待一个融合的未来(2022.12)</p><blockquote><p><strong>03:56:38 最后的聊天</strong></p></blockquote><p>架构抱住了硬件的大腿</p><p>今天技术的边界到达了哪?</p><p>给“站在AI世界门外张望的人”和“已经在体系中工作多年的人”的建议</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p>【技术之美】系列:</p><p><a href="https://www.xiaoyuzhoufm.com/episodes/67a1b697247d51713c868367" rel="noopener noreferrer nofollow" target="_blank">逐句讲解DeepSeek-R1、Kimi K1.5、OpenAI o1技术报告——“最优美的算法最干净”</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/67aacd6b247d51713cedbeda" rel="noopener noreferrer nofollow" target="_blank">逐篇讲解DeepSeek关键9篇论文及创新点——“勇敢者的游戏”</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/67bb3696606e5c5940533ef4" rel="noopener noreferrer nofollow" target="_blank">逐篇讲解DeepSeek、Kimi、MiniMax注意力机制新论文——“硬件上的暴力美学”</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/67f28c6e0decaeb0943fb14a" rel="noopener noreferrer nofollow" target="_blank">逐篇讲解机器人基座模型和VLA经典论文——“人就是最智能的VLA”</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/6889da698e06fe8de77116a9" rel="noopener noreferrer nofollow" target="_blank">逐段讲解Kimi K2报告并对照ChatGPT Agent、Qwen3-Coder等:“系统工程的力量”</a></p><p>【更多信息】</p><p>本集的投屏视频版已经同步发布于Bilibili(张小珺商业访谈录):https://www.bilibili.com/video/BV1pkyqBxEdB/?spm_id_from=333.1365.list.card_archive.click&vd_source=aa7c66a3d015be4b5bfcd520784f2790</p><p>50页完整PPT开源地址(所有论文链接附在PPT上):https://w7py8ou4dk.feishu.cn/wiki/KacewdlmSiSGC9kUOKDch9gwnKf?from=from_copylink</p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
116. Wu Minghui's 19-Year Oral History: Long Rises and Falls, a Painful Sharp Turn, Enterprise-Grade Agentic Models, the Real World's Numbers Game, and an IPO. From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-10-09 05:55
Today's guest, Wu Minghui, is the founder, CEO, and CTO of Mininglamp Technology (明略科技). On August 29, 2025, Mininglamp received its overseas-listing filing notice and will soon list in Hong Kong. This is a pre-IPO interview in which Wu Minghui narrates the long 19-year story of a To-B company, through many splits and mergers, rises and falls, and sharp turns. You will find many of our show's past guests in it: Xiao Hong, Li Guangmi, Yang Zhilin. We also talked about the prospects of enterprise-grade AI and Agentic Models in the new AI era. But the story begins with the merger of his company with that of Li Feng, founding partner of FreeS Fund (峰瑞资本). In 2025, we progress together with AI! 02:11 Part 1: The First Startup Rapid-fire questions to start His ties with our past guests Guangmi and Red The start of the startup: Zhu Wei invested in the merged company of Wu Minghui and Li Feng In the beginning, Luo Yonghao and Li Xiaolai were our shareholders The first business plan was a recommendation system; why didn't we build Jinri Toutiao? The psychological conditioning of Olympiad competition training The success of Miaozhen Systems Watching Toutiao's traffic surge before our eyes 56:08 Part 2: The Second Startup "When the boss finishes business school, the team suffers" Founding Mininglamp Technology and Yunji Robotics at the same time Learning from Palantir, the US data-analytics company, but pivoting from To G to To B The decision to acquire Red's company; I hoped he would become my CEO successor 2020-2021: Spreading the battlefield too wide, detours taken 2022: A painful sharp turn, the most suffering year of my life With AI, a wave of M&A is expected in enterprise services 01:45:01 Part 3: Enterprise-Grade AI Companies that train foundation models on public data and sell tokens as a business model will face brutal competition, ground down to the price of electricity Companies with private data can create differentiated value The real world's numbers game The origin of the new product "DeepMiner" Agents and Tool Use create new links in enterprise services Agents are an interaction technology that will revolutionize both the To C and To B internet Companies that provide no supply-side capability, only a linking network that isn't the root node, will be in great danger In the future, will companies have only two kinds of people? Bosses and partners (partners are not employees) A happy boss is one whose personal mission, family mission, and company mission are highly aligned
Original title: 116. 吴明辉口述19年史:漫长的沉浮、痛苦急转、企业级Agentic Model、现实世界的数值游戏、IPO
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>今天的嘉宾吴明辉是明略科技的创始人、CEO兼CTO,明略科技于2025年8月29日获境外发行上市备案通知书,不久后将于香港上市。</p><p><strong>这是一次上市前的访谈,吴明辉口述一家To B公司漫长的19年故事,其间经历了好多次的分分合合、沉浮与急转。</strong>你能在这里面找到许多我们节目嘉宾的身影——肖弘、李广密、杨植麟。</p><p><strong>我们也聊了聊面向全新的AI时代,企业服务级AI与Agentic Model的前景。</strong></p><p>但这个故事的最开始,要从他与峰瑞资本创始合伙人李丰的公司合并聊起。</p><p>2025年,我们和AI共同进步!</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FuJEeo_DXMeLhFt9a035wSLARI49.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><blockquote><p>02:11 <strong>Part 1:第一段创业</strong></p></blockquote><p>开始的快问快答</p><p>和我们嘉宾广密、Red的渊源</p><p>创业的开始:祝伟投资吴明辉和李丰合并后的公司</p><p>最开始罗永浩、李笑来是我们的股东</p><p>第一版商业计划书就是推荐系统,为什么没做今日头条?</p><p>奥林匹克竞技训练的心理调适</p><p>秒针系统的成功</p><p>眼睁睁看着今日头条的流量哗啦啦起来</p><blockquote><p>56:08<strong> Part 2:第二段创业</strong></p></blockquote><p>“老板上完商学院,团队遭殃”</p><p>同时创立明略科技、云迹机器人</p><p>学习美国一家数据分析公司Palantir,但从To G转向To B</p><p>收购Red的决策,我希望他做我的CEO successor</p><p>2020-2021年:战场开得太宽、走过的弯路</p><p>2022年:痛苦的急转,人生最suffer的一年</p><p>有AI以后,预计企业级服务会出现并购潮</p><blockquote><p>01:45:01<strong> Part 3:企业服务级AI</strong></p></blockquote><p>基于公开数据训基础模型、以卖Token为商业模式的公司会很卷,卷成电费</p><p>有私有Data的公司能产生差异化价值</p><p>现实世界的数值游戏</p><p>新产品“DeepMiner”的由来</p><p>Agent或Tool Use在企业服务领域产生了新的链接</p><p>Agent是一种交互技术,对To C和To B互联网都会产生革命性变化</p><p>那些不提供供给侧能力、只提供链接网络,而这个网络又不是根结点的公司,会很危险</p><p>将来企业只有两类人?老板和合伙人(合伙人不是公司员工)</p><p>一个幸福的老板,个人使命、家庭使命和公司使命高度相关</p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
A 3-Hour Interview with OpenAI's Yao Shunyu: Six Years of Agent Research, Humans and Systems, Devouring Boundaries, and a World Both Unipolar and Plural.From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-09-11 10:03
Today we are happy to welcome OpenAI researcher Yao Shunyu. In April 2025, **Yao Shunyu published a well-known blog post, "The Second Half"**, declaring that the main game of AI has entered its second half. Afterwards, we recorded this podcast conversation with him. Yao Shunyu graduated from Tsinghua and Princeton University and began researching agents very early. During his PhD he realized that language may be the closest-to-essence tool humans have invented, so he turned to language agent research, which he has now pursued for 6 years, producing many representative works. **Starting from the individual, our conversation explores the boundary of the world's intelligence and the panorama of humans and machines, as reached through people, organizations, AI, and human-machine interaction.** Not long ago I founded a new content studio, "Language is World Studio," and Shunyu unexpectedly answered, from another angle, why our studio was founded. Why do we believe language is the essential mystery of this world? In his words: **"Language is a tool humans invented to achieve generalization, and that makes it more essential than anything else."** (This interview took place in May 2025. It represents personal views and is unrelated to his employer.) > **02:58 Part 1: The Person** > * I feel my first 28 years were very well-behaved > * I have long held this non-consensus view: I want to work on Agents > * The biggest takeaway of my first year: use GPT, not BERT; the second learning: tasks and environments matter enormously > * My research has two cores: one is building valuable tasks and environments more relevant to the real world; the other is devising simple but general methods > **17:50 Part 2: The System** > * Agent is a very old concept. 
Any system that can make its own decisions, interact with an environment, and try to optimize a reward can be called an Agent > * Three waves of rise and fall in the evolution of Agents: people tend to notice the methods line and overlook the tasks line, but the two are complementary > * The two most critical directions for Agent development: one is giving an Agent its own reward so it can explore on its own; the other is Multi-Agent, letting agents form organizational structures among themselves > * Code is a bit like the human hand; it is AI's most important *affordance* > * Task settings > * Generalized tools > * Reward mechanisms > **48:38 Part 3: Devouring Boundaries** > * The biggest opportunity for startups: designing different interfaces > * The model's capabilities may give rise to interaction methods beyond ChatGPT and become a Super App > * Owning a Super App is a double-edged sword for a company: once you have a Super App like ChatGPT, your research naturally revolves around it > * An assistant, Her, or human-like interaction is obviously one of the most important interaction methods; what is not obvious is: can interaction be built on something non-human-like? 
> * This world runs on mutual copying, not one-way copying > * OpenAI may become a company like Google, a very important part of the new world, but that does not mean the world will be monopolized by such a unipolar system > * The ultimate boundary of intelligence is determined by different interaction methods, not by a single model > * The winter before last, I read a book von Neumann wrote before his death: The Computer and the Brain > * The environment is always the outermost level of the memory hierarchy, which is quite philosophical > * A model company's Chatbot system will evolve into a very natural Agent system > **01:05:01 Part 4: The Big Picture of Humanity** > * Humans and systems: should an Agent be human-like? "It's a utility problem" > * OpenAI is a bottom-up company > * Without a different bet, it is hard to surpass the incumbent > * My advisor is the second author of GPT‑1. He stayed at OpenAI for a year, and he was somewhat skeptical of all this > * If you became the CEO of Berkshire and had to allocate 50 billion US dollars to the AGI industry, how would you allocate the money? > * The real danger is not something WeChat-like defeating WeChat, but something different defeating WeChat > * It happens that in this era, it is better to do things with a higher ceiling 【More Information】 A text version is available; for it, please go to the official account: 语言即世界language is world
Original title: 115. 对OpenAI姚顺雨3小时访谈:6年Agent研究、人与系统、吞噬的边界、既单极又多元的世界
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>今天的嘉宾,我们很开心邀请了OpenAI研究员姚顺雨。</p><p>2025年4月,<strong>姚顺雨发布了一篇有名的博文《The Second Half》</strong>,宣告AI主线程的游戏已进入下半场。这之后,我们与他进行了一场播客对谈。</p><p>姚顺雨毕业于清华和普林斯顿大学,开始智能体的研究非常早。在博士期间他意识到语言可能是人类发明的最接近本质的工具,于是转向语言智能体研究,至今已6年。他有许多有代表性的工作。</p><p><strong>我们的谈话从个体出发,共同探索由人、组织、AI、人与机器的交互,所抵达的这个世界智能的边界以及人类与机器的全景。</strong></p><p>前不久,我刚刚创立了一家新的内容工作室「语言即世界工作室」,顺雨很意外地从另一个角度帮我回答了,我们工作室创立的初心。</p><p>为什么我们相信语言是这个世界的本质奥秘?他的表达是:<strong>“语言是人为了实现泛化而发明出来的工具,这一点比其他东西更本质。”</strong></p><p>(本次访谈发生在2025年5月,访谈为个人观点,与所供职公司无关。)</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FvcGJiysXOtY4fIP_aDHS3_iV-t7.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><blockquote><strong>02:58 第一部分:人</strong></blockquote><ul> <li>感觉我前28年的人生,非常的乖</li> <li>我一直有这个非共识:我想要去做Agent</li> <li>第一年最大收获是,要用GPT,不要用BERT;第二个learning是任务或环境非常重要</li> <li>我的研究有两个核心:一是怎么去做一些有价值、和现实世界更相关的任务和环境;二是怎么去做一些简单、但又通用的方法</li></ul><blockquote><strong>17:50 第二部分:系统</strong></blockquote><ul> <li>Agent是一个非常古老的概念,任何能进行自我决策、与环境交互,并试图优化奖励的系统,都可以被称为Agent</li> <li>Agent演变的三波兴衰:大家可能更多注意到方法线,容易忽视任务线,但这两条线是相辅相成的</li> <li>Agent发展最关键的两个方向:一个是让它拥有自己的reward(奖励),能自己探索;另一个是Multi-Agent(多智能体),让它们之间能形成组织结构</li> <li>Code有点像人的手,它是AI最重要的<em>affordance</em>(环境给予行动者的可能性)</li> <li>任务的设定</li> <li>泛化的工具</li> <li>奖励的机制</li></ul><blockquote><strong>48:38 第三部分:吞噬的边界</strong></blockquote><ul> <li>创业公司最大机会是:能设计不同的interface(交互方式)</li> <li>可能模型的能力会产生beyond ChatGPT(超越 ChatGPT)的交互方式,变成Super App</li> <li>拥有一个Super App对于公司是双刃剑,当你有像ChatGPT这样的Super App,很自然你的研究就会围绕这个Super App</li> <li>Assistant、Her,或者像人一样的交互方式,显然是最重要的交互方式之一;不显然的是,我能不能基于不像人的交互方式?</li> <li>这世界是个相互抄的关系,而不是一个单向抄的关系</li> <li>OpenAI可能会成为一个类似Google的公司,成为新世界里非常重要的一环,但这并不代表,这个世界就会被这样一个单极系统垄断</li> <li>最终的智能边界,是由不同的交互方式决定的,而不是由一个single model(单一模型)决定</li> <li>前年冬天,我读到冯诺依曼临终前写的一本书:The Computer 
and the Brain</li> <li>环境永远是记忆层级中最外层的部分,这很哲学</li> <li>模型公司的Chatbot系统会演化成一个很自然的Agent系统</li></ul><blockquote><strong>01:05:01 第四部分:人类的全局</strong></blockquote><ul> <li>人与系统:Agent要不要像人?“是一个效用问题”</li> <li>OpenAI是一个bottom-up(自下而上)的公司</li> <li>如果你没有一个different bet(不同的下注方向),很难超越前面的霸主</li> <li>我导师是GPT‑1第二作者,他在OpenAI待了一年,他对这件事是有点怀疑的</li> <li>如果你成为了伯克希尔的CEO,未来要拿出500亿美金allocate(分配)到AGI行业,你会怎么allocate这笔钱?</li> <li>真正的危险,不是一个类似微信的东西打败了微信,而是一个不一样的东西打败了微信</li> <li>恰好这个时代,做上限更高的事更好</li></ul><p>【更多信息】</p><p>文字版同步上线</p><p>文字版请前往公众号:语言即世界language is world</p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
Chatting with Yin Yi and Oudi about Salomon: A Foreign Brand Entering China, Niche Trail Running, and Girl StoriesFrom 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-09-06 05:06
Today's guests are Yin Yi, General Manager of Salomon China, and Oudi, Head of Fashion and Trend Industry at Xiaohongshu Business. Let's talk about Salomon, a popular outdoor brand in the past two years. Salomon and Arc'teryx both belong to Amer Sports Group. In 2019, Amer Sports was acquired by Anta. **After 2021, Salomon, a 70-year-old French brand, unexpectedly started its growth path in China.** This niche brand, which started with skiing and gradually expanded to trail running shoes, mainly targeted men and professional skiing and trail running enthusiasts in China's traditional consumer groups. **However, in recent years, through a series of brand activities on Xiaohongshu, they have successfully attracted female consumer groups and new outdoor enthusiasts, expanding their circle and growing, which in turn has further stimulated the growth of male consumers and core sports enthusiasts.** I hope this fresh brand knowledge can also bring you new inspiration :) 02:00 Self-introduction of the two guests 03:06 Salomon was born in France in 1947, and snow is the deepest imprint in its DNA 04:39 We were once acquired by Adidas, and Adidas helped us with sports style 06:04 The core group of trail runners was only 100,000 ten years ago, and it is still 100,000 this year. What's the difference? 11:52 Should a brand go from niche to mass, or from mass to niche? 16:22 What happened to Salomon after its parent company Amer Sports was acquired by Anta in 2019? 
18:07 In China, the share of female consumers peaked at nearly 70%; now it is just under 60% 20:45 Women bring in more new male consumers than men bring in new female consumers 23:21 After 2021, more and more overseas outdoor brands have actively and intensively entered China 27:31 Xiaohongshu helps Salomon expand its audience: the "尖货尝新档" and "色彩敏感控" campaigns 34:55 A people-centered brand strategy: finding "super user representatives" 43:26 Ten years ago we built brands around the winning moments; now we focus more on the process and details of growth 45:37 The consumer insight behind the Salomon girls: women no longer pursue an accumulation of rituals, but inner ease 48:36 Combining Xiaohongshu with the Anfu Road Salomon store, circulating traffic between online and offline 55:24 Salomon's new female consumers have in turn fueled the growth of male consumers 58:16 If a very masculine brand wants to feminize, what should it do? 01:00:43 Will going trendy weaken the professional outdoor gene? 01:01:33 New changes in young people's consumption 01:08:05 Building an AI brand is like building a consumer brand: advice for AI founders from a brand perspective Sharing some photos from our beautiful recording scene:
Original title: 114. 与殷一、欧迪聊聊萨洛蒙:中国意外的增长阀门、小众越野跑与少女故事
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>今天的嘉宾是萨洛蒙中国区总经理殷一和小红书商业服饰潮流行业负责人欧迪。我们一起来聊聊,这两年比较火的一个户外品牌,萨洛蒙。</p><p>萨洛蒙和始祖鸟都属于亚玛芬集团,2019年亚玛芬被安踏收购;<strong>2021年以后,萨洛蒙这个70多岁的法国品牌,意外在中国开启了增长之路。</strong></p><p>这个最早从滑雪品类起步,逐步扩展到越野跑鞋的小众品牌,在中国的传统消费群体以男性以及专业滑雪、越野跑爱好者为主;<strong>但近几年,他们通过在小红书的一系列品牌行为,成功吸引女性消费群体和新户外人群,扩圈增长,而这又进一步反向刺激了男性消费者以及核心运动人群的增长。</strong></p><p>希望这些新鲜的品牌知识,也能给你带来新的启发:)</p><figure><img src="https://image.xyzcdn.net/FpF2Pa7IhWLZZ4V2h-jx0vpA2lO3.JPG" /></figure><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FlXX4H8P1BRETUdBFboiX8lsX2LW.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><p>02:00 两位嘉宾的自我介绍</p><p>03:06 萨洛蒙1947年诞生于法国,雪是DNA里最深的那道烙印</p><p>04:39 我们曾经被阿迪达斯收购,阿迪达斯帮我们做了sports style</p><p>06:04 越野跑核心人群,十年前只有10万人,今年也是10万人,区别是什么?</p><p>11:52 品牌应该从小众走向大众,还是从大众走向小众?</p><p>16:22 2019年母公司亚玛芬被安踏收购后,萨洛蒙发生了什么?</p><p>18:07 中国女性占比最高接近七成,现在是六成不到</p><p>20:45 女性对男性的拉新高于男性对女性的拉新</p><p>23:21 2021年以后,越来越多海外户外品牌主动地集中进入中国</p><p>27:31 小红书帮萨洛蒙拓展人群:“尖货尝新档”和“色彩敏感控”</p><p>34:55 以人为主体的品牌策略:找到“超级用户代表”</p><p>43:26 10年前我们做品牌会更注重the winning moments,现在更注重成长的过程和细节</p><p>45:37 萨门少女背后的消费者洞察:女性不再追求仪式感的堆叠,更追求内心的松弛</p><p>48:36 结合小红书和安福路萨洛蒙门店,线上和线下循环流量</p><p>55:24 萨洛蒙拉新女性消费者,又反哺了男性消费者的增长</p><p>58:16 如果一个非常男性化的品牌想要女性化,应该怎么做?</p><p>01:00:43 潮流化会不会削弱专业户外基因?</p><p>01:01:33 年轻人消费新变化</p><p>01:08:05 当做AI品牌也像做消费品品牌,从品牌角度给AI创始人一些建议</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p>分享一下漂亮的录制现场:</p><figure><img src="https://image.xyzcdn.net/FiR5vV7vFaetkgUKSerD2ObyM9OU.JPG" /></figure><figure><img src="https://image.xyzcdn.net/FluYEhAf3d_kt4XUtAeWF70FwfD9.JPG" /></figure><figure><img src="https://image.xyzcdn.net/Ft6EYq6Tq8wkGBgVlfS53UFAM_xV.JPG" /></figure><figure><img src="https://image.xyzcdn.net/FgxeH7ZoI7n3H51F-ATg_o6TyhqN.JPG" /></figure><figure><img 
src="https://image.xyzcdn.net/FmOagFuM0Bhs9Bh1ykS6xW1FgXfg.JPG" /></figure><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
A Conversation with Yang Zhilin After a Year: K2, Agentic LLMs, the Brain in a Vat, and "Standing at the Beginning of Infinity"From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-08-27 04:21
Today's guest is Yang Zhilin, founder and CEO of Moonshot AI; a year and a half has passed since his last appearance on our show (episode 59 of "Business Interviews"). This past July, the Kimi K2 model was released, drawing widespread attention. K2 is an open-source coding and Agentic large language model built on an MoE architecture. Figuratively speaking, through coding the model steps out of the closed "brain in a vat," growing "hands" to manipulate the external digital world. **In this episode, I talked with Yang Zhilin about K2's R&D and his current technical understanding and judgments.** And, **as a founder, his feelings and reflections amid the past year's storms of public opinion and entrepreneurial ups and downs.** 01:49 **An Infinite Mountain** It's a bit like a book I've been reading: The Beginning of Infinity. Perhaps one day we'll find this snow mountain has no end; I hope it never ends. But it is still a "brain in a vat": imagine a fish tank with a brain in it, with no connection to the outside world. Whether it's reinforcement learning based on long thinking or an Agent's reinforcement learning, both point to the same thing: test-time scaling. Another interesting trend: more model companies are now building "first-party Agent products." L1 to L5 are not necessarily sequential; Claude bets on exactly this: it does not do much on Reasoning, but it does Agent very well. Only when the model participates in the development process can the real Innovator (L4) stage be unlocked. 24:58 **K2 is K2 (Qogir, the Mountain)** K2's priorities: first, we want it to be a very good base model. We want to squeeze the most out of every piece of data, so-called token efficiency: fed the same amount of data, the "brain" grows more. We do a lot of Rephrase operations on the data. We care a lot about the Muon optimizer, which greatly improves token efficiency. Second, we want K2 to have strong Agentic capabilities; for Agentic models, the biggest challenge is model generalization. 
It may be a transformation from a "brain in a vat" into something that can interact with the world, because the most important feature of an Agent is that it can use tools over multiple turns. Humans are the so-called universal constructor. One potential idea is that AI needs to be trained in a more AI-native way. When you train with Muon, it can blow up. 54:08 **A System Both Simple and Complex** Why did Kimi switch from closed source to open source? The model is trained and the product is basically done; interaction improvements are certainly valuable, but they are the icing on the cake. It is already good if multimodality does not damage the "brain." The multimodality you learn may be a "dumb multimodality"; we want a "smart multimodality." The Scaling Law has hit a data wall; that is an objective fact. The data flywheel depends heavily on external feedback; we don't want the feedback to be noisy, but we haven't solved this problem very well yet. For now, scaling based on FLOPs looks like the more effective path, but when will this balance shift? Many Long Context architectures hurt "intelligence." Pure Linear Attention may hurt intelligence, because the architecture carries some bias. Where are the long-term boundaries between base model companies and the application companies building Agent products? How should we think about the business model today? Is the API a good business? Can Kimi make money? 01:25:05 **In Your Own Story** Tim (Zhou Xinyu) tells me every day: manage with RL, not SFT. The biggest problem with managing a team with RL is that you are easily hacked. A lot of the complexity is artificially added; in reality it's not that complicated. You can only say you are in your own story: you keep feeling out what kind of person you are and why you want to do this thing. I also asked Kimi this question, and it said AI is an "amplifier of human civilization." 
This is also what Kimi told me: any intermediate state may become the target of criticism. There is definitely fear, but it matters more to focus on what you can do in the current step. Thinking about that question is more important. The 2024 interview with Yang Zhilin: 《Chatting with Yang Zhilin about a Year of Large Model Entrepreneurship: The Increment of Human Ideals, Probabilistic Non-Consensus, and Sora》 [More Information] Text and video versions are launched simultaneously. For the text version, please go to the official account: 语言即世界language is world For the video version, please go to Bilibili: 张小珺商业访谈录
Original title: 113. 和杨植麟时隔1年的对话:K2、Agentic LLM、缸中之脑和“站在无限的开端”
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>今天的嘉宾是月之暗面创始人兼CEO杨植麟,距离他上一次来我们的节目(《商业访谈录》59集)已经过去1年半。</p><p>就在刚刚过去的7月,Kimi K2模型发布,引发了比较广泛的关注。K2是一个基于MoE架构的开源编程和Agentic大语言模型。形象来说,模型借助编程能力走出封闭的“缸中之脑”,长出了“手”,开始操控外部数字世界。</p><p><strong>今天这集节目我和杨植麟聊了聊K2的研发和他当下的技术认知、技术判断。</strong></p><p>以及,<strong>在过去一年的舆论风暴与创业起伏中,作为创始人,他的心情与思考。</strong></p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/Fm-P6G2K2Gz6s45f9Xb_NdNasujW.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><blockquote>01:49 <strong>一座无限的山</strong></blockquote><p>这有点像我最近在看的一本书:The Beginning of Infinity(无穷的开始)</p><p>也许有一天会发现,这座雪山没有尽头,我希望它一直没有尽头</p><p>但它还是一个“缸中之脑”:想象一个鱼缸,你把一个脑子放在里面,跟外界没有联系</p><p>不管是基于长思考的强化学习,还是Agent的强化学习,都指向同一个东西:test-time scaling(测试时扩展)</p><p>还有一个很有意思的趋势是,现在有更多模型公司去做“一方的Agent产品”</p><p>L1到L5不一定是串行关系,Claude就bet这一点:它在Reasoning上做得不是特别多,但在Agent上做得非常好</p><p>只有当模型参与到开发过程,才能解锁真正的Innovator(L4)阶段</p><blockquote>24:58 <strong>K2是乔戈里峰</strong></blockquote><p>K2的重点有几个:一,我们希望它是一个非常好的基础模型</p><p>我们希望能最大化使用每一份数据,就是所谓token efficiency——喂一样多的数据,“脑子”长得更多</p><p>我们会对数据做很多Rephrase(改写)操作</p><p>我们很关注Muon优化器,它对token efficiency提升很大</p><p>二,我们希望K2有好的Agentic能力,对于Agentic模型来讲,最大挑战是模型的泛化</p><p>它可能是一个从“缸中之脑”变成可以跟世界交互,因为所谓Agent最重要的特征是,可以多轮地使用工具</p><p>人是所谓的universal constructor(万能构造器)</p><p>有一种潜在思路,需要用更AI native(原生人工智能)的方式去训练AI</p><p>Muon你去训的时候,它会炸</p><blockquote>54:08 <strong>既简单又复杂的系统</strong></blockquote><p>为什么Kimi从闭源转向开源?</p><p>模型训练完成,产品也基本完成了,做交互上的改进当然有价值,但那是锦上添花的一步</p><p>多模态不损伤“脑子”已经很好了</p><p>你可能学出来的多模态是个“傻的多模态”,我们希望它是个“聪明的多模态”</p><p>Scaling Law遇到数据墙了,这是客观事实</p><p>数据飞轮很依赖外部环境的feedback(反馈),我们不希望feedback有很多噪声,但现在没有把这个问题解决得非常好</p><p>现在看起来,基于FLOPs的scaling是更有效路径,但这个平衡什么时候会发生变化?</p><p>很多Long Context架构会影响“智商”</p><p>纯粹的Linear 
Attention(线性注意力机制)可能影响智商,因为这个架构会有一些bias(偏差)</p><p>基座模型公司和做Agent产品的应用公司,长期看边界在哪?</p><p>今天怎么思考商业模式?API是好生意吗?</p><p>Kimi能赚钱吗?</p><blockquote>01:25:05 <strong>在自己的故事里面</strong></blockquote><p>Tim(周昕宇)天天跟我讲——要用RL的方式去管理,而不是用SFT</p><p>用RL管理团队最大问题是,你容易被hack</p><p>很多复杂性都是人为强行加上去的,实际并没有那么复杂</p><p>只能说是在自己的这个故事里面——你不断地感受自己到底是什么样的一个人,你为什么要做这个事情</p><p>这个问题我也问过Kimi,他说,AI是“人类文明的放大器”</p><p>这也是Kimi跟我讲的——任何中间状态都有可能成为被批评的对象</p><p>肯定有恐惧,更多要关注你当前这一步,能做什么?——想这个问题更重要</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p>2024年对杨植麟的访谈:</p><p><a href="https://www.xiaoyuzhoufm.com/episodes/65e16b5b6144a933b1d968b5">《和杨植麟聊大模型创业这一年:人类理想的增量、有概率的非共识和Sora》</a></p><p>【更多信息】</p><p>文字和视频版同步上线</p><p>文字版请前往公众号:语言即世界language is world</p><p>视频版请前往Bilibili:张小珺商业访谈录</p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
112. Chatting with Guangmi about the Large Model Quarterly Report: Differentiation and Convergence, the Full-Family Bucket and Vertical Integration, L4 Experience and the Mining WindowFrom 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-08-18 23:00
Under everyone's strong urging, a new episode of the "Global Large Model Quarterly Report" is finally here. This episode has two keywords. **The first is differentiation.** This quarter, the model companies in Silicon Valley began differentiating into different fields: besides Google Gemini and OpenAI, which are still building general-purpose models, Anthropic has differentiated into Coding and Agentic capabilities, and Mira's Thinking Machines into multimodality and next-generation interaction. **The second is product.** The "Large Model Quarterly Report" has always focused on the exploration of model intelligence, but this time, for the first time, Guangmi discusses products at length. This is episode 7 of the "Global Large Model Quarterly Report." If you like our series, we hope you'll give us more encouragement and support. **Your praise matters a great deal to us.** In 2025, we look forward to progressing together with AI! > **03:54 Models Are Differentiating** General-purpose models strong across the board - Gemini/OpenAI All-in on Coding+Agentic capabilities - Anthropic Multimodal-native - Thinking Machines Lab Grok is still searching for its ecological niche Meta's original 0-to-1 gene is still weak The most advanced players look a lot like an F1 race > **21:37 Horizontal Full-Family Bucket, Vertical Integration** The C-end shows a very clear trend of convergence toward the head: ChatGPT may absorb many C-end products As an investor or AI entrepreneur, you are excited that the technology improves every month, and at the same time a bit despairing An example of the horizontal full-family bucket is ChatGPT, which already bundles Chat+Search+Coding+Agent+WorkSpace An example of vertical integration is Gemini: from TPU chips, to the Gemini model, to Agent applications on top, to Google Docs/the Chrome browser/the Android operating system/YouTube, enabling super-integration > **33:35 Intelligence and Product Both Matter** For the past 3 years we were utterly absorbed in exploring the upper limit of intelligence, but in the past two months we have started paying attention to product ChatGPT has many non-technical moats, while Coding or model companies have only technical ones OpenAI strikes the best balance: exploring the upper limit of intelligence while converting the intelligence dividend into product traffic and brand mindshare > **38:52 Building AI Products is Like Mining; the Freshness Window is Key** Mining: being the first to deliver an experience that amazes users matters enormously; even if the token consumption is huge, as long as you are the first to create Magic moments that amaze users, it is as if you received at least 500 million US dollars in marketing spend, as with Perplexity/Cursor/Manus But this window is particularly interesting: it keeps shortening, from 2 years, to 1 year, to 3 months Can product companies beat the products built by model companies? > **44:21 L4-Level Experience** The two best Agents both offer an L4 experience: ChatGPT's Deep Research + Anthropic's Claude Code, corresponding to information search and software development respectively Today's biggest dividend is still the language/code dividend, especially code, not multimodality/world models/robotics Claude Code has been sweeping the field lately; Claude Code is an L4 experience Which areas will get L4-level experiences next? > **52:43 A Changed View of Google** One guess: ChatGPT will surely build an advertising platform, since it recently hired a new commercialization CEO But I think Google is still the world's best advertising platform; in the end, everyone's product forms will converge on the same destination, merged together in the full-family-bucket logic, and Search will evolve too > **55:53 Other Topics** Is there a bubble in AGI? If so, what would be the trigger that bursts it? 
What is the gap in intelligence between humans and gorillas? What new topics have been hot in Bay Area discussions lately? **"Jewish finance, Chinese AGI"** (Disclaimer: this program does not constitute investment advice.) [Global Large Model Quarterly Report] Series 2023: An Oral History of Global Large Models This Year: Humanity's Hundred-Billion Scientific Gamble and the Uneven China-US Landscape 2024 Q1: Chatting with Guangmi about the Era of AGI Mega-Infrastructure: Electricity + Chips = Intelligence Output 2024 Q2: An Oral History of Global Large Models This Half-Year: Perplexity's Sudden Popularity and the AI Application Ecosystem That Has Yet to Explode 2024 Q3: The AGI Paradigm Shift: Predicting Strawberry, OpenAI o1, and Self-Play RL with Guangmi 2024 Q4: Quarterly Report Year-End Special: Predicting with Guangmi the Path for LLM Products to Surpass Google 2025 Q1: Quarterly Report: Chatting with Guangmi about the Biggest Non-Consensus of the Moment, and the Main Line and Main Peak of AGI
Original title: 112. 和广密聊大模型季报:分化与收敛、全家桶与垂直整合、L4体验与挖矿窗口
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>在大家的强烈催更下,新一集的《全球大模型季报》终于来了。</p><p>这一集有两个关键词。</p><p><strong>第一个关键词是分化。</strong>硅谷各个模型公司在这个季度,开始分化到各个领域,除了Google Gemini和OpenAI还在做通用的模型;Anthropic分化到Coding、Agentic的模型能力;Mira的Thinking Machines分化到多模态和下一代交互。</p><p><strong>第二个关键词是产品。</strong>《大模型季报》过去一直把视角放在模型的智能探索上,而广密开始浓墨重彩地聊产品,这还是第一次。</p><p>这里是《全球大模型季报》的第7集,如果大家喜欢我们的系列,希望大家多多给我们一些鼓励和支持。<strong>你们的夸奖对我们来说,非常的重要。</strong></p><p>2025,期待我们和AI共同进步!</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FhO5L2PKPE8OSwyJDGz5_NmS_Zfh.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><blockquote><p><strong>03:54 模型在分化</strong></p></blockquote><p>通用各项能力的模型 - Gemini/OpenAI</p><p>All in Coding+Agentic 能力 - Anthropic</p><p>多模态原生 - Thinking Machines Lab</p><p>Grok 今天还在摸索自己生态位置</p><p>Meta 原创 0-1 的基因还是很弱</p><p>最领先的这几家很像 F1 竞赛</p><blockquote><p><strong>21:37 横向全家桶,纵向垂直整合</strong></p></blockquote><p>C端是一个非常明显的头部收敛趋势,ChatGPT可能在C端会收敛掉很多产品</p><p>作为投资人或 AI 创业者,一面兴奋是技术每个月都在进步,另一面有点绝望</p><p>横向全家桶的例子是ChatGPT,已经包含了Chat+搜索+Coding+Agent+WorkSpace</p><p>纵向垂直整合的例子是 Gemini,从 TPU 芯片,到 Gemini 模型,到上面 Agent 应用,再到 Google 文档/Chrome浏览器/安卓操作系统/YouTube视频,可以做超级集成</p><blockquote><p><strong>33:35 智能和产品都重要</strong></p></blockquote><p>过去 3 年一直是对智能上限的探索极度上头,但在过去两个月开始重视产品了</p><p>ChatGPT 身上有很多非技术性壁垒,而 Coding 或模型公司只是技术壁垒</p><p>OpenAI 是平衡最好的一家,一边探索智能上限,一边又把智能红利转化成产品流量和品牌心智</p><blockquote><p><strong>38:52 做 AI 产品很像挖矿,保鲜窗口很关键</strong></p></blockquote><p>挖矿:第一个做出来让用户惊叹的体验很重要,哪怕 token 消耗很大,只要你是第一个做出来让用户惊叹的 Magic moments,就等于你起码得到了 5 亿美金的营销费用,比如 Perplexity/Cursor/Manus</p><p>但这个窗口期又特别有意思,窗口是逐渐在缩短的:从 2 年、1 年、3 个月</p><p>产品公司能赢过模型公司做的产品吗?</p><blockquote><p><strong>44:21 L4 级别的体验</strong></p></blockquote><p>最优秀的俩 Agent 都有了 L4 体验:ChatGPT 的 Deep Research + Anthropic 的 Claude Code,分别对应信息搜索+软件开发</p><p>今天最大红利还是 language/code 红利,尤其是 
code,还不是多模态/世界模型/机器人</p><p>Claude Code 最近大杀四方,Claude Code 是一个 L4 的体验</p><p>接下来还有哪些领域能有 L4 级别体验?</p><blockquote><p><strong>52:43 对Google看法的转变</strong></p></blockquote><p>一个猜想是,ChatGPT 后面肯定会做广告平台,因为最近招了新的商业化 CEO</p><p>但我在想 Google 还是全球最好的广告平台,最后大家产品形态上都会殊途同归,融合到一起的,就是全家桶逻辑,Search 也会演变</p><blockquote><p><strong>55:53 其他话题</strong></p></blockquote><p>AGI有泡沫吗?假如AGI有泡沫,什么事情会是导火索,戳破泡沫?</p><p>人类和大猩猩的智能水平差异在哪?</p><p>最近湾区有没有什么新的讨论比较高的话题?</p><p><strong>“犹太人的金融,华人的AGI”</strong></p><p><strong>(免责声明:本节目不构成投资建议)</strong></p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p>【全球大模型季报】系列</p><p>2023年:<a href="https://www.xiaoyuzhoufm.com/episodes/65910adb991e2ee60880f151" rel="noopener noreferrer nofollow" target="_blank">口述全球大模型这一年:人类千亿科学豪赌与参差的中美景观</a></p><p>2024年Q1:<a href="https://www.xiaoyuzhoufm.com/episodes/661f21075dae7932c6f821d8" rel="noopener noreferrer nofollow" target="_blank">和广密聊AGI大基建时代:电+芯片=产出智能</a></p><p>2024年Q2:<a href="https://www.xiaoyuzhoufm.com/episodes/667774b3b6a84127299efd5a" rel="noopener noreferrer nofollow" target="_blank">口述全球大模型这半年:Perplexity突然火爆和尚未爆发的AI应用生态</a></p><p>2024年Q3:<a href="https://www.xiaoyuzhoufm.com/episodes/66d866f0f39a2201c069dccb" rel="noopener noreferrer nofollow" target="_blank">AGI范式大转移:和广密预言草莓、OpenAI o1和self-play RL</a></p><p>2024年Q4:<a href="https://www.xiaoyuzhoufm.com/episodes/6766a52a15a5fd520e6c86a9" rel="noopener noreferrer nofollow" target="_blank">大模型季报年终特辑:和广密预言LLM产品超越Google之路</a></p><p>2025年Q1:<a href="https://www.xiaoyuzhoufm.com/episodes/67e9614b8eecdbeb601ac5fe" rel="noopener noreferrer nofollow" target="_blank">大模型季报:和广密聊当下最大非共识、AGI的主线与主峰</a></p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
Li Yifan's Oral History of 11 Years of LiDAR Entrepreneurship: Think Carefully About Where the Industry's Opportunity Comes From. It Is the Opportunity of a Country and Its People.From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-08-07 23:00
Over the past 10 years, China's new energy vehicle industry has grown from nothing and boomed. The most familiar names may be the carmakers Li Auto, Xpeng, and Nio, but on the other side, the supply-chain companies behind this transformation have been changing too. Episode 108 of "Business Interviews" and this episode's 3-hour interview with Li Yifan, co-founder and CEO of Hesai, **both focus on the invisible players in the automotive supply chain.** This episode is also Li Yifan's oral history of their 11 years of hardcore technology entrepreneurship in LiDAR. As China's technological innovation shifts from internet-style business model innovation to hardcore frontier technology, China may see more technical founders, and Hesai's story may offer a reference sample. (This interview was recorded in April 2025) 00:02:00 Opening quick-fire Q&A 00:02:33 The stock-price rollercoaster 00:03:40 Cutting LiDAR costs by 99.5% 00:12:05 Family and growing up 00:32:13 A rare three-way equal split of shares 00:43:35 Fundraising tricks 00:49:02 The first big order: 20 million 00:55:45 We thought it was over... 01:10:06 Yu Kai's number had one more zero than mine 01:20:47 The thinking behind pricing 01:38:15 Customers start switching sides 01:58:07 Entering the heartland of the auto industry 02:38:34 New money and old money 03:02:16 Closing quick-fire Q&A [From Steam Engine to Autonomous Driving] Series A 3-Hour Interview with Li Xiang (Podcast Version): Otaku, AI, Family, Games, and the Ladder Chatting with He Xiaopeng: FSD, "Swimming in a Sea of Blood," and Heroes and Cowards in Troubled Times A Dialogue with Ola Källenius, Global CEO of Mercedes-Benz: A CEO in Transition and a 139-Year-Old Mercedes-Benz in Transition Yu Kai's 30-Year Oral History: The World is More Than Swords and Shadows; It's a Jianghu Story of People Coming and Going Chatting with Lou Tiancheng about Robotaxi and ACRush: "The Better You Do L2, the Further You Are From L4"
Original title: 111. 李一帆口述激光雷达11年创业史:你仔细想行业的机会来自哪?是国家、民族的机会
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>过去10年,中国新能源汽车产业从无到有,经历蓬勃发展。大家最熟悉的可能是理想、小鹏、蔚来这些整车品牌,但另一面这场变革背后的产业链企业也在变化。</p><p>《商业访谈录》的108集对余凯和本集对禾赛联合创始人和CEO李一帆的3小时访谈,<strong>关注的都是汽车产业链上的隐形选手。</strong></p><p><strong>这集也是李一帆对他们做激光雷达11年硬核科技创业的一部口述史。</strong></p><p>随着中国科技创新从互联网的模式创新,走向硬核科技的前沿创新,中国也许还会出现更多的技术型创业者。禾赛的故事也许能提供一个参考样本。</p><p>(本次访谈录制于2025年4月)</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FpUcziKADuUvw4xnFBg4eruuRPSY.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><p>00:02:00 开始的快问快答</p><p>00:02:33 股价过山车</p><p>00:03:40 激光雷达99.5%的降本</p><p>00:12:05 家庭和成长</p><p>00:32:13 罕见的3人平分股份</p><p>00:43:35 融资的伎俩</p><p>00:49:02 第一笔2000万大单</p><p>00:55:45 想说完蛋了…</p><p>01:10:06 余凯比多我一个0</p><p>01:20:47 定价心思</p><p>01:38:15 开始倒戈</p><p>01:58:07 进入汽车大本营</p><p>02:38:34 新钱和老钱</p><p>03:02:16 最后的快问快答</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p>【从蒸汽机到无人驾驶】系列</p><p><a href="https://www.xiaoyuzhoufm.com/episodes/67769bd815a5fd520e8fa318">《对李想的3小时访谈(播客版):宅男、AI、家庭、游戏和天梯》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/6695032837236c546e4c2e0f">《和何小鹏聊,FSD、“在血海游泳”、乱世中的英雄与狗熊》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/68300e93fcbc2e206b58eb2b">《对话奔驰全球CEO康林松:转型期CEO和转型之中的139岁奔驰》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/686b8c0560f8f77d404338cd">《余凯口述30年史:世界不止刀光剑影,是一部人来人往的江湖故事》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/66bdb98233591c27be49e931">《和楼天城聊聊Robotaxi和ACRush:“L2做得越厉害,离L4越远”》</a></p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
110. A paragraph-by-paragraph reading of the Kimi K2 report, compared with ChatGPT Agent, Qwen3-Coder, and others: "The Power of Systems Engineering"From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-07-30 08:42
**We're reading papers again!!!** Today we're reading some of the most noteworthy technical reports of the past few weeks: the technical reports for **Kimi K2, ChatGPT Agent, and Qwen3-Coder, plus a blog post from Manus.** What connects them is that all four are about Agents.

Today's guest is Zheng Boyuan, a Ph.D. student at The Ohio State University whose research focus is Language Agents. He walks us through the reports and blog post above.

This is the **"Beauty of Technology" series** of "Business Interviews". We look forward to reading papers with you, appreciating technological equality, and experiencing the beauty of technology: your cyber group meeting :)

00:02:00 Defining and classifying Agents
00:14:50 Comparing the technical routes of Kimi K2, ChatGPT Agent, Qwen3-Coder, and Manus
00:19:05 Why the overall disappointment with ChatGPT Agent?
00:28:29 Key aspects of Agent training: synthetic data, reinforcement learning, safety
00:30:57 **First technical report: Kimi K2: Open Agentic Intelligence** [github.com](https://github.com/MoonshotAI/Kimi-K2/blob/main/tech_report.pdf)
00:43:50 **Second technical report and interview: Introducing ChatGPT agent: bridging research and action** [openai.com](https://openai.com/zh-Hans-CN/index/introducing-chatgpt-agent/) · **Sequoia interviews OpenAI: OpenAI Just Released ChatGPT Agent, Its Most Powerful Agent Yet** [www.sequoiacap.com](https://www.sequoiacap.com/podcast/training-data-chatgpt-agent/)
01:53:38 **Third technical report: Qwen3-Coder: Agentic Coding in the World** [qwenlm.github.io](https://qwenlm.github.io/blog/qwen3-coder/)
01:59:04 **Fourth technical blog post: Context Engineering for AI Agents: Lessons from Building Manus (author: Yichao 'Peak' Ji)** [manus.im](https://manus.im/zh-cn/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus)
02:06:06 Outlook: maybe a new paradigm is coming
02:15:20 I feel the Agent is "my extended brain", with a "legion" (Family of Agents) behind me
02:16:41 Different bots' language styles: DeepSeek is foul-mouthed, Yuanbao is a bootlicker

**Agent Definition**
An Agent is an intelligent system capable of interacting with its environment. It has two basic capabilities:
**Perception**: observing the state of the environment, including obtaining external information, reading feedback signals, and parsing context.
**Action**: performing actions in the environment, such as calling tools, generating output, controlling interfaces, and modifying variables.
In short, Agent = Perception + Action: it continuously runs the "observe → decide → act" loop to achieve its task goal.

**Classification of Agents**
**1. Coding Agent**
Representative products: Cursor, Windsurf
Features: strong code generation and editing, excellent user experience
Scenarios: code completion, refactoring, multi-person collaborative programming
**2. Search Agent**
Features: combines with search engines to automate information retrieval and aggregation
Scenarios: market research, report generation, competitor analysis
Potential: strong application value in enterprise scenarios
**3. Tool-Use Agent**
Features: calls a variety of external tools to complete complex tasks
Focus: the main direction of current Agent research and deployment
Example: ReAct (Reasoning + Acting) agents, which execute tasks via tool calling
**4. Computer Use Agent**
Representative products: OpenAI Operator, Claude's Computer Use
Features: simulates a human using a computer to complete complex cross-application operations
Scenarios: process automation, remote assistants, office agents

**Comparison of Agent Technical Routes**
**1. In-Context Learning**
Features: relies on a powerful pre-trained model; task planning and execution are driven by prompt construction
Advantages: no fine-tuning needed, highly flexible
Limitations: weak generalization, limited rollout length, easy to lose control
**2. End-to-End Training**
Features: encodes all Agent behavior into the model weights
Advantages: stable inference, strong controllability
Limitations: high training cost, complex environment construction

**Key Aspects of Agent Training**
**1. Data Synthesis**
Method: generate large numbers of high-quality trajectories
Purpose: teach the Agent how to make decisions, call tools, and manage memory within a task
**2. Reinforcement Learning**
Conditions: requires clearly defined tasks and verifiable rewards
Challenges: task difficulty and environment-feedback design directly determine the quality of Agent behavior
**3. Safety**
Risks: an Agent with autonomous decision-making can misuse tools and go off track
Countermeasures: sandbox restrictions, behavior-constraint mechanisms, human-in-the-loop oversight

**Outlook: Maybe a New Paradigm**
The core of data generation will shift from input-output annotation to building environments and their corresponding task-rewards; for example, Scale AI has proposed rubrics as rewards. Can Agents self-improve? On the one hand, an Agent continuously acquires new data while interacting with its environment; can it find or construct verifiable rewards on its own, and can the experience accumulated during interaction be used more effectively?
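The "observe → decide → act" loop and tool calling described in the episode notes above can be sketched in a few lines. This is a minimal toy, not any product's actual implementation: the tool names and the rule-based `decide` step are invented for illustration, standing in for where a real agent would prompt an LLM (in-context learning) or run a policy baked into model weights (end-to-end training).

```python
# Minimal sketch of an agent loop: observe → decide → act, with a toy
# tool registry standing in for real tool calling. All names here are
# illustrative assumptions.

def search(query: str) -> str:
    """Toy search tool: returns a canned observation."""
    return f"results for '{query}'"

def calculator(expr: str) -> str:
    """Toy calculator tool: evaluates a simple arithmetic expression."""
    # Demo only; never eval untrusted input in real code.
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"search": search, "calculator": calculator}

def decide(observation: str, goal: str):
    """Stand-in for the model's reasoning step: pick a tool or finish.
    A real agent would call an LLM here instead of a hand-written rule."""
    if "results" in observation:
        return ("finish", observation)
    return ("search", goal)

def run_agent(goal: str, max_steps: int = 5) -> str:
    observation = "start"
    for _ in range(max_steps):            # loop: observe → decide → act
        action, arg = decide(observation, goal)
        if action == "finish":
            return arg                    # task goal reached
        observation = TOOLS[action](arg)  # act in the environment, observe
    return observation                    # rollout length limit hit

print(run_agent("LiDAR market size"))     # → results for 'LiDAR market size'
```

The `max_steps` cap mirrors the "limited rollout length" limitation of in-context approaches noted above: without it, a mis-deciding agent loops forever.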
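The "verifiable reward" and "rubrics as reward" ideas mentioned in the training and outlook sections above can also be sketched. The rubric criteria and the substring check below are invented for illustration; in practice a judge model or programmatic verifier would do the scoring.

```python
# Sketch of reward signals for agent RL, as discussed in the episode:
# a verifiable reward checks the outcome programmatically (no human
# labels), while a rubric-based reward sums weighted criteria.

def verifiable_reward(trajectory: list[str], expected: str) -> float:
    """Score 1.0 iff the trajectory's final answer matches a checkable
    ground truth; otherwise 0.0."""
    return 1.0 if trajectory and trajectory[-1] == expected else 0.0

def rubric_reward(answer: str, rubric: dict[str, float]) -> float:
    """'Rubric as reward': sum the weights of the rubric criteria the
    answer satisfies (naive substring checks stand in for a judge)."""
    return sum(weight for crit, weight in rubric.items() if crit in answer)

traj = ["call search", "call calculator", "42"]
print(verifiable_reward(traj, "42"))                       # → 1.0
rubric = {"cites a source": 0.5, "42": 0.5}
print(rubric_reward("the answer is 42", rubric))           # → 0.5
```

This is why the episode stresses environment and task-reward construction: the reward function, not the annotation set, becomes the scarce artifact.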
Original title: 110. 逐段讲解Kimi K2报告并对照ChatGPT Agent、Qwen3-Coder等:“系统工程的力量”
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p><strong>我们又来读论文啦!!!</strong></p><p>今天我们要读的论文是最近几个星期内最值得品读的几篇技术报告,分别是:<strong>Kimi K2、ChatGPT Agent、Qwen3-Coder的技术报告,以及Manus的一篇技术博文。</strong>他们的相关性是,这几篇内容都和Agent有关系。</p><p>今天的嘉宾是俄亥俄州立大学(The Ohio State University)的在读博士郑博元,他的研究方向是Language Agent,他会带我们一起读上述技术报告和博文。</p><p>这是《商业访谈录》的<strong>“技术之美”系列</strong>,期待和你一起读论文,领略科技平权,感受技术之美——做你的赛博组会:)</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FgHNmAFclRglFbm9XogKflmG_D-w.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><p>00:02:00 给Agent下定义和分类</p><p>00:14:50 Kimi K2、ChatGPT Agent、Qwen3-Coder、Manus的技术路线对比</p><p>00:28:29 Agent Training 的关键环节:合成数据、强化学习、安全</p><p>00:30:57 <strong>第一篇技术报告:Kimi K2: Open Agentic Intelligence</strong></p><p><a href="https://github.com/MoonshotAI/Kimi-K2/blob/main/tech_report.pdf" rel="noopener noreferrer nofollow" target="_blank">github.com</a></p><p>00:43:50 <strong>第二篇技术报告和访谈:Introducing ChatGPT agent: bridging research and action</strong></p><p><a href="https://openai.com/zh-Hans-CN/index/introducing-chatgpt-agent/" rel="noopener noreferrer nofollow" target="_blank">openai.com</a></p><p><strong>红杉访谈OpenAI:OpenAI Just Released ChatGPT Agent, Its Most Powerful Agent Yet</strong></p><p><a href="https://www.sequoiacap.com/podcast/training-data-chatgpt-agent/" rel="noopener noreferrer nofollow" target="_blank">www.sequoiacap.com</a></p><p>01:53:38 <strong>第三篇技术报告:Qwen3-Coder: Agentic Coding in the World</strong></p><p><a href="https://qwenlm.github.io/blog/qwen3-coder/" rel="noopener noreferrer nofollow" target="_blank">qwenlm.github.io</a></p><p>01:59:04 <strong>第四篇技术博文:AI代理的上下文工程:构建Manus的经验教训(作者:Yichao 'Peak' Ji)</strong></p><p><a href="https://manus.im/zh-cn/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus" rel="noopener noreferrer 
nofollow" target="_blank">manus.im</a></p><p>02:06:06 展望:也许会有一个新的范式</p><p>02:15:20 我感觉Agent是“我拓展的大脑”,我背后有一个“军团”(Family of Agents)</p><p>02:16:41 不同Bot的语言风格:DeepSeek嘴臭,元宝舔狗</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><blockquote><p><strong>智能体定义</strong></p></blockquote><p>Agent是一种能够与环境进行交互(interaction)的智能系统。</p><p>它具备两个基本能力:</p><p><strong>感知能力(Perception)</strong><br />能够观察环境的状态,包括获取外部信息、读取反馈信号、解析上下文等。</p><p><strong>行动能力(Action)</strong><br />能够在环境中执行动作,例如调用工具、生成输出、控制界面、修改变量等。</p><p>简言之,Agent = 感知 + 行动<br />在一个循环中不断执行“观察 → 决策 → 行动”的流程,以达成任务目标。</p><blockquote><p><strong>Agent 的定义与分类</strong></p></blockquote><p><strong>1. Coding Agent(代码智能体)</strong><br />代表产品:Cursor、Windsurf<br />特点:代码生成与编辑能力强,用户体验优秀<br />应用场景:代码补全、代码重构、多人协作编程</p><p><strong>2. Search Agent(搜索型智能体)</strong><br />特点:结合搜索引擎,自动完成信息检索和汇总<br />应用场景:市场调研、报告生成、竞争对手分析等<br />潜力:在企业级场景中有很强的应用价值</p><p><strong>3. Tool-Use Agent(工具使用型智能体)</strong><br />特点:能够调用多种外部工具完成复杂任务<br />应用重点:是目前 Agent 研究和落地的主要方向<br />举例:ReAct(推理 + 行动)类 Agent,通过 tool calling 执行任务</p><p><strong>4. Computer Use Agent(电脑操作型智能体)</strong><br />代表产品:OpenAI Operator、Claude 的 Computer Use<br />特点:模拟人类使用电脑,完成跨应用的复杂操作<br />应用场景:执行流程自动化、远程助理、办公代理</p><blockquote><p><strong>Agent 的技术路线对比</strong></p></blockquote><p><strong>1. In-Context Learning(上下文学习)</strong><br />特点:依赖强大的预训练模型,通过提示构造实现任务规划与执行<br />优势:无需微调,灵活性高<br />局限:泛化能力弱,rollout 长度有限,容易失控</p><p><strong>2. End-to-End Training(端到端训练)</strong><br />特点:将 Agent 的全部行为编码进模型权重<br />优势:推理稳定,可控性强<br />局限:训练成本高,环境构建复杂</p><blockquote><p><strong>Agent Training 的关键环节</strong></p></blockquote><p><strong>1. Data Synthesis(数据合成)</strong><br />方法:生成大量高质量的 trajectory(行动轨迹)<br />用途:训练 Agent 在任务中如何决策、调用工具、管理 memory(记忆)</p><p><strong>2. Reinforcement Learning(强化学习)</strong><br />条件:需要定义清晰的 task(任务)与 verifiable reward(可验证奖励)<br />挑战:任务难度与环境反馈设计直接影响 Agent 的行为质量</p><p><strong>3. 
Safety(安全性)问题</strong><br />风险:Agent 具备自主决策能力,容易误用工具、走偏轨迹<br />对策:加入 sandbox(沙盒)限制、行为约束机制、Human-in-the-loop(人类监控)</p><blockquote><p><strong>展望:也许会有一个新的范式</strong></p></blockquote><p>生成数据的核心会从 input-output 式的数据标注,转向构建 environment(环境)以及对应的 task-reward(任务-奖励)。比如 Scale AI 提出的 rubrics as reward(用评分标准作为奖励机制)</p><p>Agent 能不能实现自我提升(self-improve)?一方面,Agent 在和环境交互的过程中会不断获得新数据;那它能不能自己找到或构造 verifiable reward(可验证的奖励)?交互中积累的 experience(经验),能不能被更有效地利用起来?</p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>