-
121. An interview with Tan Jie of DeepMind: robotics, cross-embodiment, world models, Gemini Robotics 1.5, and Google. From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-11-28 09:17
The guest today is **Tan Jie, Senior Research Scientist and Tech Lead on the Google DeepMind Robotics Team**. His research focuses on applying foundation models and deep reinforcement learning to robotics.

There have long been two narratives about robotics in China and the US: the market generally believes China is developing faster in hardware, while the US leads in robot "brain" design. **In this episode, Tan Jie offers us a glimpse of the cutting-edge robotics narrative from a Silicon Valley, and especially a Google DeepMind, perspective.** Not long ago his team released the new work "Gemini Robotics 1.5 brings AI agents into the physical world," and we also discussed their latest findings. Due to the guest's work environment, the conversation mixes some English into the Chinese, and we ask for everyone's understanding and support.

> **02:00 Robotics is doing graphics in the real world; graphics is doing robotics in simulation.**

A brief biography: loved video games as a child, then pursued a Ph.D. in computer graphics.
The switch from graphics to robotics.
My first paper at Google, "Sim-to-Real: Learning Agile Locomotion For Quadruped Robots," pioneered the application of reinforcement learning and sim-to-real transfer to legged robots.
Paradigm shifts: over the past decade, the first was reinforcement learning, the second was large language models.
The impact of large language models on robotics (large language models are like the cerebrum; reinforcement learning is like the cerebellum).

> **13:06 Is the robotics foundation model truly an independent discipline? So far, not yet.**

What stage has robotics reached today?
A decade from demo to real deployment is not an exaggeration.
From my perspective, I have to admit that recent progress in robot intelligence has mainly relied on multimodal large models.
But what do multimodal models lack? They lack the output of robot actions.
Once you truly have a generalist model, specialized models simply cannot compete with it.

> **23:44 The biggest problem in robotics is data; it operates in very complex unstructured environments where anything can happen.**

The biggest problem is still data.
Robotics operates in very complex unstructured environments where anything can happen.
It requires an extremely large amount of very diverse data, and such data does not currently exist.
There are now many startups calling themselves "data factories."
What does the so-called "data pyramid" include?

> **27:52 Gemini Robotics 1.5: We have a method called motion transfer; it is our secret sauce.**

What are the most important findings of Gemini Robotics 1.5?
First, we incorporated "thinking" into the VLA model.
The second very important breakthrough is cross-embodiment transfer.
In Gemini Robotics 1.5 we split the system into fast and slow models.
That should be a transitional approach, as it is currently constrained by compute and model size.
When you want a unified model, it must be very large.
Motion transfer? It's very secret.

> **47:32 Generating huge amounts of simulated data is an important way to compensate for its shortcomings.**

One point we value greatly: data, data, data.
Teleoperation data is very hard to acquire.
We will put more effort into, for example, simulation data, human video, data from YouTube, and even model-generated data, such as data generated by VEO.
Real data has no sim-to-real gap, but generalization is determined by data coverage, not by whether the data itself is real or virtual.
In the near future, traditional physics simulation will gradually be replaced by generative-model-based simulation.
What I believe in is scalable data.

> **01:03:48 A world model is Vision-Language-Vision: vision and language in, generating the next frame of images.**

The definition of a world model: given the previous frame and the robot's action, you can predict the next frame.
From another angle, VEO is a video generation model, but Genie is more like a world model.
When an input at each frame can change the next frame, that is a world model; an already generated, static few-second video is not.
A world model is essentially Vision-Language-Vision: with vision and language as input, it can generate the next frame of images.

> **01:08:29 With a dexterous hand, haptics become very important. I previously thought haptics unimportant only because of the hardware at the time.**

If you have a dexterous hand, haptics become very important.
The reason I previously thought haptics unimportant was that I was limited by the hardware of the time.
We are still in the gripper era.
For all tasks a gripper can accomplish, I still believe vision can solve 95% of the problems.
In the future, humanoid robots will not be the only form factor, but they will certainly be a mainstream one.
If your goal is to solve AGI in the physical world, then I would focus intently on what the final form looks like; everything else might be a distraction.

> **01:17:35 A person with a sense of mission will not tolerate saying "I'm on the wrong ship."**

Has Google's AI or robotics research culture changed in recent years?
Whether in promotion, performance review, incentives, or various structures, Google wants to create an environment where more people can work together to solve bigger problems.
Gemini Robotics, for example, is more top-down.
I found that people in China may not necessarily work harder than I do; I may work 70 to 80 hours a week.
Seriously, this era cannot wait; otherwise others will have done it first.
A lot of AI is mathematics, and Chinese people are generally strong at mathematics.

Related episodes:
《106. Talking with Wang He about the Academic Fringe History of Embodied Intelligence and the Man-made Chaos after the Capital Bombardment》
《109. Are Robots Hitting a Data Famine? Talking with Xie Chen: Simulation and Synthetic Data, Meta's Sky-High Acquisition, and Alexandr Wang》

[More Information]
The text version of this episode has been published. Please search for our studio's official public account: 语言即世界language is world
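The world-model definition at 01:03:48 (previous frame plus action in, next frame out, and interactive rather than pre-rendered) can be made concrete with a toy sketch. This is purely illustrative: `ToyWorldModel`, `rollout`, and the translate-by-action "dynamics" are invented for this example and have nothing to do with VEO's or Genie's actual designs.

```python
import numpy as np

class ToyWorldModel:
    """Illustrative stand-in for a world model: the next frame depends on the
    current frame, a language instruction, and the robot's action. Here the
    'dynamics' just translate the image by the action vector."""

    def predict(self, frame, instruction, action):
        # vision + language + action in -> next frame out
        # (instruction is unused in this toy; a real model would condition on it)
        dx, dy = int(action[0]), int(action[1])
        return np.roll(frame, shift=(dy, dx), axis=(0, 1))

def rollout(model, frame, instruction, policy, steps):
    """Interactive rollout: at every step a fresh action can change the next
    frame. A pre-rendered few-second clip admits no such per-frame input,
    which is the episode's distinction between Genie and a video generator."""
    frames = [frame]
    for _ in range(steps):
        frame = model.predict(frame, instruction, policy(frame))
        frames.append(frame)
    return frames
```

The point of the sketch is only the interface: `predict` is called once per step with a new action, so the policy can steer the "world" at every frame.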
Original title: 121. 对DeepMind谭捷的访谈:机器人、跨本体、世界模型、Gemini Robotics 1.5和Google
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>今天的嘉宾是<strong>Google DeepMind机器人团队的高级研究科学家兼技术负责人谭捷</strong>,他的研究方向是将基础模型和深度强化学习方法应用于机器人领域。</p><p>中美在机器人领域一直存在两种叙事:市场普遍认为,中国在硬件上发展更快,美国在机器人大脑设计上更领先。</p><p><strong>本期节目中,谭捷将带我们一窥硅谷视角,尤其是Google DeepMind视角下的机器人前沿叙事。</strong></p><p>前不久,他们刚发布了新工作 “Gemini Robotics 1.5 brings AI agents into the physical world”(Gemini Robotics 1.5将AI Agents带入物理世界),我们也聊了聊他们的最新发现。</p><p>由于嘉宾工作环境的原因,会出现一定程度的中英夹杂,还大家多多包容和支持。</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/Fou2bKSBSkt--i4_WxqqBjg8IpW0.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><blockquote><strong>02:00 机器人是在真实世界里做图形学,图形学是在simulation里做机器人</strong></blockquote><p>嘉宾小传:小时候喜欢打游戏,读博士读的计算机图形学</p><p>从图形学转型机器人的变轨</p><p>我在Google的第一篇论文《Sim-to-Real: Learning Agile Locomotion For Quadruped Robots》(从仿真到现实:学习四足机器人敏捷运动),开创了强化学习和seem to real在足式机器人上的应用</p><p>Paradigm Shift,过去十年第一个是强化学习,第二个是大语言模型</p><p>大语言模型对机器人的影响(大语言模型类似大脑,强化学习类似小脑)</p><blockquote><strong>13:06 机器人基座大模型到底是不是一个非常独立的学科?So far, not yet</strong></blockquote><p>今天的机器人发展到什么阶段了?</p><p>从demo到真正落地,隔十年并不是一个非常夸张的事</p><p>从我的角度来说,我不得不承认,最近几年的机器人智能发展主要还是依赖于多模态大模型</p><p>但多模态模型缺什么呢?缺少robot action的输出</p><p>当你真正有一个generalist model(通用模型)的时候,specialized model(专有模型)就完全不能与之竞争</p><blockquote><strong>23:44 Robotics最大问题是数据,它在一个非常复杂的unstructured environment里,可以发生任何事情</strong></blockquote><p>最大的问题还是数据问题</p><p>但是robotics是在一个非常复杂的unstructured environment(非结构化环境)里,可以发生任何事情</p><p>它需要极大量的、非常diverse(多元)的数据,但这些数据现在是不存在的</p><p>现在有很多startup叫data factory(数据工厂)</p><p>所谓“数据金字塔”包括哪些?</p><blockquote><strong>27:52 Gemini Robotics 1.5:我们有一个方法叫motion transfer,这是独门秘诀</strong></blockquote><p>Gemini Robotics 1.5最重要的发现是什么?</p><p>第一个是我们把“thinking”加入了VLA模型</p><p>第二个非常重要的突破是cross-embodiment transfer(跨具身迁移)</p><p>Gemini Robotics 
1.5的工作中,我们做了一个快慢模型的划分</p><p>它应该是个过渡的方式,因为现在受制于算力的限制、模型大小的限制</p><p>当你要一个unify model(统一模型)的时候,它必须非常大</p><p>Motion Transfer?It’s very secret</p><blockquote><strong>47:32 生成极大量仿真数据,是弥补它缺点的一个重要手段</strong></blockquote><p>我们比较重视的一点还是数据、数据、数据</p><p>遥操作是非常难以获取的数据</p><p>我们会花更多的精力,比如利用simulation数据,利用human video(人类视频),利用YouTube上的一些数据,甚至利用模型生成的数据,比如VEO生成的一些数据</p><p>真实数据没有sim-to-real gap(仿真到现实差距),但是泛化性是由数据的coverage(覆盖)导致的,并不是因为它本身是真实数据还是虚拟数据</p><p>在不远的将来,传统物理模拟仿真会慢慢地被生成式模型的仿真所取代</p><p>我信仰的是scalable data</p><blockquote><strong>01:03:48 世界模型就是Vision-Language-Vision,vision和language in,生成下一帧的图像</strong></blockquote><p>世界模型的定义是:如果给上前一帧,再给上机器人的动作,你可以预测下一帧</p><p>从另外一个角度,VEO它是一个视频生成模型,但是Genie它更像一个世界模型</p><p>当你在每一帧的时候,可以有一个输入来改变你的下一帧,那个感觉就是世界模型;但是如果它是一个已经生成好的、几秒钟的静态视频,那就不是</p><p>世界模型其实就是Vision-Language-Vision,vision和language in,它可以生成下一帧的图像</p><blockquote><strong>01:08:29 如果你有灵巧手,触觉就非常重要,之所以我前面觉得触觉不重要,是受限于当时的硬件</strong></blockquote><p>如果你有灵巧手,触觉就非常重要</p><p>之所以我前面觉得触觉不重要,是因为它其实受限于当时的硬件</p><p>现在还在夹爪时代</p><p>在所有夹爪能完成的任务里,我还是觉得视觉可能可以解决95%的问题</p><p>在未来,人形机器人不会成为唯一的形态,但一定是个主流的形态</p><p>如果你的目标是solve AGI in the physical world(在物理世界实现AGI),那么我会非常聚焦于最终的形态是什么样子,其他的东西可能都是distraction(干扰)</p><blockquote><strong>01:17:35 一个有使命感的人,他不会容忍说“I’m on a wrong ship”</strong></blockquote><p>这几年Google AI或者robotics的研究文化上有没有发生过变化?</p><p>不管是从promotion、performance review、incentive,还是各种各样的structure上,Google想创造一个环境,使得更多的人可以一起解决更大的事情</p><p>像Gemini Robotics,它更多是自上而下</p><p>我发觉好像国内不一定比我卷,我一周可能工作70到80个小时</p><p>真的,这个时代真的是等不起,不然别人都做出来了</p><p>AI有很多是数学,华人数学比较好</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p><a href="https://www.xiaoyuzhoufm.com/episodes/6857f2174abe6e29cb65d76e">《106. 和王鹤聊,具身智能的学术边缘史和资本轰炸后的人为乱象》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/68767e4c93fd2d72b8607c80">《109. 
机器人遭遇数据荒?与谢晨聊:仿真与合成数据、Meta天价收购和Alexandr Wang》</a></p><p>【更多信息】</p><p>本集的文字版本已发布,请搜索我们工作室的官方公众号:</p><p>语言即世界language is world</p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
120. First interview with Xpeng's newly appointed Liu Xianming: Language is poison, dismantle the L, simplicity is beauty, a change of leadership, and Xpeng's AI transformation. From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-11-18 09:06
Today's guest is Liu Xianming, head of Xpeng Motors' autonomous driving center.

On October 9, 2025, Xpeng Motors suddenly announced that Li Liyun, the former head of the autonomous driving center, would step down, to be replaced by Liu Xianming, head of the world foundation model team. That makes Liu Xianming the fourth head of Xpeng's autonomous driving, after Gu Junli, Wu Xinzhou (now head of Nvidia's autonomous driving China team), and Li Liyun. The outside world is very curious about him.

**This is Liu Xianming's first exclusive interview since taking office.** We spoke on October 30, 2025. In this episode, **we talked about the key technical decisions he made after taking office, such as dismantling the large model's Language, and a car company's strategic transformation toward AI.**

> **02:16 Character sketch**

Previously did machine learning and computer vision research at Meta and Cruise.
It so happened that Cruise was in second place at the time, and the story of joining the runner-up and mounting a comeback is always exciting.
How he joined Xpeng Motors: a one-hour meeting with He Xiaopeng in the US office in January 2024.
The technical stages of autonomous driving he has lived through.

> **19:00 The large model dismantles Language**

Our approach is simple and direct: just rip out the VLA's Language.
The model is the machine and data is the fuel; once Language is mixed in, efficiency becomes extremely low.
So we simply dismantled all of the Language: input a joint V-L corpus, directly output Action.
The process of "dismantling L"; "simple is beautiful."
The key data problems.

> **33:53 Xpeng Motors' transformation toward a physical-AI strategy**

Why does a car company's autonomous driving strategy need to transform into an AI strategy?
The transformation may have begun with Xpeng Motors' 10th anniversary last year.
Autonomous driving companies care about KPIs and takeover rates; AI companies focus on underlying technical metrics, even risky long-term ones.
Liu Xianming's short-term and long-term KPIs.
What does AI mean to Xpeng Motors? "It's a multiplication factor."
Besides dismantling Language this year, lidar, planning-and-control rules, and end-to-end were dismantled before.
Why has the development of artificial intelligence kept going through a process of dismantling?
World models.
Plans for L4 next year.

> **54:30 Behind the change of leadership**

"Stubborn" things I did in the past year.
I look easygoing, but I have also pounded the table and lost my temper.
I met great resistance when "dismantling L," because it runs against the common sense in the papers.
The counter-consensus of DeepSeek-OCR.
The frontier AI directions I focus on now.
Responding to the view of Yu Kai, founder of Horizon Robotics ("autonomous driving should be handed over to suppliers").
Why is there still no generational gap in domestic autonomous driving?
AI is a key battleground for car companies in the next stage; those who do it poorly will be eliminated.
He Xiaopeng's attention to AI: when, how, and the three topics he raised recently.
The "genes" problem of manufacturing companies versus AI companies.
The challenges ahead for me.
Being the No. 1 for Xpeng's intelligent driving; everyone's historical mission.

Related episode:
《70. Chatting with He Xiaopeng: FSD, "Swimming in a Sea of Blood," Heroes and Cowards in Troubled Times》
Original title: 120. 小鹏新上任的刘先明首次访谈:Language是毒药、拆掉L、简单即美、换帅、小鹏的AI转型
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>今天的嘉宾是小鹏汽车自动驾驶中心负责人刘先明。</p><p>就在2025年10月9日,小鹏汽车突然宣布,原自动驾驶中心负责人李力耘将卸任,由世界基座模型负责人刘先明接任。</p><p>这意味着,刘先明成为小鹏在自动驾驶上,既谷俊丽、吴新宙(现英伟达自动驾驶中国团队负责人)、李力耘之后的第四任负责人。外界对他有诸多的好奇。</p><p><strong>这是刘先明上任后首次接受专访。</strong>我们访谈的时间是2025年10月30日。这集节目,<strong>我们聊了聊他上任后拆掉大模型Language等关键技术决策,以及一家车企的AI战略转型。</strong></p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FnviL6xH_VryZ3pil5QmY8VreFNF.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><blockquote><strong>02:16 人物小记</strong></blockquote><p>曾在Meta、Cruise,从事机器学习与计算机视觉研究</p><p>恰好Cruise当时是第二名,加入第二名再逆袭的故事永远是令人兴奋的</p><p>加入小鹏汽车始末:2024年1月在美国办公室与何小鹏见面1小时</p><p>所亲历过的自动驾驶的技术stage</p><blockquote><strong>19:00 大模型拆Language</strong></blockquote><p>我们的做法简单直接,把VLA的Language拆掉就完了</p><p>模型是机器,燃料是数据,一旦掺入Language会让效率变得极低</p><p>我们干脆把Language全都拆掉好了:输入V-L联合语料,直接输出Action</p><p>“拆L”的过程、“简单就是美”</p><p>关键的数据问题</p><blockquote><strong>33:53 小鹏汽车向物理AI战略的转型</strong></blockquote><p>为什么一家汽车公司的自动驾驶战略需要向AI战略转型?</p><p>转型的开端可能是去年小鹏汽车10周年</p><p>自动驾驶企业关心的是KPI、接管率,AI企业关注底层的技术指标,甚至risky的长期指标</p><p>刘先明的短期和长期KPI</p><p>AI对于小鹏汽车意味着什么?“是乘法因子”</p><p>除了今年拆Language,之前还拆了激光雷达、规控规则、端到端</p><p>人工智能发展为什么一直在经历着拆拆拆的过程?</p><p>世界模型</p><p>明年对L4的规划</p><blockquote><strong>54:30 换帅的背后</strong></blockquote><p>过去1年做过“头铁”的事情</p><p>看起来我性格很好,我也拍过桌子、发过火</p><p>“拆L”过程中遇到很大阻力,因为这很反paper里的常识</p><p>DeepSeek-OCR的反共识</p><p>现在关注的AI前沿方向</p><p>回应地平线创始人余凯的观点(“自动驾驶应该交给供应商”)</p><p>为什么国内自动驾驶还没有代际差?</p><p>AI是车企下一阶段的重要赛点,做不好会被淘汰</p><p>何小鹏对于AI的关注时间、方式和最近的3次话题</p><p>制造企业和AI企业的基因问题</p><p>接下来,对于我的挑战</p><p>小鹏智驾一号位,每个人的历史使命</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p><a href="https://www.xiaoyuzhoufm.com/episodes/6695032837236c546e4c2e0f">《70. 
和何小鹏聊,FSD、“在血海游泳”、乱世中的英雄与狗熊》</a></p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
119. Kimi Linear, Minimax M2, Yang Songlin's archaeology of algorithm variants, and a preview of future architecture improvements. From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-11-03 23:00
This episode tackles a topic that is crucial right now: **AI algorithm and architecture innovation.**

Our guest, returning to the show, is MIT PhD student Yang Songlin, who studies linear attention mechanisms. We start from the newly released models Kimi Linear, Minimax M2, and Qwen3-Next. Songlin contributed to parts of Kimi Linear and Qwen3-Next and **is a co-author of the Kimi Linear paper.**

Why has algorithm innovation become especially important in 2025? Data, compute, and algorithms are the three horses driving AI. With data hitting a wall, model companies have had to go back to "sculpting model architectures," hoping the magic of scaling laws continues. And because China's compute is limited relative to the US, **this has pushed Chinese AI algorithm innovation to the world's frontier.**

In this episode you'll hear that **the biggest architectural breakthrough of recent years is DeepSeek's MoE (Mixture of Experts), which made MoE a global consensus; the next breakthrough may be Attention.** Chinese companies are placing different technical bets on Attention:

* DeepSeek is exploring Sparse Attention.
* Kimi is exploring Linear Attention.
* Minimax explored Linear Attention in its early M1 version but reverted to Full Attention in the just-released M2.

Songlin walks through her work on **《Kimi Linear: An Expressive, Efficient Attention Architecture》** and analyzes these companies' Attention choices; **she also leads us through the history of AI algorithm variants and previews future algorithm and architecture improvements.**

> *This episode is technical and may be challenging; listen according to your needs. The guest's work environment mixes Chinese and English.*

**04:00** Personal background, main research line, and the path into linear attention.
**06:27** Songlin built an open-source library: flash-linear-attention (FLA).
**07:04** An intuitive understanding of the "Linear" in Linear Attention.
**11:19** Her newly released work, 《Kimi Linear: An Expressive, Efficient Attention Architecture》 (invited by Zhang, Yu, another FLA author).
**12:20** Why did Kimi need to redesign the attention mechanism at the start of the year? The background and goals. Under Linear Attention, compute and memory costs at inference drop significantly; with Full Attention, long-context decoding is very expensive.
**14:39** **Core of the 《Kimi Linear》 paper: the KDA module** (Kimi Delta Attention).
**18:56** Kimi has an internal Scaling Ladder: perform well at one scale and you move up to the next, like clearing game levels.
**20:20** **Kimi Linear Attention vs DeepSeek Sparse Attention:** Kimi takes the linear attention route, DeepSeek the sparse attention route; both aim to solve the efficiency of long-context decoding.
**23:01** **Minimax's architectural change from M1 to M2, reverting from Linear Attention to Full Attention:** why?
**27:00** Silicon Valley's attention mechanisms can't be discussed freely, but OpenAI's published solutions can be touched on briefly.
**28:05** The thread of progress in Linear Attention since its invention in 2020. Every wave of interest in Linear Attention comes when people hit the context wall. The recent return of long-context decoding has people re-examining this technology.
**38:16** Pure Linear Attention doesn't work; hybrid attention keeps many global attention layers, which guarantees a performance floor.
**40:30** **Kimi Linear inserts one full-attention layer for every three KDA layers; the three-to-one ratio is becoming a consensus.** Minimax previously used seven-to-one, but everyone is gradually converging back to three-to-one, a consensus within the non-consensus of hybrid attention.
**42:32** The trade-off between expressivity and efficiency. **Minimax has also noted that hybrid linear attention / hybrid sliding-window attention has defects in "multi-hop reasoning."** That gap may narrow if we develop hardware-efficient RNNs (recurrent neural networks) with better expressivity for multi-hop reasoning.
**46:28** The chunkwise algorithm for parallelization.
**47:55** How to design Attention? Two mainstream routes and some non-mainstream ones.
**49:36** **A future ideal: combining Linear Attention and Sparse Attention.** Linear Attention and Sparse Attention aren't really competitors; Linear Attention's competitor is more likely Sliding-Window Attention. Industry exploration of combining Linear and Sparse Attention seems not to have started yet. **My ideal scheme: replace the global (Full) attention in hybrid attention with Sparse Attention.** If Sparse Attention selects accurately enough, it can fully replace Full Attention; the problem today is that it can't select accurately.
**55:36** A fair comparison: Linear Attention vs Sliding-Window Attention.
**57:05** The Transformer → MoE → Linear/Sparse Attention evolution, driven by one goal: achieve a lower loss with the same FLOPs (floating-point operations). MoE (Mixture of Experts) is a more efficient replacement for the FFN (feed-forward network).
**58:26** **The biggest architectural breakthrough in recent years is MoE, and the next may be Attention; a Transformer has just two modules, FFN and Attention. FFN has already been sculpted into MoE; now Attention can be sculpted too.**
**01:01:28** Data, algorithms, and compute drive AI; when data is limited, algorithm innovation matters more.
**01:02:48** The future of architecture: 1. Can we eliminate global attention? It is the main bottleneck preventing the context window from scaling up. 2. Continual learning, letting AI learn on its own.
**01:04:30** How to keep scaling up Linear Attention Transformers?
**01:07:43** Chinese AI algorithm innovation is stronger than overseas, precisely because there are fewer cards (GPUs). US companies invest more in optimizers; China is gradually paying attention.
**01:10:56** Other training details: NoPE vs. RoPE.
**01:12:09** DeepSeek-OCR.
**01:12:55** Songlin also contributed to Qwen3-Next, but not to Minimax M2.
**01:13:39** The people who "sculpt" architectures.
**01:15:16** Her own journey: "When you know exactly what you want to do, you won't run into real setbacks." Experience to share: the PhD has gone smoothly, thanks to six months of archaeology before enrolling.
**01:23:12** **Speaking of archaeology: the history of algorithm variants, starting from the Transformer.**
**01:29:50** The Delta Rule algorithm, hardware affinity, and DeepSeek's relentless pursuit of hardware-algorithm fit.
**01:42:23** Advice for the younger generation.

Previous episode with the guest:
《Paper-by-Paper Walkthrough of DeepSeek, Kimi, and MiniMax's New Attention Papers: "Violent Aesthetics on Hardware"》

Papers mentioned:
《Kimi Linear: An Expressive, Efficient Attention Architecture》
《MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention》
《DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models》
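The "Linear" in Linear Attention (07:04) and why it makes long-context decoding cheap (12:20) can be sketched in a few lines: replace `softmax(QK^T)` with a positive feature map and use associativity so causal decoding keeps a fixed-size state instead of an n-by-n score matrix. This is a generic textbook-style sketch, not the KDA module from the Kimi Linear paper; the `elu1` feature map is one common choice from the literature.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard causal attention: materializes an (n, n) score matrix,
    so compute and memory grow quadratically with sequence length n."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    mask = np.tril(np.ones((n, n), dtype=bool))   # token i sees only j <= i
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def elu1(x):
    # A strictly positive feature map (elu + 1), a common linear-attention choice.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V, phi=elu1):
    """Linear attention: phi(Q) (phi(K)^T V) instead of (phi(Q) phi(K)^T) V.
    Causal decoding becomes an O(1)-per-token update of a fixed-size state,
    regardless of context length; no KV cache that grows with n."""
    n, d = Q.shape
    S = np.zeros((d, V.shape[-1]))   # running sum of phi(k_t) v_t^T
    z = np.zeros(d)                  # running sum of phi(k_t), for normalization
    out = np.empty((n, V.shape[-1]))
    for t in range(n):
        q, k = phi(Q[t]), phi(K[t])
        S += np.outer(k, V[t])
        z += k
        out[t] = (q @ S) / (q @ z)
    return out
```

The state `(S, z)` has size d×(d_v+1) independent of context length, which is exactly why interest in this family revives whenever people hit the context wall; the hybrid designs discussed at 38:16 and 40:30 interleave such layers with ordinary full-attention layers.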
Original title: 119. Kimi Linear、Minimax M2?和杨松琳考古算法变种史,并预演未来架构改进方案
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>今天这集节目,我们将讨论一个在当下非常关键的话题:<strong>人工智能的算法与架构创新。</strong></p><p>嘉宾是我们的往期嘉宾返场,她是MIT在读博士杨松琳,研究方向是线性注意力机制。</p><p>我们将从最新发布的几个模型Kimi Linear、Minimax M2、Qwen3-Next切入。松琳参与讨论Kimi Linear和Qwen3-Next的部分工作,<strong>是Kimi Linear论文的作者之一。</strong></p><p>算法创新为什么在2025年变得尤为重要?</p><p>它的背后原因是,数据、算力和算法是驱动人工智能的三驾火车,在数据撞墙的无奈前提下,各个模型公司不得不重新开始“雕模型架构”,以期Scaling Law的魔法继续。而由于中国的算力相对美国有限,<strong>这反而让中国的AI算法创新走在了世界前沿。</strong></p><p>这集节目你将听到,<strong>近几年架构最大突破是DeepSeek的MoE(混合专家模型),它让MoE成为了全球共识;而下一个突破的重要方向可能就是Attention(注意力机制)。</strong></p><p>中国公司在Attention展开了不同技术bet(押注):</p><ul><li><p>截至目前已发布模型,DeepSeek正在探索Sparse Attention(稀疏注意力机制);</p></li><li><p>Kimi正在探索Linear Attention(线性注意力机制);</p></li><li><p>Minimax在年初的M1版本中探索Linear Attention,而在刚发布的M2版本中又回退到 Full Attention(全局注意力机制)。</p></li></ul><p>节目中,松琳将讲解她参与的这篇<strong>《Kimi Linear: An Expressive, Efficient Attention Architecture》</strong>的工作,并分析以上这些公司在Attention上的不同抉择;</p><p><strong>与此同时,她也将带领大家考古人工智能算法变种史,并预演未来算法与架构的改进方案。</strong></p><blockquote><p><em>本集比较硬核,会有一些专业难度,大家可以根据自己的实际需要收听嗷:)因为嘉宾的工作环境会出现中英夹杂,希望大家多多理解和支持。</em></p></blockquote><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FmecfeaBt1PLqDUxyYlRi5y4hxW6.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><p><strong>04:00</strong> 个人、研究主线与线性注意力机制的探索之路<br /><strong>06:27</strong> 松琳做过一个开源库:flash-linear-attention(简称FLA)<br /><strong>07:04</strong> 怎么通俗理解Linear Attention的Linear?<br /><strong>11:19</strong> 聊聊最近参与的新工作,前几天刚发布的《Kimi Linear: An Expressive, Efficient Attention Architecture》(Kimi Linear:一种具有强表达能力与高效率的注意力架构)<br />(FLA库的另一个作者Zhang, Yu邀请)<br /><strong>12:20</strong> 为什么Kimi在年初开始需要重新设计注意力机制?设计的背景和目标<br />在Linear Attention下,推理阶段的计算与显存成本都显著降低;而使用Full Attention时,长文本解码的代价会非常高昂<br /><strong>14:39</strong> <strong>《Kimi Linear》论文重点讲解:KDA模块</strong>(Kimi Delta 
Attention,增量注意力机制)<br /><strong>18:56</strong> Kimi内部有一个Scaling Ladder(规模阶梯),在一个规模下面表现好就在下一个规模下面去scale,就像通关<br /><strong>20:20 Kimi Linear Attention vs DeepSeek Sparse Attention:</strong>Kimi走线性注意力路线,DeepSeek走稀疏注意力路线,都想解决长文本decoding(长上下文生成)的效率问题<br /><strong>23:01</strong> <strong>Minimax从M1到M2的架构变化,从Linear Attention退回到Full Attention</strong>,为什么?<br /><strong>27:00</strong> 硅谷的注意力机制方案不方便说,但可以浅聊一下OpenAI有paper的方案<br /><strong>28:05</strong> Linear Attention从2020年发明出来开始后的前进线索<br />每一次大家关心Linear Attention都是因为大家撞到了Context Wall<br />最近长文本的decoding卷土重来,让人们不由自主审视这一套技术<br /><strong>38:16</strong> 纯Linear Attention是无效的,混合注意力机制还是有很多全局注意力层,这样下限有保证<br /><strong>40:30</strong> <strong>Kimi Linear每3层KDA插入1层全注意力层,三比一的比例快变成共识了</strong><br />Minimax之前用的是七比一,但现在大家逐渐回到三比一——这成为不共识的混合注意力机制中的共识了<br /><strong>42:32</strong> 权衡(Trade-off)表达能力(expressivity)与计算效率(efficiency)<br /><strong>Minimax曾经也提到,混合线性注意力/混合滑窗注意力在“多跳推理”上会有缺陷</strong><br />对于“多跳推理”,如果我们开发一些硬件高效但表达能力更好的RNN(循环神经网络),这个GAP有可能缩小<br /><strong>46:28</strong> chunkwise algorithm for parallelization(分块并行算法)<br /><strong>47:55</strong> 如何设计Attention?两条主流和一些非主流路线<br /><strong>49:36</strong> <strong>结合Linear Attention和Sparse Attention的未来理想方案</strong><br />Linear Attention和Sparse Attention没什么竞争关系,Linear Attention的竞争对手可能是Sliding-Window Attention(滑窗注意力)<br />工业界Linear Attention和Sparse Attention结合的探索似乎还没开始<br /><strong>我想象中的理想方案是:把混合注意力的全局注意力(Full Attention)换成稀疏注意力(Sparse Attention)</strong><br />只要Sparse Attention选得准,完全可以取代Full Attention,但现在的问题是它选不准<br /><strong>55:36</strong> 公平的比较:Linear Attention vs Sliding-Window Attention(滑窗注意力)<br /><strong>57:05</strong> Transformer → MoE → Linear/Sparse Attention的算法演变,背后动因是给定你相同的FLOPs(浮点运算量),利用这些FLOPs,取得更低的损失函数<br />MoE(混合专家)是更高效的FNN(前馈神经网络)的替代品<br /><strong>58:26</strong> <strong>近几年架构方面突破最大的是MoE,下一个突破可能是Attention;Transformer就两个模块,一个是FFN,一个是Attention;现在FFN已经雕成MoE,现在Attention大家也可以雕一下</strong><br /><strong>01:01:28</strong> 数据、算法、算力是驱动人工智能的三驾马车,当数据遇到数据强,算法创新变得更重要<br /><strong>01:02:48</strong> 
架构的未来:1、能不能干掉全局注意力?它是阻止context window继续scale up的主要瓶颈<br />2、Continue Learning,让AI自己学习<br /><strong>01:04:30</strong> 如何把Linear Attention的Transformer继续scale up?<br /><strong>01:07:43</strong> 中国AI的算法创新相比海外肯定是更强的——因为没有那么多卡(<br />不过美国公司更多投入优化器一点,国内在逐步重视<br /><strong>01:10:56</strong> 其他训练细节:NoPE vs. RoPE<br /><strong>01:12:09</strong> DeepSeek-OCR<br /><strong>01:12:55</strong> 松琳也参与了Qwen3-Next,没有参与Minimax M2<br /><strong>01:13:39</strong> “雕”架构的人<br /><strong>01:15:16</strong> 自己的心路:“当你很清楚你要做什么的时候,你是不会遇到什么挫折的”<br />经验分享:PhD还挺顺利的,得益于我入学之前的半年考古<br /><strong>01:23:12</strong> <strong>说到考古,我们在最后聊聊从Transformer开始的算法变种历史</strong><br /><strong>01:29:50</strong> Delta Rule算法、硬件亲和、DeepSeek非常追求硬件和算法的匹配<br /><strong>01:42:23</strong> 给更年轻的年轻人的建议</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p>嘉宾往期节目:</p><p><a href="https://www.xiaoyuzhoufm.com/episodes/67bb3696606e5c5940533ef4" rel="noopener noreferrer nofollow" target="_blank">《逐篇讲解DeepSeek、Kimi、MiniMax注意力机制新论文——“硬件上的暴力美学”》</a></p><p>谈到的论文:</p><p><a href="https://arxiv.org/pdf/2510.26692" rel="noopener noreferrer nofollow" target="_blank">《Kimi Linear: An Expressive, Efficient Attention Architecture》</a></p><h1><a href="https://arxiv.org/abs/2506.13585?utm_source=chatgpt.com" rel="noopener noreferrer nofollow" target="_blank">《MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention》</a></h1><p><a href="https://arxiv.org/abs/2401.06066?utm_source=chatgpt.com" rel="noopener noreferrer nofollow" target="_blank">《DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models》</a></p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
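The 57:05 point, that MoE is a more FLOPs-efficient replacement for the dense FFN, comes down to sparse routing: each token activates only a few of many experts, so per-token compute stays near a small FFN while total parameters grow with the expert count. The sketch below is a generic toy top-k MoE, not DeepSeekMoE's actual design; `moe_ffn` and all shapes are illustrative assumptions.

```python
import numpy as np

def moe_ffn(x, experts_w1, experts_w2, router_w, top_k=2):
    """Toy sparse MoE layer: route each token to its top_k of E experts.
    Per-token FLOPs ~ top_k small FFNs; capacity grows with E.
    (Illustrative only; real implementations batch tokens per expert.)"""
    n, d = x.shape
    logits = x @ router_w                          # (n, E) routing scores
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # chosen experts per token
    out = np.zeros_like(x)
    for t in range(n):
        sel = top[t]
        gate = np.exp(logits[t, sel])
        gate /= gate.sum()                         # softmax over chosen experts
        for g, e in zip(gate, sel):
            h = np.maximum(x[t] @ experts_w1[e], 0.0)  # expert FFN: ReLU MLP
            out[t] += g * (h @ experts_w2[e])
    return out
```

With E experts and top_k active, the parameter count scales with E while the compute per token scales with top_k, which is the "same FLOPs, lower loss" trade the episode describes.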
-
118. A second 3-hour interview with Li Xiang: CEO large model, MoE, Liang Wenfeng, VLA, energy, memory, going against human nature, intimate relationships, and human wisdom. From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-10-30 02:30
In April 2025, I recorded AI Talk Season 2 with Li Xiang, founder and CEO of Li Auto. The conversation ran long, but the broadcast version was only 1 hour; today you are seeing the full version.

This episode was released later than expected. The past few months have been too busy, and I hesitated over whether to release it. But when I reorganized the material, it still moved me: **this is a "node-style thinking archive" of the artificial intelligence revolution.**

You can watch it alongside our 3-hour conversation at the end of 2024 to feel how the thinking extends and echoes between the two dialogues.

**This time, I questioned Li Xiang as a "CEO large model."**

Assuming he is a MoE (Mixture of Experts) model, I called on three of his "experts" in the first three rounds of the conversation: a technology expert, a strategy expert, and an organization expert. As the conversation deepened into the second half, we began to discuss people, energy, intimate relationships, memory programs, and human wisdom.

**"The relationship between AI and humans" is the central theme of this dialogue.**

(Recorded in April 2025)

> **02:35 Chapter 1: Suppose You Are a CEO Large Model**

Humans do entropy reduction; AI does entropy increase
Three tiers of tools: "information tools," "auxiliary tools," "production tools"
The key measure of a "production tool": you are willing to pay for it
Liang Wenfeng applied humanity's best practices in the most minimalist way
Following best practices goes against human nature; doing whatever you want satisfies it
I can only be the best version of myself; I have always built along the extension line of my strengths
Why does Li Auto still build a base large model?
At the time we worried about what Chen Wei's team (the in-house base-model team) would think; that pressure was considerable

> **36:18 Chapter 2: Calling the MoE's Technology Expert**

Li Xiang walks you through training a VLA
Reaching VLA was not a sudden change but an evolution through three stages
Let me explain how a VLA is trained and how it actually works
I don't do super-long CoT; my CoT chains are usually two to three steps
There won't be a general Agent for at least 5 years, but there will be an Agent OS
Speak in accordance with human nature, act against it
If nobody wants to eat the first nine buns and everyone only wants the tenth, it's a lot like practicing the "Sunflower Manual"
Black boxes, world models, and pricing logic
We have cut the validation cost per 10,000 kilometers from 180,000 yuan at the start to 4,000 yuan

> **01:25:36 Chapter 3: Calling the MoE's Strategy Expert**

The 2025 Yanqi Lake strategy meeting
In our strategy, the circle in the middle is scale, with three variables around it: user needs, technology and products, and organizational capability
Whatever has these four traits is a terminal of the AGI era: 360-degree perception of the physical world, cognitive decision-making, the ability to act, and the ability to reflect and give feedback
In the AGI era, the requirements on capabilities become different
Looking toward 2030, we hope to become a globally leading artificial intelligence terminal enterprise
That is the problem we must solve over the next 3-6 years
Is Li Xiang's ideal too idealistic?
Build an energy body of 3-7 people
High-dimensional organizations are compatible with low-dimensional ones

> **02:09:26 Chapter 4: Wisdom Is the Relationship Between Us and All Things**

My memory program
Starting a business isn't easy, but there's no need to be miserable about it
My eldest daughter
Our family has achieved a "three-person support" structure, which has greatly raised the family's energy
People are for bringing out their strengths, not for being changed
Don't build too many intimate relationships; too many proves a person doesn't know how to manage them
Develop wisdom as an important human trait

My first 3-hour interview with Li Xiang: [A 3-hour interview with Li Xiang (podcast version): otaku, AI, family, games, and the ladder](https://www.xiaoyuzhoufm.com/episodes/67769bd815a5fd520e8fa318)

This episode is also available in text and video versions:
Text: WeChat official account (language is world)
Video: Bilibili (Zhang Xiaojun's Business Interview)
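The episode frames Li Xiang as a MoE model whose router dispatches each question to one of three "experts." For readers unfamiliar with the mechanism, here is a minimal sketch of top-k gated routing in a Mixture of Experts layer; the expert functions and router weights are invented for illustration and do not come from any model discussed in this feed:

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of logits
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def moe_route(x, experts, router_weights, top_k=1):
    """Score experts with a linear router, keep the top-k, mix their outputs."""
    logits = [sum(w * xi for w, xi in zip(ws, x)) for ws in router_weights]
    gates = softmax(logits)
    top = sorted(range(len(experts)), key=lambda i: gates[i], reverse=True)[:top_k]
    norm = sum(gates[i] for i in top)  # renormalize over the selected experts only
    return sum(gates[i] / norm * experts[i](x) for i in top)

# Three hypothetical "experts", mirroring the interview's framing
experts = [
    lambda x: sum(x) * 1.0,  # technology expert
    lambda x: sum(x) * 2.0,  # strategy expert
    lambda x: sum(x) * 3.0,  # organization expert
]
router_weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(moe_route([2.0, 0.0], experts, router_weights, top_k=1))  # prints 2.0
```

In real MoE LLMs such as those discussed later in this feed, the experts are feed-forward sub-networks inside each Transformer layer and the router is trained jointly with them; the renormalized top-k mixing above is the core idea.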
Original title: 118. 对李想的第二次3小时访谈:CEO大模型、MoE、梁文锋、VLA、能量、记忆、对抗人性、亲密关系、人类的智慧
-
An Open-Sourced Paper Exploration Journey: The Complete Evolution of Model Paradigms, Infra and Data, Language, and Multimodality From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-10-28 02:33
Today's guest is Xie Qingchi, product head of 光年之外 (Light Year) at Meituan. A month ago, Qingchi came to me and said he had spent more than a year chewing through over 200 AI papers one by one, going from completely lost at the start to gradually finding his footing; **and he wanted to open-source his paper exploration journey for everyone.** And so we have today's special episode.

From more than 200 papers he selected 36 classics, and over a 4-hour walkthrough he takes you across the history of AI. He says **reading papers "opens a door for you," letting you "converse directly with the smartest minds in this world."** In 2025, may we and AI progress together!

01:30 How the exploration began
07:25 How to read papers? (Using AI to learn AI)
10:20 Helper tools and a road book

**The main line of the paper walkthrough:**

> **19:35 Part 1: Paradigm shifts in models**

The story starts with the first GPU in 1999
Brook: computing on the GPU (2004.08)
AlexNet: the beginning of deep learning (2012.10)
Modeling sequences: seq2seq and the introduction of Attention (2014.09)
Distillation: can a model be learned? (2015.03)
ResNet: deeper than deep (2015.12)
The Transformer arrives, raising the curtain on an era (2017.06)
AlphaGo Zero: a breakthrough for reinforcement learning (2017.10)
The beginning of modern MoE (2017.01)
CoT: the foundational work of prompt engineering (2022.01)
LoRA: the thing we use every day (2021.06)
ReAct: Agents from theory to practice (2022.10)
The Bitter Lesson: the lesson of the past 70 years (2018.08)

> **01:52:58 Part 2: The evolution of infra and data**

ZeRO: large-scale parallel GPU computing (2019.10)
Scaling Law & Chinchilla: God's baton (2020.01, 2022.03)
LAION-5B: the heroism of the open-source community (2022.10)
The RefinedWeb: web data turns out to be enough (2023.06)
MegaScale: training on clusters of 10,000 GPUs (2024.02)

> **02:21:29 Part 3: The development of language models**

Word2Vec: vectorizing words with machine learning (2013.01)
Google Translate: large-scale online deployment of neural networks (2016.09)
GPT-1 arrives (2018.06)
BERT: the one-time king (2018.10)
GPT-2: time to say goodbye to fine-tuning (2019.02)
GPT-3: the eve of ChatGPT (2020.05)
InstructGPT: giving LLMs civilization (2022.03)
Tulu 3: open-sourcing post-training (2024.11)

> **03:08:08 Part 4: The development of multimodal models**

DeepVideo: deep learning enters video, and a young Andrej makes his debut (2014.06)
Two-stream networks: Karén and the academic powerhouse Oxford take the stage (2014.06)
The prelude to image generation: GANs arrive (2014.06)
Diffusion: quietly growing in the shadow of GANs (2015.03)
DDPM: diffusion returns to the center of the image stage (2020.06)
ViT: when images meet the Transformer (2020.10)
CLIP: the cornerstone of text-to-image (2021.03)
Stable Diffusion arrives (2021.12)
DiT: people hope for a unified future (2022.12)

> **03:56:38 Closing chat**

Architecture clings to the hardware's coattails
Where has the frontier of today's technology reached?
Advice for "those peering in from outside the AI world" and "those who have already worked in the field for years"

The "Beauty of Technology" series:

[Sentence-by-sentence walkthrough of the DeepSeek-R1, Kimi K1.5, and OpenAI o1 technical reports: "The most elegant algorithms are the cleanest"](https://www.xiaoyuzhoufm.com/episodes/67a1b697247d51713c868367)
[Paper-by-paper walkthrough of DeepSeek's 9 key papers and their innovations: "A game for the brave"](https://www.xiaoyuzhoufm.com/episodes/67aacd6b247d51713cedbeda)
[Paper-by-paper walkthrough of new attention-mechanism papers from DeepSeek, Kimi, and MiniMax: "Brute-force aesthetics on hardware"](https://www.xiaoyuzhoufm.com/episodes/67bb3696606e5c5940533ef4)
[Paper-by-paper walkthrough of classic robot foundation-model and VLA papers: "Humans are the most intelligent VLA"](https://www.xiaoyuzhoufm.com/episodes/67f28c6e0decaeb0943fb14a)
[Section-by-section walkthrough of the Kimi K2 report, compared with ChatGPT Agent, Qwen3-Coder, etc.: "The power of systems engineering"](https://www.xiaoyuzhoufm.com/episodes/6889da698e06fe8de77116a9)

【More Information】

The screen-cast video version of this episode is on Bilibili (Zhang Xiaojun's Business Interview): https://www.bilibili.com/video/BV1pkyqBxEdB/?spm_id_from=333.1365.list.card_archive.click&vd_source=aa7c66a3d015be4b5bfcd520784f2790

The full 50-page PPT is open-sourced here (all paper links are attached in the PPT): https://w7py8ou4dk.feishu.cn/wiki/KacewdlmSiSGC9kUOKDch9gwnKf?from=from_copylink
Original title: 117. 开源一段论文探索之旅:模型范式、Infra和数据、语言、多模态的完整变迁史
-
Wu Minghui's 19-Year Oral History: Long Ups and Downs, a Painful Sharp Turn, Enterprise Agentic Models, the Real World's Numbers Game, and an IPO From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-10-09 05:55
Today's guest, Wu Minghui, is the founder, CEO, and CTO of Mininglamp Technology (明略科技). On August 29, 2025, Mininglamp received its overseas listing filing notice and will soon list in Hong Kong. This is a pre-IPO interview in which Wu Minghui narrates the long 19-year story of a To B company, through many rounds of splits and reunions, ups and downs, and sharp turns. You will find many of our past guests in this story: Xiao Hong, Li Guangmi, Yang Zhilin. We also talked about the prospects of enterprise-grade AI and Agentic Models in the brand-new AI era. But the story begins with the merger of his company with that of Li Feng, founding partner of FreeS Fund (峰瑞资本). In 2025, may we and AI progress together!

> **02:11 Part 1: The first startup**

Opening quick-fire questions
Ties to our past guests Guangmi and Red
The start of the startup: Zhu Wei invested in the merged company of Wu Minghui and Li Feng
At the very beginning, Luo Yonghao and Li Xiaolai were our shareholders
The first business plan was a recommendation system; why didn't we build Toutiao?
The psychological conditioning of Olympiad training
The success of Miaozhen Systems
Watching helplessly as Toutiao's traffic took off

> **56:08 Part 2: The second startup**

"When the boss finishes business school, the team suffers"
Founding Mininglamp Technology and Yunji Robotics at the same time
Learning from Palantir, a US data-analytics company, but pivoting from To G to To B
The decision to acquire Red's company; I hoped he would be my CEO successor
2020-2021: opening too many battlefronts, and the detours we took
2022: a painful sharp turn, the most suffering year of my life
With AI, a wave of M&A is expected in enterprise services

> **01:45:01 Part 3: Enterprise-grade AI**

Companies that train foundation models on public data and sell tokens as their business model will compete each other down to the price of electricity
Companies with private data can create differentiated value
The real world's numbers game
The origin of the new product "DeepMiner"
Agents and tool use are creating new connections in enterprise services
An Agent is an interaction technology that will bring revolutionary change to both To C and To B internet
Companies that provide no supply-side capability, only a connecting network, where that network is not a root node, are in great danger
In the future, will companies have only two kinds of people: bosses and partners (partners are not employees)?
A happy boss has highly aligned personal, family, and company missions
Original title: 116. 吴明辉口述19年史:漫长的沉浮、痛苦急转、企业级Agentic Model、现实世界的数值游戏、IPO
-
A 3-Hour Interview with OpenAI's Yao Shunyu: Six Years of Agent Research, People and Systems, Devouring Boundaries, and a World Both Unipolar and Plural From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-09-11 10:03
Today's guest is OpenAI researcher Yao Shunyu, whom we are delighted to welcome. In April 2025, **Yao Shunyu published a well-known blog post, "The Second Half,"** declaring that the main-thread game of AI has entered its second half. Afterward, we had this podcast conversation with him.

Yao Shunyu graduated from Tsinghua and Princeton and began researching agents very early. During his PhD he realized that language may be the closest-to-essence tool humans have invented, so he turned to language-agent research, which he has now pursued for 6 years. He has many representative works.

**Our conversation starts from the individual and explores the boundaries of the world's intelligence, and the panorama of humans and machines, reached through people, organizations, AI, and human-machine interaction.**

Not long ago I founded a new content studio, "Language is World Studio." Unexpectedly, Shunyu answered, from another angle, the founding question of our studio: why do we believe language is the essential mystery of this world? In his words: **"Language is a tool humans invented to achieve generalization, and that makes it more essential than anything else."**

(This interview took place in May 2025. It represents personal views and is unrelated to his employer.)

> **02:58 Part 1: People**

* I feel the first 28 years of my life were very well-behaved
* I have always held this non-consensus view: I want to work on Agents
* The biggest lesson of my first year: use GPT, not BERT; the second lesson: tasks and environments matter enormously
* My research has two cores: one is building valuable tasks and environments that are more relevant to the real world; the other is building simple but general methods

> **17:50 Part 2: Systems**

* Agent is a very old concept. Any system that makes its own decisions, interacts with an environment, and tries to optimize a reward can be called an Agent
* Three waves in the evolution of Agents: people tend to notice the method line and overlook the task line, but the two are complementary
* The two most critical directions for Agent development: letting an Agent have its own reward and explore on its own; and Multi-Agent, letting agents form organizational structures among themselves
* Code is a bit like the human hand; it is AI's most important *affordance*
* Task settings
* Generalized tools
* Reward mechanisms

> **48:38 Part 3: Devouring Boundaries**

* The biggest opportunity for startups: designing different interfaces
* Model capabilities may produce interaction methods beyond ChatGPT and become a Super App
* Owning a Super App is a double-edged sword for a company: once you have one like ChatGPT, your research naturally revolves around it
* Assistant, Her, or human-like interaction is obviously among the most important interaction methods; what is not obvious is whether I can build on non-human-like interaction
* This world is a relationship of mutual copying, not one-way copying
* OpenAI may become a company like Google, a very important part of the new world, but that does not mean the world will be monopolized by such a unipolar system
* The ultimate boundary of intelligence is determined by different interaction methods, not by a single model
* The winter before last, I read a book von Neumann wrote before his death: The Computer and the Brain
* The environment is always the outermost layer of the memory hierarchy, which is quite philosophical
* A model company's chatbot system will evolve into a very natural Agent system

> **01:05:01 Part 4: The Big Picture of Humanity**

* Humans and systems: should an Agent be like a human? "It's a utility problem"
* OpenAI is a bottom-up company
* Without a different bet, it's hard to surpass the previous overlord
* My advisor is the second author of GPT‑1. He stayed at OpenAI for a year and was somewhat skeptical of all this
* If you became the CEO of Berkshire and had to allocate 50 billion US dollars to the AGI industry, how would you allocate the money?
* The real danger is not that something like WeChat defeats WeChat, but that something unlike WeChat defeats it
* It happens that in this era, it is better to do things with higher ceilings

【More Information】
The text version is launched simultaneously; please go to the official account: language is world
Original title: 115. 对OpenAI姚顺雨3小时访谈:6年Agent研究、人与系统、吞噬的边界、既单极又多元的世界
-
Chatting with Yin Yi and Oudi about Salomon: Foreign Brands Entering China, Niche Trail Running, and Girl Stories From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-09-06 05:06
Today's guests are Yin Yi, General Manager of Salomon China, and Oudi, head of the fashion and trend industry at Xiaohongshu's commercial division. Together we talk about Salomon, an outdoor brand that has become popular over the past two years.

Salomon and Arc'teryx both belong to Amer Sports, which was acquired by Anta in 2019. **After 2021, Salomon, a 70-plus-year-old French brand, unexpectedly found a growth path in China.**

This niche brand, which started in skiing and gradually expanded into trail-running shoes, traditionally served mostly men and serious skiing and trail-running enthusiasts in China. **In recent years, however, through a series of brand campaigns on Xiaohongshu, it has successfully attracted female consumers and newcomers to the outdoors, breaking out of its niche and growing, which in turn further stimulated growth among male consumers and core sports enthusiasts.**

I hope this fresh brand knowledge brings you some new inspiration :)

02:00 The two guests introduce themselves
03:06 Salomon was born in France in 1947; snow is the deepest imprint in its DNA
04:39 We were once owned by Adidas, which helped us build a sports-style line
06:04 The core trail-running population was only 100,000 ten years ago, and it is still 100,000 this year. So what has changed?
11:52 Should a brand go from niche to mass, or from mass to niche?
16:22 What happened to Salomon after its parent company Amer Sports was acquired by Anta in 2019?
18:07 The share of women among Chinese consumers peaked at nearly 70% and is now just under 60%
20:45 Women bring in more new male consumers than men bring in new female consumers
23:21 After 2021, more and more overseas outdoor brands actively and intensively entered China
27:31 How Xiaohongshu helps Salomon expand its audience: the "new-product tasters" and "color-sensitive fans" segments
34:55 A people-centered brand strategy: find "super user representatives"
43:26 Ten years ago brand-building focused on the winning moments; now it focuses on the process and details of growth
45:37 The consumer insight behind the "Salomon girls": women no longer pursue stacked rituals but inner ease
48:36 Combining Xiaohongshu with the Anfu Road Salomon store, circulating traffic between online and offline
55:24 Salomon's new female consumers have in turn fueled growth among male consumers
58:16 If a very masculine brand wants to appeal to women, what should it do?
01:00:43 Will going trendy dilute the professional outdoor gene?
01:01:33 New changes in young people's consumption
01:08:05 Building an AI brand like a consumer brand: brand-side advice for AI founders

Some photos of the lovely recording scene:
Original title: 114. 与殷一、欧迪聊聊萨洛蒙:中国意外的增长阀门、小众越野跑与少女故事
-
A Conversation with Yang Zhilin After a Year: K2, Agentic LLMs, the Brain in a Vat, and "Standing at the Beginning of Infinity" From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-08-27 04:21
Today's guest is Yang Zhilin, founder and CEO of Moonshot AI; a year and a half has passed since his last appearance on our show (episode 59 of "Business Interview").

This past July, the Kimi K2 model was released, drawing wide attention. K2 is an open-source coding and Agentic large language model based on a MoE architecture. Figuratively speaking, the model uses coding to step out of the sealed "brain in a vat," growing "hands" to manipulate the external digital world.

**In this episode I talked with Yang Zhilin about K2's development and his current technical understanding and judgments,** and, **as a founder, his feelings and reflections through the past year's storms of public opinion and entrepreneurial ups and downs.**

> **01:49 An Infinite Mountain**

It's a bit like a book I've been reading recently: The Beginning of Infinity
Maybe one day we'll find this snow mountain has no end; I hope it never ends
But it is still a "brain in a vat": imagine a fish tank with a brain inside, with no connection to the outside world
Whether it's reinforcement learning based on long thinking or an Agent's reinforcement learning, both point to the same thing: test-time scaling
Another interesting trend is that more model companies are now building "first-party Agent products"
L1 to L5 are not necessarily serial; Claude bets on exactly this: it doesn't do that much on Reasoning but does very well on Agents
Only when the model participates in the development process can the real Innovator (L4) stage be unlocked

> **24:58 K2 Is Mount Qogir (K2)**

K2 has a few priorities. First, we want it to be a very good base model
We want to maximize the use of every piece of data, so-called token efficiency: feed the same amount of data and the "brain" grows more
We do a lot of Rephrase operations on the data
We care a lot about the Muon optimizer, which greatly improves token efficiency
Second, we want K2 to have good Agentic capabilities; for Agentic models, the biggest challenge is generalization
It is a transformation from a "brain in a vat" to something that can interact with the world, because the defining feature of an Agent is that it can use tools over multiple turns
Humans are the so-called universal constructor
One potential idea is that AI needs to be trained in a more AI-native way
When you train with Muon, it blows up

> **54:08 A System Both Simple and Complex**

Why did Kimi switch from closed source to open source?
Once model training is done, the product is basically done; interaction improvements are certainly valuable, but they are icing on the cake
It's already good if multimodality doesn't damage the "brain"
The multimodality you learn may be a "dumb multimodality"; we want a "smart multimodality"
Scaling laws have hit a data wall; that is an objective fact
The data flywheel depends heavily on external feedback; we don't want the feedback to be noisy, but we haven't solved this problem very well yet
For now, scaling based on FLOPs looks like the more effective path, but when will that balance shift?
Many long-context architectures hurt "intelligence"
Pure linear attention may hurt intelligence, because the architecture carries some bias
Where, in the long run, is the boundary between base-model companies and application companies building Agent products?
How to think about business models today? Is an API a good business? Can Kimi make money?

> **01:25:05 In Your Own Story**

Tim (Zhou Xinyu) tells me every day: manage with RL, not SFT
The biggest problem with managing a team with RL is that you are easily hacked
A lot of the complexity is artificially imposed; in reality it's not that complicated
You can only say you are in your own story: you keep sensing what kind of person you are and why you want to do this
I asked Kimi this question too, and it said AI is the "amplifier of human civilization"
This is also what Kimi told me: any intermediate state can become a target of criticism
There is certainly fear, but it's more important to focus on what you can do at the current step; thinking about that question matters more

My 2024 interview with Yang Zhilin: [Chatting with Yang Zhilin about a year of large-model entrepreneurship: the increment of human ideals, probabilistic non-consensus, and Sora](https://www.xiaoyuzhoufm.com/episodes/65e16b5b6144a933b1d968b5)

【More Information】
Text and video versions are launched simultaneously
Text: official account "language is world"
Video: Bilibili, Zhang Xiaojun's Business Interview
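The Muon optimizer praised in the K2 discussion above orthogonalizes the momentum matrix before applying the update. As a rough illustration of that primitive, here is the classical cubic Newton-Schulz iteration on plain Python lists; the real Muon uses a tuned quintic polynomial, runs on GPU tensors, and wraps this inside a momentum update, so everything below is only a sketch of the orthogonalization step:

```python
import math

def matmul(A, B):
    # naive dense matrix multiply over nested lists
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def newton_schulz(G, steps=15):
    """Approximately orthogonalize G: drive all its singular values toward 1."""
    norm = math.sqrt(sum(x * x for row in G for x in row))  # Frobenius norm
    X = [[x / norm for x in row] for row in G]              # singular values now <= 1
    for _ in range(steps):
        # cubic iteration: X <- 1.5 X - 0.5 (X X^T) X
        XXtX = matmul(matmul(X, transpose(X)), X)
        X = [[1.5 * x - 0.5 * y for x, y in zip(rx, ry)]
             for rx, ry in zip(X, XXtX)]
    return X

# A symmetric positive-definite "momentum" matrix orthogonalizes to ~identity
X = newton_schulz([[3.0, 1.0], [1.0, 2.0]])
```

The intuition behind the token-efficiency claim is that replacing the raw momentum with its nearest orthogonal matrix equalizes the update's strength across directions, so rarely-excited directions of the weight matrix still learn from each batch.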
Original title: 113. 和杨植麟时隔1年的对话:K2、Agentic LLM、缸中之脑和“站在无限的开端”
-
112. Large Model Quarterly Report with Guangmi: Differentiation and Convergence, Full-Family Bucket and Vertical Integration, L4 Experience and Mining Window From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-08-18 23:00
The new episode of the "Global Large Model Quarterly Report" is finally here, after everyone's persistent urging. This episode has two keywords. **The first keyword is differentiation.** This quarter, the Silicon Valley model companies began differentiating into distinct fields: apart from Google Gemini and OpenAI, which are still building general-purpose models, Anthropic has differentiated into Coding and Agentic model capabilities, and Mira's Thinking Machines into multimodality and next-generation interaction. **The second keyword is product.** The "Large Model Quarterly Report" used to focus on models' exploration of intelligence, but this time Guangmi discusses products in depth for the first time. This is the 7th episode of the "Global Large Model Quarterly Report". If you like our series, we hope you will give us more encouragement and support. **Your praise is very important to us.** 2025: looking forward to progressing together with AI!

> **03:54 Model Differentiation** Models with strong general capabilities - Gemini/OpenAI; all-in on Coding+Agentic capabilities - Anthropic; multimodal-native - Thinking Machines Lab. Grok is still searching for its ecosystem niche. Meta's 0-to-1 originality is still very weak. The most advanced companies look a lot like an F1 race.

> **21:37 Horizontal Full-Family Bucket, Vertical Integration** On the consumer side there is a very clear convergence toward the leaders; ChatGPT may absorb many consumer products. As an investor or AI entrepreneur, one side of you is excited that the technology improves every month, while the other side is a bit despairing. An example of the horizontal full-family bucket is ChatGPT, which already bundles Chat+Search+Coding+Agent+Workspace. An example of vertical integration is Gemini: from TPU chips, to the Gemini models, to the Agent applications on top, to Google Docs/the Chrome browser/the Android operating system/YouTube video, it can achieve super-integration.

> **33:35 Intelligence and Product Are Both Important** For the past 3 years we have been utterly obsessed with exploring the upper limit of intelligence, but in the past two months we have started to pay attention to products. ChatGPT has many non-technical moats, while Coding or model companies have only technical ones. OpenAI is the best-balanced company: it explores the upper limit of intelligence while converting the intelligence dividend into product traffic and brand mindshare.

> **38:52 Making AI Products Is Like Mining; the Freshness Window Is Key** Mining: being the first to ship an experience that amazes users matters enormously. Even if the token consumption is huge, as long as you are the first to create magic moments that amaze users, it is as if you received at least 500 million US dollars of marketing spend - e.g. Perplexity/Cursor/Manus. But the window period is particularly interesting, and it keeps shortening: from 2 years, to 1 year, to 3 months. Can product companies beat the products built by model companies?

> **44:21 L4-Level Experience** The two best Agents both deliver an L4 experience: ChatGPT's Deep Research + Anthropic's Claude Code, corresponding to information search and software development respectively. Today's biggest dividend is still the language/code dividend, especially code - not yet multimodal/world models/robots. Claude Code has been dominating lately; Claude Code is an L4 experience. Which fields will reach an L4-level experience next?

> **52:43 A Changed View of Google** One guess is that ChatGPT will eventually build an advertising platform, since it recently hired a new commercialization CEO. But I think Google is still the best advertising platform in the world. In the end everyone's product forms will converge on the same destination, merging into the full-family-bucket logic, and Search will evolve too.

> **55:53 Other Topics** Is there a bubble in AGI? If there is, what would be the trigger that bursts it? What is the gap in intelligence between humans and gorillas? What new topics have been hotly discussed in the Bay Area lately? **"Jewish finance, Chinese AGI"**

[Global Large Model Quarterly Report] Series
2023: An Oral History of Global Large Models This Year: Humanity's Hundred-Billion Scientific Gamble and the Uneven Sino-US Landscape
2024 Q1: Chatting with Guangmi about the Era of AGI Mega-Infrastructure: Electricity + Chips = Output of Intelligence
2024 Q2: An Oral History of Global Large Models This Half-Year: Perplexity's Sudden Popularity and the Not-Yet-Exploded AI Application Ecosystem
2024 Q3: The AGI Paradigm Shift: Predicting Strawberry, OpenAI o1, and Self-Play RL with Guangmi
2024 Q4: Large Model Quarterly Report Year-End Special: Predicting with Guangmi How LLM Products Will Surpass Google
2025 Q1: Large Model Quarterly Report: Chatting with Guangmi about the Biggest Non-Consensus Right Now, and the Main Line and Main Peak of AGI
Original title: 112. 和广密聊大模型季报:分化与收敛、全家桶与垂直整合、L4体验与挖矿窗口
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>在大家的强烈催更下,新一集的《全球大模型季报》终于来了。</p><p>这一集有两个关键词。</p><p><strong>第一个关键词是分化。</strong>硅谷各个模型公司在这个季度,开始分化到各个领域,除了Google Gemini和OpenAI还在做通用的模型;Anthropic分化到Coding、Agentic的模型能力;Mira的Thinking Machines分化到多模态和下一代交互。</p><p><strong>第二个关键词是产品。</strong>《大模型季报》过去一直把视角放在模型的智能探索上,而广密开始浓墨重彩地聊产品,这还是第一次。</p><p>这里是《全球大模型季报》的第7集,如果大家喜欢我们的系列,希望大家多多给我们一些鼓励和支持。<strong>你们的夸奖对我们来说,非常的重要。</strong></p><p>2025,期待我们和AI共同进步!</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FhO5L2PKPE8OSwyJDGz5_NmS_Zfh.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><blockquote><p><strong>03:54 模型在分化</strong></p></blockquote><p>通用各项能力的模型 - Gemini/OpenAI</p><p>All in Coding+Agentic 能力 - Anthropic</p><p>多模态原生 - Thinking Machines Lab</p><p>Grok 今天还在摸索自己生态位置</p><p>Meta 原创 0-1 的基因还是很弱</p><p>最领先的这几家很像 F1 竞赛</p><blockquote><p><strong>21:37 横向全家桶,纵向垂直整合</strong></p></blockquote><p>C端是一个非常明显的头部收敛趋势,ChatGPT可能在C端会收敛掉很多产品</p><p>作为投资人或 AI 创业者,一面兴奋是技术每个月都在进步,另一面有点绝望</p><p>横向全家桶的例子是ChatGPT,已经包含了Chat+搜索+Coding+Agent+WorkSpace</p><p>纵向垂直整合的例子是 Gemini,从 TPU 芯片,到 Gemini 模型,到上面 Agent 应用,再到 Google 文档/Chrome浏览器/安卓操作系统/YouTube视频,可以做超级集成</p><blockquote><p><strong>33:35 智能和产品都重要</strong></p></blockquote><p>过去 3 年一直是对智能上限的探索极度上头,但在过去两个月开始重视产品了</p><p>ChatGPT 身上有很多非技术性壁垒,而 Coding 或模型公司只是技术壁垒</p><p>OpenAI 是平衡最好的一家,一边探索智能上限,一边又把智能红利转化成产品流量和品牌心智</p><blockquote><p><strong>38:52 做 AI 产品很像挖矿,保鲜窗口很关键</strong></p></blockquote><p>挖矿:第一个做出来让用户惊叹的体验很重要,哪怕 token 消耗很大,只要你是第一个做出来让用户惊叹的 Magic moments,就等于你起码得到了 5 亿美金的营销费用,比如 Perplexity/Cursor/Manus</p><p>但这个窗口期又特别有意思,窗口是逐渐在缩短的:从 2 年、1 年、3 个月</p><p>产品公司能赢过模型公司做的产品吗?</p><blockquote><p><strong>44:21 L4 级别的体验</strong></p></blockquote><p>最优秀的俩 Agent 都有了 L4 体验:ChatGPT 的 Deep Research + Anthropic 的 Claude Code,分别对应信息搜索+软件开发</p><p>今天最大红利还是 language/code 红利,尤其是 
code,还不是多模态/世界模型/机器人</p><p>Claude Code 最近大杀四方,Claude Code 是一个 L4 的体验</p><p>接下来还有哪些领域能有 L4 级别体验?</p><blockquote><p><strong>52:43 对Google看法的转变</strong></p></blockquote><p>一个猜想是,ChatGPT 后面肯定会做广告平台,因为最近招了新的商业化 CEO</p><p>但我在想 Google 还是全球最好的广告平台,最后大家产品形态上都会殊途同归,融合到一起的,就是全家桶逻辑,Search 也会演变</p><blockquote><p><strong>55:53 其他话题</strong></p></blockquote><p>AGI有泡沫吗?假如AGI有泡沫,什么事情会是导火索,戳破泡沫?</p><p>人类和大猩猩的智能水平差异在哪?</p><p>最近湾区有没有什么新的讨论比较高的话题?</p><p><strong>“犹太人的金融,华人的AGI”</strong></p><p><strong>(免责声明:本节目不构成投资建议)</strong></p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p>【全球大模型季报】系列</p><p>2023年:<a href="https://www.xiaoyuzhoufm.com/episodes/65910adb991e2ee60880f151" rel="noopener noreferrer nofollow" target="_blank">口述全球大模型这一年:人类千亿科学豪赌与参差的中美景观</a></p><p>2024年Q1:<a href="https://www.xiaoyuzhoufm.com/episodes/661f21075dae7932c6f821d8" rel="noopener noreferrer nofollow" target="_blank">和广密聊AGI大基建时代:电+芯片=产出智能</a></p><p>2024年Q2:<a href="https://www.xiaoyuzhoufm.com/episodes/667774b3b6a84127299efd5a" rel="noopener noreferrer nofollow" target="_blank">口述全球大模型这半年:Perplexity突然火爆和尚未爆发的AI应用生态</a></p><p>2024年Q3:<a href="https://www.xiaoyuzhoufm.com/episodes/66d866f0f39a2201c069dccb" rel="noopener noreferrer nofollow" target="_blank">AGI范式大转移:和广密预言草莓、OpenAI o1和self-play RL</a></p><p>2024年Q4:<a href="https://www.xiaoyuzhoufm.com/episodes/6766a52a15a5fd520e6c86a9" rel="noopener noreferrer nofollow" target="_blank">大模型季报年终特辑:和广密预言LLM产品超越Google之路</a></p><p>2025年Q1:<a href="https://www.xiaoyuzhoufm.com/episodes/67e9614b8eecdbeb601ac5fe" rel="noopener noreferrer nofollow" target="_blank">大模型季报:和广密聊当下最大非共识、AGI的主线与主峰</a></p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
111. Li Yifan's Oral History of 11 Years of LiDAR Entrepreneurship: Think Carefully, Where Does the Industry's Opportunity Come From? It Is the Opportunity of the Country and the Nation. From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-08-07 23:00
Over the past 10 years, China's new energy vehicle industry has grown from nothing and developed vigorously. The brands people know best may be Li Auto, Xpeng, and NIO, but behind this transformation the supply-chain companies are changing too. Episode 108 of "Business Interviews" (with Yu Kai) and this episode's 3-hour interview with Li Yifan, co-founder and CEO of Hesai, **both focus on the invisible players in the automotive supply chain.** This episode is also Li Yifan's oral history of their 11 years of hardcore technology entrepreneurship in LiDAR. As China's technological innovation shifts from internet-style business-model innovation to cutting-edge hardcore technology innovation, China may produce more technology-driven entrepreneurs. Hesai's story may offer a reference sample. (This interview was recorded in April 2025)

00:02:00 Quick Q&A begins
00:02:33 Stock-price rollercoaster
00:03:40 The 99.5% cost reduction of LiDAR
00:12:05 Family and growing up
00:32:13 A rare three-way equal split of shares
00:43:35 Financing tricks
00:49:02 The first big order of 20 million
00:55:45 Thinking it was all over...
01:10:06 Yu Kai's number had one more zero than mine
01:20:47 Pricing considerations
01:38:15 The defections begin
01:58:07 Entering the carmakers' base camp
02:38:34 New money and old money
03:02:16 Final quick Q&A

[From Steam Engine to Autonomous Driving] Series
3-Hour Interview with Li Xiang (Podcast Version): Otaku, AI, Family, Games, and Ladder
Chatting with He Xiaopeng: FSD, "Swimming in a Sea of Blood", Heroes and Dogs in Troubled Times
Dialogue with Ola Källenius, Global CEO of Mercedes-Benz: A CEO in Transition and a 139-Year-Old Mercedes-Benz in Transition
Yu Kai's 30-Year Oral History: The World Is More Than Swords and Shadows, It's a Jianghu Story of People Coming and Going
Chatting with Lou Tiancheng about Robotaxi and ACRush: "The Better L2 Is Done, the Further Away from L4"
Original title: 111. 李一帆口述激光雷达11年创业史:你仔细想行业的机会来自哪?是国家、民族的机会
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>过去10年,中国新能源汽车产业从无到有,经历蓬勃发展。大家最熟悉的可能是理想、小鹏、蔚来这些整车品牌,但另一面这场变革背后的产业链企业也在变化。</p><p>《商业访谈录》的108集对余凯和本集对禾赛联合创始人和CEO李一帆的3小时访谈,<strong>关注的都是汽车产业链上的隐形选手。</strong></p><p><strong>这集也是李一帆对他们做激光雷达11年硬核科技创业的一部口述史。</strong></p><p>随着中国科技创新从互联网的模式创新,走向硬核科技的前沿创新,中国也许还会出现更多的技术型创业者。禾赛的故事也许能提供一个参考样本。</p><p>(本次访谈录制于2025年4月)</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FpUcziKADuUvw4xnFBg4eruuRPSY.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><p>00:02:00 开始的快问快答</p><p>00:02:33 股价过山车</p><p>00:03:40 激光雷达99.5%的降本</p><p>00:12:05 家庭和成长</p><p>00:32:13 罕见的3人平分股份</p><p>00:43:35 融资的伎俩</p><p>00:49:02 第一笔2000万大单</p><p>00:55:45 想说完蛋了…</p><p>01:10:06 余凯比多我一个0</p><p>01:20:47 定价心思</p><p>01:38:15 开始倒戈</p><p>01:58:07 进入汽车大本营</p><p>02:38:34 新钱和老钱</p><p>03:02:16 最后的快问快答</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p>【从蒸汽机到无人驾驶】系列</p><p><a href="https://www.xiaoyuzhoufm.com/episodes/67769bd815a5fd520e8fa318">《对李想的3小时访谈(播客版):宅男、AI、家庭、游戏和天梯》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/6695032837236c546e4c2e0f">《和何小鹏聊,FSD、“在血海游泳”、乱世中的英雄与狗熊》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/68300e93fcbc2e206b58eb2b">《对话奔驰全球CEO康林松:转型期CEO和转型之中的139岁奔驰》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/686b8c0560f8f77d404338cd">《余凯口述30年史:世界不止刀光剑影,是一部人来人往的江湖故事》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/66bdb98233591c27be49e931">《和楼天城聊聊Robotaxi和ACRush:“L2做得越厉害,离L4越远”》</a></p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
110. A Segment-by-Segment Reading of the Kimi K2 Report, Compared with ChatGPT Agent, Qwen3-Coder, and More: "The Power of Systems Engineering" From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-07-30 08:42
**We're reading papers again!!!** Today we're reading some of the most noteworthy technical reports of recent weeks: the technical reports for **Kimi K2, ChatGPT Agent, and Qwen3-Coder, plus a blog post from Manus.** What links them is that all of this material relates to Agents. Today's guest is Zheng Boyuan, a Ph.D. student at The Ohio State University whose research direction is Language Agents. He will walk us through the technical reports and blog posts above. This is the **"Beauty of Technology" series** of "Business Interviews". We look forward to reading papers with you, savoring the democratization of technology, and experiencing the beauty of technology - being your cyber group meeting :)

00:02:00 Defining and classifying Agents
00:14:50 Comparing the technical routes of Kimi K2, ChatGPT Agent, Qwen3-Coder, and Manus
00:19:05 Why the overall disappointment with ChatGPT Agent?
00:28:29 Key aspects of Agent training: synthetic data, reinforcement learning, safety
00:30:57 **First technical report: Kimi K2: Open Agentic Intelligence** [github.com](https://github.com/MoonshotAI/Kimi-K2/blob/main/tech_report.pdf)
00:43:50 **Second technical report and interview: Introducing ChatGPT agent: bridging research and action** [openai.com](https://openai.com/zh-Hans-CN/index/introducing-chatgpt-agent/) **Sequoia interview with OpenAI: OpenAI Just Released ChatGPT Agent, Its Most Powerful Agent Yet** [www.sequoiacap.com](https://www.sequoiacap.com/podcast/training-data-chatgpt-agent/)
01:53:38 **Third technical report: Qwen3-Coder: Agentic Coding in the World** [qwenlm.github.io](https://qwenlm.github.io/blog/qwen3-coder/)
01:59:04 **Fourth technical blog post: Context Engineering for AI Agents: Lessons from Building Manus (author: Yichao 'Peak' Ji)** [manus.im](https://manus.im/zh-cn/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus)
02:06:06 Outlook: maybe there will be a new paradigm
02:15:20 I feel that an Agent is "my extended brain", and I
have a "legion" (Family of Agents) behind me
02:16:41 Different bots' language styles: DeepSeek is foul-mouthed, Yuanbao is a bootlicker

> **Agent Definition** An Agent is an intelligent system capable of interacting with its environment. It has two basic capabilities. **Perception**: the ability to observe the state of the environment, including obtaining external information, reading feedback signals, and parsing context. **Action**: the ability to perform actions in the environment, such as calling tools, generating output, controlling interfaces, and modifying variables. In short, Agent = Perception + Action: continuously run the "observe → decide → act" loop to achieve the task goal.

> **Definition and Classification of Agents** **1. Coding Agent** Representative products: Cursor, Windsurf. Features: strong code generation and editing, excellent user experience. Application scenarios: code completion, code refactoring, collaborative programming. **2. Search Agent** Features: combined with search engines, automatically completes information retrieval and aggregation. Application scenarios: market research, report generation, competitor analysis, etc. Potential: strong application value in enterprise scenarios. **3. Tool-Use Agent** Features: can call a variety of external tools to complete complex tasks. Application focus: the main direction of current Agent research and deployment. Example: ReAct (Reasoning + Acting) style Agents, which execute tasks through tool calling. **4. Computer Use Agent** Representative products: OpenAI Operator, Claude's Computer Use. Features: simulates a human using a computer to complete complex cross-application operations. Application scenarios: process automation, remote assistants, office agents.

> **Comparison of Agent Technical Routes** **1. In-Context Learning** Features: relies on a powerful pre-trained model; task planning and execution are achieved through prompt construction. Advantages: no fine-tuning needed, high flexibility. Limitations: weak generalization, limited rollout length, easy to lose control. **2. End-to-End Training** Features: encodes all Agent behavior into the model weights. Advantages: stable inference, strong controllability. Limitations: high training cost, complex environment construction.

> **Key Aspects of Agent Training** **1. Data Synthesis** Method: generate a large number of high-quality trajectories. Purpose: train the Agent to make decisions, call tools, and manage memory within tasks. **2. Reinforcement Learning** Conditions: requires clearly defined tasks and verifiable rewards. Challenges: task difficulty and environment-feedback design directly affect the quality of Agent behavior. **3. Safety** Risks: an Agent with autonomous decision-making can misuse tools and go off the rails. Countermeasures: sandbox restrictions, behavior-constraint mechanisms, human-in-the-loop.

> **Outlook: Maybe There Will Be a New Paradigm** The core of data generation will shift from input-output data annotation to building environments and the corresponding task-rewards; for example, Scale AI has proposed rubrics as rewards. Can Agents self-improve? On the one hand, an Agent continuously obtains new data while interacting with the environment; can it find or construct verifiable rewards on its own? Can the experience accumulated during interaction be used more effectively?
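The "observe → decide → act" loop summarized above can be sketched in a few lines. This is a minimal toy, not any product's API: `decide()` stands in for an LLM policy (here stubbed with keyword matching) and `search()` for a real tool call; all names are hypothetical illustrations.

```python
# Minimal sketch of the "observe -> decide -> act" agent loop.
# decide() and search() are hypothetical stubs: a real agent would
# replace decide() with an LLM call and search() with real tools.

def decide(observation: str, goal: str):
    """Pick the next action from the current observation (stubbed policy)."""
    if goal.lower() in observation.lower():
        return ("finish", observation)   # goal visible: stop and answer
    return ("search", goal)              # otherwise, gather more information

def search(query: str) -> str:
    """Stand-in for a real tool call, e.g. a web search."""
    return f"results mentioning {query}"

def run_agent(goal: str, max_steps: int = 5) -> str:
    observation = "initial empty context"
    for _ in range(max_steps):           # observe -> decide -> act, in a loop
        action, arg = decide(observation, goal)
        if action == "finish":
            return arg
        observation = search(arg)        # act, then observe the result
    return observation                   # bounded rollout: stop after max_steps

print(run_agent("Kimi K2"))              # prints: results mentioning Kimi K2
```

The `max_steps` bound corresponds to the limited rollout length mentioned under In-Context Learning: a prompt-driven agent loop needs an explicit stopping rule or it can run away.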
Original title: 110. 逐段讲解Kimi K2报告并对照ChatGPT Agent、Qwen3-Coder等:“系统工程的力量”
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p><strong>我们又来读论文啦!!!</strong></p><p>今天我们要读的论文是最近几个星期内最值得品读的几篇技术报告,分别是:<strong>Kimi K2、ChatGPT Agent、Qwen3-Coder的技术报告,以及Manus的一篇技术博文。</strong>他们的相关性是,这几篇内容都和Agent有关系。</p><p>今天的嘉宾是俄亥俄州立大学(The Ohio State University)的在读博士郑博元,他的研究方向是Language Agent,他会带我们一起读上述技术报告和博文。</p><p>这是《商业访谈录》的<strong>“技术之美”系列</strong>,期待和你一起读论文,领略科技平权,感受技术之美——做你的赛博组会:)</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FgHNmAFclRglFbm9XogKflmG_D-w.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><p>00:02:00 给Agent下定义和分类</p><p>00:14:50 Kimi K2、ChatGPT Agent、Qwen3-Coder、Manus的技术路线对比</p><p>00:28:29 Agent Training 的关键环节:合成数据、强化学习、安全</p><p>00:30:57 <strong>第一篇技术报告:Kimi K2: Open Agentic Intelligence</strong></p><p><a href="https://github.com/MoonshotAI/Kimi-K2/blob/main/tech_report.pdf" rel="noopener noreferrer nofollow" target="_blank">github.com</a></p><p>00:43:50 <strong>第二篇技术报告和访谈:Introducing ChatGPT agent: bridging research and action</strong></p><p><a href="https://openai.com/zh-Hans-CN/index/introducing-chatgpt-agent/" rel="noopener noreferrer nofollow" target="_blank">openai.com</a></p><p><strong>红杉访谈OpenAI:OpenAI Just Released ChatGPT Agent, Its Most Powerful Agent Yet</strong></p><p><a href="https://www.sequoiacap.com/podcast/training-data-chatgpt-agent/" rel="noopener noreferrer nofollow" target="_blank">www.sequoiacap.com</a></p><p>01:53:38 <strong>第三篇技术报告:Qwen3-Coder: Agentic Coding in the World</strong></p><p><a href="https://qwenlm.github.io/blog/qwen3-coder/" rel="noopener noreferrer nofollow" target="_blank">qwenlm.github.io</a></p><p>01:59:04 <strong>第四篇技术博文:AI代理的上下文工程:构建Manus的经验教训(作者:Yichao 'Peak' Ji)</strong></p><p><a href="https://manus.im/zh-cn/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus" rel="noopener noreferrer 
nofollow" target="_blank">manus.im</a></p><p>02:06:06 展望:也许会有一个新的范式</p><p>02:15:20 我感觉Agent是“我拓展的大脑”,我背后有一个“军团”(Family of Agents)</p><p>02:16:41 不同Bot的语言风格:DeepSeek嘴臭,元宝舔狗</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><blockquote><p><strong>智能体定义</strong></p></blockquote><p>Agent是一种能够与环境进行交互(interaction)的智能系统。</p><p>它具备两个基本能力:</p><p><strong>感知能力(Perception)</strong><br />能够观察环境的状态,包括获取外部信息、读取反馈信号、解析上下文等。</p><p><strong>行动能力(Action)</strong><br />能够在环境中执行动作,例如调用工具、生成输出、控制界面、修改变量等。</p><p>简言之,Agent = 感知 + 行动<br />在一个循环中不断执行“观察 → 决策 → 行动”的流程,以达成任务目标。</p><blockquote><p><strong>Agent 的定义与分类</strong></p></blockquote><p><strong>1. Coding Agent(代码智能体)</strong><br />代表产品:Cursor、Windsurf<br />特点:代码生成与编辑能力强,用户体验优秀<br />应用场景:代码补全、代码重构、多人协作编程</p><p><strong>2. Search Agent(搜索型智能体)</strong><br />特点:结合搜索引擎,自动完成信息检索和汇总<br />应用场景:市场调研、报告生成、竞争对手分析等<br />潜力:在企业级场景中有很强的应用价值</p><p><strong>3. Tool-Use Agent(工具使用型智能体)</strong><br />特点:能够调用多种外部工具完成复杂任务<br />应用重点:是目前 Agent 研究和落地的主要方向<br />举例:ReAct(推理 + 行动)类 Agent,通过 tool calling 执行任务</p><p><strong>4. Computer Use Agent(电脑操作型智能体)</strong><br />代表产品:OpenAI Operator、Claude 的 Computer Use<br />特点:模拟人类使用电脑,完成跨应用的复杂操作<br />应用场景:执行流程自动化、远程助理、办公代理</p><blockquote><p><strong>Agent 的技术路线对比</strong></p></blockquote><p><strong>1. In-Context Learning(上下文学习)</strong><br />特点:依赖强大的预训练模型,通过提示构造实现任务规划与执行<br />优势:无需微调,灵活性高<br />局限:泛化能力弱,rollout 长度有限,容易失控</p><p><strong>2. End-to-End Training(端到端训练)</strong><br />特点:将 Agent 的全部行为编码进模型权重<br />优势:推理稳定,可控性强<br />局限:训练成本高,环境构建复杂</p><blockquote><p><strong>Agent Training 的关键环节</strong></p></blockquote><p><strong>1. Data Synthesis(数据合成)</strong><br />方法:生成大量高质量的 trajectory(行动轨迹)<br />用途:训练 Agent 在任务中如何决策、调用工具、管理 memory(记忆)</p><p><strong>2. Reinforcement Learning(强化学习)</strong><br />条件:需要定义清晰的 task(任务)与 verifiable reward(可验证奖励)<br />挑战:任务难度与环境反馈设计直接影响 Agent 的行为质量</p><p><strong>3. 
Safety(安全性)问题</strong><br />风险:Agent 具备自主决策能力,容易误用工具、走偏轨迹<br />对策:加入 sandbox(沙盒)限制、行为约束机制、Human-in-the-loop(人类监控)</p><blockquote><p><strong>展望:也许会有一个新的范式</strong></p></blockquote><p>生成数据的核心会从 input-output 式的数据标注,转向构建 environment(环境)以及对应的 task-reward(任务-奖励)。比如 Scale AI 提出的 rubrics as reward(用评分标准作为奖励机制)</p><p>Agent 能不能实现自我提升(self-improve)?一方面,Agent 在和环境交互的过程中会不断获得新数据;那它能不能自己找到或构造 verifiable reward(可验证的奖励)?交互中积累的 experience(经验),能不能被更有效地利用起来?</p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
109. Robots Hit a Data Famine? Chatting with Xie Chen: Simulation and Synthetic Data, Meta's Sky-High Acquisition, and Alexandr Wang. From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-07-15 23:00
Today is another robotics special. The guest is Xie Chen, founder and CEO of Kuanglun Intelligence (光轮智能), who previously led autonomous-driving simulation at Nvidia, Cruise, and NIO. Our topic is very specific: simulation and synthetic data. Today's embodied intelligence has not yet found an effective recipe for a scaling law, and data is a key bottleneck. Wang He, founder of Galaxy General and the guest of our 106th episode, mentioned that real data makes up only 1% of their training data; synthetic data carries the load. In today's episode, I talked with Xie Chen about the practical details of simulation and synthetic data.

02:00 Quick Q&A begins
02:48 High-frequency vocabulary explained: Sim2Real (from simulation to reality), the Sim2Real gap, synthetic data
04:31 From Cruise to Nvidia to NIO, how are synthetic data and simulation done?
14:11 What is the concrete process for producing synthetic data? What is the ratio of synthetic to real data?
16:17 The difference between intelligent driving and embodied intelligence in synthetic data (intelligent driving is a game of vision; for embodied intelligence, physical interaction is most critical)
32:41 What does the physical Real2Sim (reality-to-simulation) workflow look like? How do you evaluate a successful simulation? Key technical nodes?
46:18 Physical Intelligence (π)'s ambivalence toward simulation and synthetic data
48:55 Spicy takes on Meta's $30 billion acquisition of Scale AI and the extremely aggressive Alexandr Wang
53:57 Current bottlenecks of synthetic data
55:25 Mapping the global embodied-intelligence industry chain: hardware companies (Unitree); base-model companies (π, Skild, Nvidia, and DeepMind); software-hardware integrated companies landing in vertical domains (Figure, Tesla Optimus, The Bot Company); companies centering on simulation for end-to-end deployment (Kuanglun) ("Tesla Optimus' management culture is completely different from π's")
01:09:22 In the United States there are entrepreneurial opportunities at the embodied-model layer; in China, in my view, ByteDance, Xiaomi, and Li Auto are better suited to building the "brain"
01:15:33 Lao Huang (Jensen Huang) said internally: NV is a simulation company
01:21:25 The endgame model should be cross-universe, cross-world, and cross-ontology (improving cross-universe ability is, in essence, improving generalization)
01:23:28 The embodied-intelligence industry is still at the GPT-1 stage and has not yet found the recipe for a scaling law
01:28:21 My startup has just begun; I am learning embodied AI from the undergraduate level up
01:37:37 Final quick Q&A

[Robotics Special]
Explaining Robot Base Models and the Classic VLA Papers, One by One: "Humans Are the Most Intelligent VLA"
Chatting with Wang He about Embodied Intelligence's Marginal History in Academia and the Man-Made Chaos after the Capital Bombardment
Original title: 109. 机器人遭遇数据荒?与谢晨聊:仿真与合成数据、Meta天价收购和Alexandr Wang
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>今天又是一集机器人专场。嘉宾是光轮智能创始人兼CEO谢晨,他曾在英伟达、Cruise及蔚来汽车担任自动驾驶仿真负责人。我们的话题非常具体,即:仿真与合成数据。</p><p>今天的具身智能尚且没有找到scaling law的有效配方,其中,数据是一个关键卡点。我们106集的嘉宾银河通用创始人王鹤就提到,真实数据在他们的训练数据比重仅仅1%,合成数据挑起大梁。</p><p>今天这集节目,我与谢晨聊了聊仿真与合成数据的实操细节。</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FvO83aFZZwjNNrtt9QcAh3xGuJzF.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><p>02:00 开始的快问快答</p><p>02:48 高频词汇解析:Sim2Real(从仿真到现实)、Sim2Real的gap、合成数据</p><p>04:31 从Cruise到英伟达到蔚来,怎么做合成数据和仿真?</p><p>14:11 制作合成数据的具体流程?合成数据与真实数据的配比?</p><p>16:17 在合成数据上,智能驾驶和具身智能的区别(智能驾驶是视觉的游戏,具身智能的物理交互最关键)</p><p>32:41 物理的Real2Sim(真实到仿真)工作流是怎样的?怎么评估成功的仿真?关键技术节点?</p><p>46:18 Physical Intelligence(π)对仿真与合成数据的两难态度</p><p>48:55 辣评Meta 300亿美金收购Scale AI和极其aggressive的Alexandr Wang</p><p>53:57 合成数据目前面临的瓶颈</p><p>55:25 全球具身智能产业链Mapping:</p><p>硬件公司(宇树)</p><p>基座模型公司(π、Skild、英伟达和DeepMind)</p><p>在垂域落地的软硬结合公司(Figure,特斯拉Optimas、The Bot Company)</p><p>以仿真为中心做端到端落地的公司(光轮)</p><p>(“特斯拉Optimas的管理文化和π完全不一样”)</p><p>01:09:22 美国存在具身模型层的创业机会,中国在我看来字节、小米、理想更适合做“大脑”</p><p>01:15:33 老黄在内部说:NV is a simulation company</p><p>01:21:25 终局的模型应该是是跨宇宙、跨世界、跨本体(提升跨宇宙的能力,本质是提升泛化性)</p><p>01:23:28 具身智能的产业还在GPT-1阶段,还没找到scaling law的配方</p><p>01:28:21 我创业刚开始,从具身的本科开始学起</p><p>01:37:37 最后的快问快答</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p>【机器人专场】</p><p><a href="https://www.xiaoyuzhoufm.com/episodes/67f28c6e0decaeb0943fb14a" rel="noopener noreferrer nofollow" target="_blank">逐篇讲解机器人基座模型和VLA经典论文——“人就是最智能的VLA”</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/6857f2174abe6e29cb65d76e" rel="noopener noreferrer nofollow" target="_blank">和王鹤聊,具身智能的学术边缘史和资本轰炸后的人为乱象</a></p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
108. Yu Kai's 30-Year Oral History: The World Is More Than Swords and Shadows, It's a Jianghu Story of People Coming and Going. From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-07-07 23:00
Today's guest is Dr. Yu Kai, founder and CEO of Horizon Robotics. In his 49 years so far, **he has moved through the academic circles of Germany and the United States, the Chinese internet circle, the venture-capital circle, the capital markets, and the automotive circle.** In each circle and "jianghu" (江湖, a term for the martial-arts world or a particular community), he started as an unknown nobody and leveled up, and in the end he did well in every one. A former senior executive who has dealt with him commented that **Yu Kai is one of the most socially intelligent scientists.** Yu Kai graduated from Nanjing University and the University of Munich. After graduating he worked for Siemens and NEC Research Institute. In 2012 he returned to China to join Baidu, and in 2015 he left to found Horizon Robotics. As it happens, 2025 marks the 10th anniversary of Horizon Robotics' founding. **This year I spoke with Dr. Yu Kai twice, and this episode is his oral history.** With the outbreak of the large-language-model wave, more AI scientists are pouring out of universities and into entrepreneurship. Yu Kai's entrepreneurial philosophy may offer everyone some inspiration - **entrepreneurship is not only about technology and business, nor only about conflict and competition; it is also a "jianghu" story of people coming and going.** As Zhang Zuolin's line in the TV series "Young Marshal" goes: "Jianghu is not fighting and killing; jianghu is the art of human relations." In 2025, we will progress together with AI!

03:06 **Entering the Academic Jianghu** At first I was unknown in academic circles; a fortune teller said that before the age of 24 I would be "unknown and toil in vain". I have published over 100 papers. I am quite narcissistic: late at night I would admire my own earlier papers. Stories of meeting Geoffrey Hinton, Yann LeCun, and Andrew Ng. There was a very silent person sitting opposite me, and no one paid him any attention.
He was eating alone - this person is Richard Sutton, who recently won the Turing Award. 31:18 **Entering the Internet Jianghu Again** I should be the first Chinese AI scholar who studied in the United States to return to China. I immediately wrote to Geoffrey Hinton, and he replied: Kai, that's great, but would you mind if I also asked other companies? The authorization I received at the time was to bid up to $24 million. After $24 million, I had to discuss each bid with the domestic side. In order to win with a small probability, I took the lead and offered $12 million. "Hey, you see Geoffrey Hinton doesn't seem to show up at meetings often, what's he doing...?" I asked him: Hey, Andrew (Andrew Ng), what are you doing? How is everything going? I started to test him. Andrew Ng was shocked! He said: You tricked me into Baidu, and you ran away yourself, that's not cool! 51:19 **Entering the Entrepreneurial Jianghu Again** I made three investments: I bought Nvidia, I bought Tesla, and I wholeheartedly invested in Horizon Robotics. This guy told me: Brother, you know what? My status at home now depends on that sentence of yours! When Horizon Robotics was just founded, I looked at it, and Nvidia was only a $10.7 billion company, now it's 3 trillion! What is one frustration that Andrew Ng had while leading Google Brain at Google? Not being able to buy GPUs! Consensus is either wrong or worthless. What is your business secret? What is something you saw that others didn't see? Is there a bug in this world? Is there a narrow door to the future that most people haven't paid attention to? 01:11:21 **Entering the Capital Jianghu Too** We didn't write a single page of BP (business plan) and raised the first round. I thought: Wow, life is so easy! As a result, in the second round, I discovered that I met with 50-60 institutions, and none of them placed an order. It was particularly tough... no one understood... What I said was almost dry... for a long time... in the dark... 
and no one was moved. I set an iron rule: the first time I meet an investor, it must never be in his office — it must be in mine. I kept up the act! I said: I really don't have time; I'm just a focused, low-EQ scientist tinkering with my own things, too lazy to deal with you. We created the industry legend of a Series C in 12 mini-rounds, taking in $1.6 billion in one go — this too was counter-consensus — without adding one cent of valuation along the way. Wow, Horizon Robotics actually has 102 shareholder investment institutions; I don't even know how I ground them all out. 01:21:39 **Switching to the Automotive Jianghu** Scientists starting a business often have this problem: firing in all directions, 360 degrees. After Zeng Ming's class, many of our classmates went back, cut directions, and laid off teams. One night in my sleep I suddenly jolted awake: Damn, this isn't right! With Changan: deliberately losing the game — you have to lose elegantly, discreetly, and deliberately. With Li Xiang: when we climbed a mountain in early 2019, Li Xiang told me: you should focus on the automotive direction. With He Xiaopeng: I still haven't won over XPeng — sometimes you attack head-on, sometimes you go around. With Wang Chuanfu: we seized the opportunity window — it was like a door opening a small crack, and we rushed in with a "whoosh." 02:09:48 **I Am Not a Jianghu Person** My role model as a leader is Liu Bang. Do you know my favorite character in the movies? Rhett Butler in "Gone with the Wind." My surname is Yu (余), and the company's name is Horizon (地平线) — together they spell "yudi" (余地), leeway: in dealing with people and matters, always leave leeway. Intelligent driving: OEMs (original equipment manufacturers) will not develop it in-house in the future — it is a standardized function. 3 years to 100% hands-off, 5 years to 100% eyes-off, 10 years to 100% minds-off. What is the death door? The CUDA of robots. Next-generation chip innovation.
02:35:23 **Final Quick Q&A** I think the world is a pre-written program, and everyone is acting according to the script. 02:39:26 **Bonus Outtakes** Sharing a tip: if you are determined to resign, don't say anything bad about the company. Yan Junjie's hairstyle looks like mine (a joke). Andrew Ng and I seriously discussed entrepreneurship back in the US. I drank Maotai just to bring a campus hire around — unlike Li Xiang, who acts with one swift stroke. Why is his WeChat profile picture Guan Yu? [From Steam Engine to Self-Driving] Series 《3-Hour Interview with Li Xiang (Podcast Version): Otaku, AI, Family, Games and the Ladder》 《Chatting with He Xiaopeng about FSD, "Swimming in a Sea of Blood", and Heroes and Cowards in Troubled Times》 《Dialogue with Ola Källenius, Global CEO of Mercedes-Benz: A CEO in Transition and the 139-Year-Old Mercedes-Benz in Transition》 《Chatting with Lou Tiancheng about Robotaxi and ACRush: "The better L2 is done, the further away L4 is"》 Text version of this episode: 《Dialogue with Yu Kai: The World Is More Than Swords and Shadows — It Is a Jianghu Story of People Coming and Going》
Original title: 108. 余凯口述30年史:世界不止刀光剑影,是一部人来人往的江湖故事
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>今天的嘉宾是地平线创始人兼CEO余凯博士。</p><p>在过去49年人生中,<strong>他一路闯关过德美学术圈、中国互联网圈、创投圈、资本圈、汽车圈。</strong>在每个圈子和江湖,都从籍籍无名的无名小卒开始升级打怪。到最后,在每个圈子,他混得都不错。</p><p>一位与他打过交道的前企业高层评价,<strong>余凯是科学家里非常具有社会智慧的一位。</strong></p><p>余凯毕业于南京大学和慕尼黑大学,毕业后,先后就职西门子、NEC研究院,于2012年回国加入百度,又于2015年离职创立地平线。</p><p>很巧的是,2025年正好是地平线创立10年。<strong>今年上半年,我与余凯博士聊了两次,这集节目是他的一部口述史</strong>。</p><p>随着大语言模型浪潮爆发,更多人工智能科学家从高校系统涌入创业轨道。余凯的创业观,也许能给大家一些启示——<strong>创业不仅是技术和商业,也不仅仅有刀光剑影,更是一部人来人往的江湖故事。</strong></p><p>就像电视剧《少帅》张作霖的台词:“江湖不是打打杀杀,江湖是人情世故。”</p><p>2025年,我们和AI共同进步!</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FtYwXS7GiDXB0ddOvtgKa35Ak6qg.png" /></figure><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><blockquote>03:06 <strong>初入学术江湖</strong></blockquote><p>一开始在学术圈籍籍无名,算命先生说我24岁前“籍籍无名,劳而无功”</p><p>发表过100篇论文,我很陶醉,夜深人静都会翻我以前的paper自我欣赏</p><p>结识Geoffrey Hinton、Yann LeCun、吴恩达的故事</p><p>我这边的对面坐了一个人特别的沉默,没人搭理他,一个人在那吃闷饭——这个人叫Richard Sutton,前段时间拿了图灵奖</p><blockquote>31:18 <strong>再入互联网江湖</strong></blockquote><p>我应该是旅美人工智能华人学者第一个回国的</p><p>我立刻就跟Geoffrey Hinton写信,他回信:Kai,挺好的,但你介不介意我也问一下其他公司?</p><p>我当时拿到的授权是,最高出到2400万美金,2400美金以后,每一次出价就要跟国内商量</p><p>我为了小概率能赢,抢先第一个出价,1200万美金</p><p>“哎呀,你看Geoffrey Hinton开会好像不太出现啊,他在干嘛…?”</p><p>我就问他:唉,Andrew(吴恩达)你在干嘛?各方面怎么样?开始试探他</p><p>吴恩达一下子震惊到了!说:你小子把我忽悠到百度,你自己跑掉,太不够意思了吧?</p><blockquote>51:19 <strong>又入创业江湖</strong></blockquote><p>我做了3个投资:买了英伟达,买了特斯拉,全身心把我投到地平线</p><p>这个哥们跟我讲:兄弟,你知道吗?我现在在我家的地位,就靠你那句话!</p><p>地平线刚创立那一天我看了一下,英伟达才是一个107亿美金公司,现在是3万亿!</p><p>吴恩达在Google lead谷歌大脑,有一个frustration(沮丧)是什么?不能买GPU!</p><p>共识要么是错的,要么是没价值的</p><p>你的商业的secret是什么?有什么东西你看见了别人没有看见?这个世界是不是有Bug?这个世界是不是有通向未来的窄门,而大部分人没有关注到?</p><blockquote>01:11:21 <strong>也入资本江湖</strong></blockquote><p>我们一页BP没写,就融了第一轮,我觉得:哎呀,Life is so 
easy!</p><p>结果第二轮就发现,见了50-60家机构,没一个下单。特别tough……没人理解……</p><p>我说的简直是口干舌燥……地老天荒……昏天黑地……也没人动心</p><p>我定了一个铁律:我跟投资人第一次见面,绝不能在他办公室,一定要在我办公室</p><p>我继续装!我说:我真的没时间,我就是一个专注的、情商低的科学家,正在倒腾我自己的事情,懒得理你</p><p>我们创造了C轮业界传奇的12小轮,一把拿了16亿美金——这也是一个反共识——中间没有加1分钱估值</p><p>哇,地平线竟然有102家股东投资机构,我都不知道我怎么磕出来的</p><blockquote>01:21:39<strong> 转战汽车江湖</strong></blockquote><p>科学家创业通常有这个问题:360度扫射</p><p>曾鸣那堂课上完以后,我们班好多同学回去都去砍方向、裁团队</p><p>有天晚上睡觉,我梦中突然一惊:我靠,这样不对啊!</p><p>和长安:故意输球,你们要优雅地、不露声色地、故意地输啊</p><p>和李想:李想在2019年初,我们俩爬山他讲:你应该聚焦汽车方向</p><p>和何小鹏:我现在还没有磕下小鹏————有的时候你要强攻,有的时候你要迂回</p><p>和王传福:我们逮着机会窗口,相当于这个门开一个小缝,咱们就呲溜一声冲进去</p><blockquote>02:09:48<strong> 我不是江湖人</strong></blockquote><p>领导者我的role model是刘邦</p><p>电影这些角色,你知道我最喜欢谁吗?《飘》里的白瑞德</p><p>我的名字姓余,公司的名字地平线——余地,余地,做人做事永远要留有余地</p><p>智能驾驶:主机厂未来不会自研,它是一个标准化的功能</p><p>3年完成100%hands-off,5年完成100%eyes-off,10年完成100%minds-off</p><p>死门是什么?</p><p>机器人的CUDA</p><p>下一代芯片创新</p><blockquote>02:35:23<strong> 最后的快问快答</strong></blockquote><p>这个世界我认为是写好了程序,每个人都是按照剧本来演</p><blockquote>02:39:26 <strong>补充花絮</strong></blockquote><p>传授技巧:如果你决心离职,不要说公司任何不好</p><p>闫俊杰的发型像我(玩笑)</p><p>我和吴恩达在美国serious讨论过创业</p><p>我为了调一个校招生喝茅台,不像李想手起刀落</p><p>微信头像为什么关公?</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p>【从蒸汽机到无人驾驶】系列</p><p><a href="https://www.xiaoyuzhoufm.com/episodes/67769bd815a5fd520e8fa318">《对李想的3小时访谈(播客版):宅男、AI、家庭、游戏和天梯》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/6695032837236c546e4c2e0f">《和何小鹏聊,FSD、“在血海游泳”、乱世中的英雄与狗熊》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/68300e93fcbc2e206b58eb2b">《对话奔驰全球CEO康林松:转型期CEO和转型之中的139岁奔驰》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/66bdb98233591c27be49e931">《和楼天城聊聊Robotaxi和ACRush:“L2做得越厉害,离L4越远”》</a></p><p>本集文字版:<a href="https://mp.weixin.qq.com/s/TQ-pOkEi412kYvrJE2F1TQ">《对话余凯:世界不止刀光剑影,是一部人来人往的江湖故事》</a></p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
107. A catch-up with Meng Qiu: Venture capital is quite boring — also chatting about travel, reading, and feminism.From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-06-29 23:00
This episode has no grand narratives; it's quite casual. The guest is Meng Qiu, founding partner of Qingliu Capital and former Technology VP of Baidu. Longtime listeners of "Business Interview" may know that Meng Qiu returns roughly once a year to catch up on the current venture capital climate and her own life. In a Chinese investor circle dominated by wolf culture, Meng Qiu has always been a very "Buddhist" (laid-back), Taoist presence. This episode is even more relaxed: she says frankly that work is quite boring, so after the serious topics we also talked about reading, travel, movies, and girl talk. (This episode was recorded at the end of April) Our podcast premieres on Tencent News; you can follow us there to get episode information and more news first :) 02:00 Has the capital winter of 2025 passed? No... 04:00 Has the emergence of DeepSeek made AI application entrepreneurship more active? No... 10:45 Experiences with various Bots right now: fortune teller? Simp? Commenting especially on WeChat, Yuanbao, and Xiaohongshu 25:28 Discussing how to build an Agent inside WeChat — does a general-purpose Agent hold up? 31:25 Entrepreneurial opportunities and entrepreneurs for vertical Agents 35:52 Organizations now trend small, which may favor young entrepreneurs 37:42 Why are organizations smaller while financing amounts are higher? 38:18 Besides Agents, I am also looking at embodied intelligence (simulators are very important) 43:57 Wearable devices 54:54 Large model companies 58:31 Work has been very boring these past two years; my travel journey 01:03:55 My reading journey 01:12:34 Talking about the film and television industry (Meng Qiu is an independent director of China Film), "Good Things" and feminism Meng Qiu's previous episodes: "1. Chatting with investor Meng Qiu about California, the investment cold wave, and Lin Daiyu" "21. 
Large models and the real market temperature from the perspective of investors|Chatting with Meng Qiu about ChatGPT" "65. Has the key to venture capital failed? Chatting with Meng Qiu: Hibernation, a game with fewer people, and rodents" 【More Information】 Contact us: Weibo @张小珺-Benita For more information, please follow the official account: 张小珺
Original title: 107. 和梦秋的catch-up:创投挺无聊,也聊聊旅行读书和女性主义
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>这集没有宏大叙事,相当随性。嘉宾是清流资本创始合伙人、百度前技术VP梦秋。</p><p>关注《商业访谈录》比较久的朋友可能知道,梦秋基本每年都会来返场一次,和我们一起catch-up当下的创投水温以及她自己的生活。在狼性文化蓬勃的中国投资人圈里,梦秋一直是很佛系也很道家的存在。</p><p>这一集更是松弛,她直言工作挺无聊,所以在聊了正经话题以后,我们也聊了聊读书、旅行、观影和女生的碎碎念。</p><p>(本次节目录制在4月底)</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FgO-fNdZ9-g2JgGptgIspHPz5Me5.png" /></figure><blockquote><p>我们的播客节目在<a href="https://view.inews.qq.com/u/8QIf3n5c64Ucuzne7gI%3D?devid=FF4E49E6-9C89-4986-A413-04E856F31262&qimei=766696f2cd8f313d744bc2c9000012918102&uid=100161026780" rel="noopener noreferrer nofollow" target="_blank">腾讯新闻首发</a>,大家可以前往关注哦,这样可以第一时间获取节目信息和更多新闻资讯:)</p></blockquote><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><p>02:00 2025年资本寒冬过去了吗?没…</p><p>04:00 DeepSeek的出现,让AI应用创业变得活跃了吗?没…</p><p>10:45 现阶段各种Bot的体验:神婆?舔狗?尤其点评微信、元宝和小红书</p><p>25:28 探讨一下,微信里怎么做Agent?通用Agent成立吗?</p><p>31:25 垂直Agent的创业机会和创业者</p><p>35:52 现在的组织倾向于小组织,这可能利好年轻创业者</p><p>37:42 为啥组织更小,融资额却更高了?</p><p>38:18 除了Agent,还在看的是具身智能(仿真器很重要)</p><p>43:57 可穿戴设备</p><p>54:54 大模型公司</p><p>58:31 这两年工作很boring,我的旅行之路</p><p>01:03:55 我的读书之路</p><p>01:12:34 聊影视行业(梦秋是中影独董)、《好东西》和女性主义</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p>梦秋此前的节目:</p><p><a href="https://www.xiaoyuzhoufm.com/episodes/62cc4eb0fa15142e17251617" rel="noopener noreferrer nofollow" target="_blank">《1. 和投资人梦秋聊聊加州、投资寒潮和林黛玉》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/6430e7c89361a4e7c3f586f0" rel="noopener noreferrer nofollow" target="_blank">《21. 投资人视角下的大模型和市场真实水温|和梦秋聊ChatGPT》</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/66261c27200abebe6e7635af" rel="noopener noreferrer nofollow" target="_blank">《65. 
风险投资的钥匙失灵了吗?和梦秋聊:蛰伏、更少人的游戏和啮齿动物》</a></p><p>【更多信息】</p><p>联络我们:微博<a href="https://weibo.com/u/6486678714" rel="noopener noreferrer nofollow" target="_blank">@张小珺-Benita</a></p><p>更多信息欢迎关注公众号:张小珺</p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
106. Talking with Wang He about the academic fringe history of embodied intelligence and the man-made chaos after the capital bombardment.From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-06-22 23:00
Today we continue the robotics special of "Business Interview"; the guest is Wang He, Assistant Professor at Peking University and founder and CTO of Galbot (银河通用). Wang He graduated from Tsinghua and Stanford. **He starts from the academic origins of "embodied intelligence" — the full arc of an academic school sprouting within a discipline, living on the fringe, and then permeating the mainstream.** With the birth of ChatGPT, the niche concept of "embodied intelligence" became capital's new darling over the past two years — **but it also brought new chaos overnight.** We discussed some key questions for the embodied intelligence industry: 1/ Embodied intelligence originated as an academic school within computer vision — what is the relationship between vision, language, and intelligence? Why do VLMs (vision-language models) perform significantly worse than LLMs (large language models)? 2/ One of embodied intelligence's biggest predicaments is data collection — is synthetic data the right answer? How exactly should it be done? 3/ If large models champion "intelligence as the product," what about embodied intelligence? Wang He's answer: "productivity as the product." At the end of last year, Nvidia founder Jensen Huang visited China. At the thank-you banquet, Wang He not only shared Jensen Huang's table but sat right next to him. At the end of the show, we also chatted about this interesting side story — he mentioned that **Jensen Huang ate quite a lot of shuizhu pork (boiled pork slices in chili oil) that night.** In 2025, let's progress together with AI! Our podcast premieres on Tencent News; you can follow us there to get episode information and more news first :) 03:00 Opening self-Q&A **05:58 Language is not the essence of intelligence, but "a leap"** "Embodied intelligence" and "robotics" are different academic schools **"Embodied intelligence" originated from the "computer vision" research school** Does vision have intelligence? Pure visual intelligence has poor interpretability and is end-to-end Language is not the essence of intelligence; you can't say there is no intelligence without language **What is the essence of intelligence? "An ability to respond to the environment according to the situation"** Language is "a leap" that enabled humans to produce such high intelligence The essence of vision is a very powerful sensor **25:08 The academic fringe history of embodied intelligence** The earliest task where embodied intelligence rose was navigation **Adding the visual modality and emphasizing the Perception–Action Loop became the core narrative that allowed the embodied intelligence research school to stand on its own** Landmark event: "Embodied intelligence is one of the three North Stars of the future of computer vision" (Fei-Fei Li) I crossed paths with Skild founder Deepak Pathak at Facebook's AI lab, FAIR **41:15 My academic path** 2016, first Ph.D. project: learning to generate multi-step human-object interaction processes from human videos (animation field) In my first year of the Stanford Ph.D., I struggled badly in a direction I didn't like, then switched groups and directions **Stanford is a highly free market: you can kick out your advisor at any time, and your advisor can kick you out at any time** My first paper took ages to come out; I was in despair Learning entirely from video — learning world models — has not yet become a technology that can advance embodied intelligence today My second project: related to pose estimation and synthetic data In 2020, Kai-Fu Lee organized a brunch at the Ritz-Carlton in the Bay Area; opinions diverged After returning to China, I firmly pushed research toward home robots as the goal — with no allies at all **01:25:08 Embodied intelligence's software and hardware spiral upward together** After ChatGPT took off, many people came to me about starting a company; I said it couldn't be done The total global output value of all industrial robotic arms last year was only 100 billion RMB, about the same as Li Auto, a single carmaker Adopting an immature, radical hardware approach would be a drag on intelligence On this hardware foundation, our approach is to build relatively specialized intelligence alongside increasingly general intelligence **Why are VLMs significantly weaker than LLMs?** Internet visual data covers far, far less of everything human eyes observe than internet text data covers of everything humans say (VLM data is insufficient, and VLA action data only started being collected in the past two years) **01:44:34 The quagmires we must avoid** How does this generation of embodied intelligence companies differ from earlier robotics companies? In my view, an embodied intelligence company that falls into either of these two quagmires will have a very limited ceiling: 1. companies that "float indefinitely"; 2. companies whose "numbers never add up" — marginal costs that don't fall We aim for generalization within one application scenario (currently the shelf scenario) In my view, winner-take-most effects are strong in robotics **01:55:17 Embodied intelligence is "productivity as the product"** How high is the cost of hiring people to teleoperate and collect real data? An economic accounting Real data makes up 1% of our training data; the synthetic data pipeline carries the load A tricky phenomenon in the industry: selling robots with no working functions to others (this is one business model) Common misconceptions about synthetic data and Sim-to-Real (simulation-to-reality transfer) Data backflow and the data flywheel once there is shipment volume **If large models are "intelligence as the product," then embodied intelligence is "productivity as the product"** **02:13:51 Man-made chaos after the capital bombardment** Who is creating productivity and who is just telling stories — this is the messiest part, and it originated in the US Two logics behind Figure's $40 billion valuation Some people are very bold: they don't tell anyone it's teleoperation, but it actually is **An appeal: show things truthfully! No teleoperation!** Within 5 years we must have deployments of over ten thousand units; if we can't achieve this, our field will have been falsified! 
Don't do things that smash our industry's reputation! These models are terrifying — they smash the industry's rice bowl Don't expect general-purpose robots to arrive that quickly **02:25:25 A side story** **During Jensen Huang's China visit last year, why was he seated at Huang's table and right next to him? What did they talk about?** Jensen Huang can handle spicy food; he ate a lot of shuizhu pork 02:28:26 Final quick Q&A [Robotics Special] Explaining robot foundation models and classic VLA papers one by one — "Humans are the most intelligent VLA" [More Information] Contact us: Weibo @张小珺-Benita For more information, please follow the official account: 张小珺
Original title: 106. 和王鹤聊,具身智能的学术边缘史和资本轰炸后的人为乱象
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>今天继续《商业访谈录》的机器人专场,嘉宾是北京大学助理教授、银河通用创始人兼CTO王鹤。</p><p>王鹤毕业于清华和斯坦福大学。<strong>他给我们从“具身智能”的学术缘起开始聊起,这是一个学术流派从一个学科中萌芽到边缘再到主流渗透的全过程。</strong></p><p>而随着ChatGPT诞生,“具身智能”这个小众概念,在过去2年成了新的资本宠儿——<strong>但一时间,也带来了新的乱象。</strong></p><p>我们探讨了一些具身智能产业界关键问题:</p><p>1/具身智能起源于计算机视觉的学术流派,视觉、语言、智能的关系是什么?为什么VLM(视觉语言模型)的表现显著弱于LLM(大语言模型)?</p><p>2/具身智能的最大困境之一是数据采集,合成数据是正解吗?具体应该怎么做?</p><p>3/如果大模型提倡的是“智能即产品”,那么具身智能呢?王鹤的回答是“生产力即产品”。</p><p>去年底,英伟达创始人黄仁勋来华访问。答谢宴上,王鹤不仅和黄仁勋同桌,而且就在做黄仁勋旁边(挨着坐)。在节目最后,我们也聊了聊这个有趣的插曲——他提到,<strong>那晚黄仁勋吃了不少水煮肉片。</strong></p><p>2025,我们和AI共同进步!</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/Fuo-Y2NiU-ETwS5ypbWWUtvAVBaQ.png" /></figure><blockquote>我们的播客节目在<a href="https://view.inews.qq.com/u/8QIf3n5c64Ucuzne7gI%3D?devid=FF4E49E6-9C89-4986-A413-04E856F31262&qimei=766696f2cd8f313d744bc2c9000012918102&uid=100161026780">腾讯新闻首发</a>,大家可以前往关注哦,这样可以第一时间获取节目信息和更多新闻资讯:)</blockquote><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><p>03:00 开始的自问自答</p><blockquote><strong>05:58 语言不是智能的本质,而是“一次跃变”</strong></blockquote><p>“具身智能”和“机器人”是不同学术流派</p><p><strong>“具身智能”起源于“计算机视觉”的研究流派</strong></p><p>视觉有智能吗?纯视觉智能的可解释性差,是端到端的</p><p>语言不是智能的本质,不能说没有语言就没有智能</p><p><strong>智能的本质是什么?“一种视情况对环境做出反应的能力”</strong></p><p>语言是人类能产生这么高智能的“一次跃变”</p><p>视觉的本质是一种非常强的sensor(传感器)</p><blockquote><strong>25:08 具身智能的学术边缘史</strong></blockquote><p>具身智能最早兴起的task(任务)是,导航</p><p><strong>加入视觉模态,强调Perception–Action Loop(感知-动作循环),成为具身智能研究流派能立起来的核心叙事</strong></p><p>标志性事件:“具身智能是计算机视觉未来的三颗北极星之一”(李飞飞)</p><p>我和Skild创始人Deepak Pathak在Facebook人工智能实验室FAIR打过交道</p><blockquote><strong>41:15 
我的学术之路</strong></blockquote><p>2016年,博士第一个项目:从人类视频里学多步的人与物体交互过程的生成(动画领域)</p><p>在Stanford博士第一年,在不喜欢的方向非常挣扎,后来换组、换方向</p><p><strong>Stanford是高度自由的市场:你可以随时踢你老板,你老板可以随时踢你</strong></p><p>第一篇论文憋了很久,很绝望</p><p>完全从视频中学习,学习世界模型,还没成为当下能推进具身智能的技术</p><p>我的第二个项目:位姿估计和合成数据相关</p><p>2020年李开复曾在湾区丽思卡尔顿组织brunch,观点分歧</p><p>回国坚定以家庭机器人为目标推进research,根本没有allies(盟军)</p><blockquote><strong>01:25:08 具身智能的软件和硬件是螺旋上升的问题</strong></blockquote><p>ChatGPT火了以后,很多人开始找我创业,我说创不了</p><p>所有工业机械臂在去年的全球总产值才1000亿RMB,和理想一家车企产值相当</p><p>如果采取不成熟的激进的硬件方案,对智能会是一种拖累</p><p>在这个硬件基础上,我们的方案是,做相对专用的智能和越来越通用的智能</p><p><strong>VLM为什么显著弱于LLM?</strong>互联网视觉数据/所有人眼观测的覆盖〈〈〈互联网文字数据/人类所有说的话的覆盖(VLM数据不够,VLA的Action数据是最近两年才开始收集的)</p><blockquote><strong>01:44:34 我们要避免陷入以下泥潭</strong></blockquote><p>这一代具身智能公司相比此前机器人公司,差异在哪?</p><p>在我看来,具身智能公司如果陷入以下两个泥潭,天花板会很有限:</p><p>1、“长期漂浮”的公司;2、“算不过来账”的公司,边际成本不降</p><p>我们要做一个应用场景内的泛化(现在选择的是货架场景)</p><p>在我看来,机器人领域的头部效应很重</p><blockquote><strong>01:55:17 具身智能是,“生产力即产品”</strong></blockquote><p>雇人摇操采真实数据的成本到底有多高?一笔经济账</p><p>真实数据在我们训练数据的比重是1%,合成数据管线挑起大梁</p><p>行业内的tricky现象:把没有功能的机器人卖给别人(这是一种商业模式)</p><p>关于合成数据和Sim-to-Real(仿真到现实迁移)的常见误区</p><p>有出货量后的数据回流和数据飞轮</p><p><strong>如果大模型是“智能即产品”,那么具身智能就是“生产力即产品”</strong></p><blockquote><strong>02:13:51 资本轰炸后的人为乱象</strong></blockquote><p>谁在创造生产力,谁在讲故事,这是最乱的——这个源自美国</p><p>对Figure的估值400亿美元的两种逻辑</p><p>有的人胆子很大,不告诉别人我是摇操,但实际摇操</p><p><strong>呼吁:真实展示!不要摇操!</strong></p><p>5年内我们一定要有万台以上的应用,如果做不到这个,我们这个领域就被证伪了!</p><p>不要去搞一些砸我们行业招牌的事情!这些模式是很可怕的,是在砸这个行业的饭碗</p><p>通用机器人的到来不要想得那么快</p><blockquote><strong>02:25:25 一个插曲</strong></blockquote><p><strong>去年黄仁勋访华为什么和黄仁勋同桌且在旁边?聊了什么?</strong></p><p>黄仁勋能吃辣,吃了很多水煮肉片</p><p>02:28:26 最后的快问快答</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p>【机器人专场】</p><p><a href="https://www.xiaoyuzhoufm.com/episodes/67f28c6e0decaeb0943fb14a">逐篇讲解机器人基座模型和VLA经典论文——“人就是最智能的VLA”</a></p><p>【更多信息】</p><p>联络我们:微博<a href="https://weibo.com/u/6486678714">@张小珺-Benita</a></p><p>更多信息欢迎关注公众号:张小珺</p><figure><img 
src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
105. Talking with Mercedes-Benz's Wang Xin about German cars, the right to speak, and the technology battle amid the great industrial turnFrom 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-06-19 23:00
"Business Interview" has interviewed many CEOs and senior executives of Chinese new energy vehicle companies. Today's guest comes from a century-old German car company: Wang Xin, head of autonomous driving and connected car R&D at Mercedes-Benz China. We talked about the 20-year transformation of China's auto industry, and the transformation and behind-the-scenes stories of a German carmaker. Our podcast premieres on Tencent News; you can follow us there to get episode information and more news first :) **The Great Industrial Transformation** 01:25 20 years ago, even the phone maker Bird made cars 07:54 I spent 18 years at the automotive Tier 1 Delphi and joined Mercedes-Benz 3 years ago — behind this lies a great industrial turn 09:30 Several technology cycles in the global auto industry over the past 20 years (before 2004, 2004-2014, 2014-2020, 2020-present) 11:31 We have now entered a data-driven era; the era of Tier 1 black-box delivery is over **The Right to Speak** 27:40 Was the Chinese team's say vis-à-vis the German headquarters fought for and won? 28:27 Mercedes-Benz China R&D team's organizational structure, communication mechanisms, and internal battles 34:08 The battle culture of German companies differs from that of American companies 41:23 What processes are required when features designed and built for China are exported globally? **New Technology** 43:21 Intelligentization is an irreversible trend, but it should not be radical 46:50 Vehicle-to-vehicle communication needs to be redefined once L3 is achieved 51:54 The relationship between technology and luxury: if intelligence is democratized, has the standard of luxury changed? 
01:01:49 The process of switching from rule-based algorithms to end-to-end last year was quite grueling 01:04:40 LiDAR is a good redundancy 01:05:35 The CLA partners with Doubao (豆包) on large language models **The 139-Year-Old Car Company** 01:09:36 People first 01:11:08 The steps of safety 01:13:08 The world's first car driver was the wife of Mercedes-Benz's founder 01:15:00 What is it like to work at a century-old company — what is the glory? What is the burden? 01:17:48 A once-in-a-century transformation and upheaval 01:33:22 Does Mercedes-Benz CEO Ola Källenius lose his temper? Related episode: Dialogue with Ola Källenius, Global CEO of Mercedes-Benz: A CEO in Transition and the 139-Year-Old Mercedes-Benz in Transition 【More Information】 Contact us: Weibo @张小珺-Benita For more information, please follow the official account: 张小珺
Original title: 105. 和奔驰王忻聊,产业大转折下的德国汽车、话语权和技术battle
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>《商业访谈录》访谈过很多中国新能源车企的CEO和高层,今天的嘉宾来自一家德国百年车企,他是奔驰中国自动驾驶与车联网研发负责人王忻。</p><p>我们聊了聊中国汽车产业20年变革的历程,以及一家德国车企的转型与秘密故事。</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FjMoypYgmSWwR6Hv1JpFkZRvS8Yu.png" /></figure><blockquote><p>我们的播客节目在<a href="https://view.inews.qq.com/u/8QIf3n5c64Ucuzne7gI%3D?devid=FF4E49E6-9C89-4986-A413-04E856F31262&qimei=766696f2cd8f313d744bc2c9000012918102&uid=100161026780" rel="noopener noreferrer nofollow" target="_blank">腾讯新闻首发</a>,大家可以前往关注哦,这样可以第一时间获取节目信息和更多新闻资讯:)</p></blockquote><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><blockquote><p><strong>产业大转型</strong></p></blockquote><p>01:25 20年前,就连波导手机也做过汽车</p><p>07:54 我曾在汽车Tier 1德尔福18年,3年前加入奔驰,背后是产业大转折</p><p>09:30 过去20年全球汽车产业的几个技术周期(2004年以前,2004-2014年,2014-2020年,2020年至今)</p><p>11:31 现在转变成数据驱动的时代,Tier 1黑盒交付的时代不再</p><blockquote><p><strong>话语权</strong></p></blockquote><p>27:40 中国团队和德国总部的话语权是争夺过来的吗?</p><p>28:27 奔驰中国研发团队组织架构、沟通机制和battle</p><p>34:08 德国企业的battle文化和美国企业是不同的</p><p>41:23 为中国设计生产的功能要反向输出全球的时候,需要哪些流程?</p><blockquote><p><strong>新技术</strong></p></blockquote><p>43:21 智能化是不可逆的趋势,但不能激进</p><p>46:50 车车通讯在L3实现以后需要重新定义</p><p>51:54 科技和豪华的关系:如果智能平权,豪华的标准变了吗</p><p>01:01:49 去年从规则算法切换到端到端的过程挺煎熬的</p><p>01:04:40 激光雷达是一个很好的冗余</p><p>01:05:35 CLA和豆包合作大语言模型</p><blockquote><p><strong>139岁车企</strong></p></blockquote><p>01:09:36 以人为本</p><p>01:11:08 安全的步骤</p><p>01:13:08 世界上第一位汽车驾驶员是奔驰创始人的太太</p><p>01:15:00 在百年企业工作是什么体验——荣耀是什么?负担是什么?</p><p>01:17:48 百年一遇的大转型、大变革</p><p>01:33:22 奔驰CEO康林松会发脾气吗?</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p>相关单集:</p><p><a href="https://www.xiaoyuzhoufm.com/episodes/68300e93fcbc2e206b58eb2b" rel="noopener noreferrer nofollow" target="_blank">对话奔驰全球CEO康林松:转型期CEO和转型之中的139岁奔驰</a></p><p>【更多信息】</p><p>联络我们:微博<a 
href="https://weibo.com/u/6486678714" rel="noopener noreferrer nofollow" target="_blank">@张小珺-Benita</a></p><p>更多信息欢迎关注公众号:张小珺</p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
104. Talking with Rokid's Misa Zhu about Wu Ma, Alibaba, and year 11 in the hardware-entrepreneurship dark forest.From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-06-15 23:00
As AI's software capabilities spill over into hardware, smart glasses may be another industry, besides embodied intelligence, that stands to benefit. Today's guest is Misa Zhu (祝铭明), founder of the smart glasses company Rokid. In the first half of 2025, Misa gave a speech wearing smart glasses developed by his own company, which drew attention at the time. **This year also marks his 11th year of entrepreneurship in the hardware dark forest.** We started from Alibaba's $10 million acquisition of his first company — we talked about **Jack Ma and Wu Ma**, and also about his second venture and **the China-US comparison, stages, and trends of the smart glasses market.** Our podcast premieres on Tencent News; you can follow us there to get episode information and more news first :) 02:00 Opening quick Q&A 02:36 Alibaba acquired my first startup for $10 million, all converted into stock 05:14 In the worst times, Jack Ma talked with me and introduced Joe Tsai (蔡崇信), and then Dr. Wang Jian 08:05 I had two weeks until payroll, with only 4,000 yuan in the account 15:55 As a senior executive at Alibaba, when Wu Ma (Wu Yongming) proposed doing AI and founded M lab 22:43 Rokid's financing and Jack Ma's advice 27:40 Wu Ma was my direct boss back then; comments on Wu Ma 31:41 The big decision of 2019: switching from the AI track to the AR track within a week 48:00 Will organ-like hardware shift from the mobile phone to smart glasses? 
59:17 After the big decision, more than half of the employees were laid off and a whole building was emptied 01:05:45 The first PMF after the pivot 01:09:55 Today's smart glasses are at the intermediate stage between the BlackBerry and the iPhone 1 01:11:52 AI's expansion into hardware: embodied intelligence, wearable intelligence 01:13:05 For smart glasses, the first half of next year will be the moment of competing with giants 01:19:29 Jack Ma summarized 4 opportunities for startups competing with giants: the 4 "no's" 01:23:38 How China and the US define smart glasses products differently 01:41:35 The company's first value is playfulness, and the boss is always the troublemaker 01:48:32 Talking about Hangzhou's entrepreneurs 01:59:05 The dark forest of hardware entrepreneurship 02:27:00 Final quick Q&A 【More Information】 Contact us: Weibo @张小珺-Benita For more information, please follow the official account: 张小珺
Original title: 104. 和Rokid祝铭明聊,吴妈、阿里、硬件创业黑森林的第11年
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>随着AI的软件能力向硬件溢出,除了具身智能,智能眼镜或许是另一个会受益的产业。</p><p>今天的嘉宾是智能眼镜公司Rokid创始人祝铭明(Misa),2025上半年Misa佩戴其公司开发的智能眼镜出现在一次演讲中,一度引发关注,<strong>今年也是他在硬件黑森林里创业的第11个年头。</strong></p><p>我们从他的第一家公司1000万美金被阿里并购开始聊起——聊了聊<strong>马云和吴妈</strong>,也聊了聊他的第二段创业、<strong>智能眼镜市场的中美对比、阶段与趋势</strong>。</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FmOqd2jZnPpPWSi6lw-jBpb8-qYE.png" /></figure><blockquote>我们的播客节目在<a href="https://view.inews.qq.com/u/8QIf3n5c64Ucuzne7gI%3D?devid=FF4E49E6-9C89-4986-A413-04E856F31262&qimei=766696f2cd8f313d744bc2c9000012918102&uid=100161026780">腾讯新闻首发</a>,大家可以前往关注哦,这样可以第一时间获取节目信息和更多新闻资讯:)</blockquote><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><p>02:00 开始的快问快答</p><p>02:36 阿里1000万美金收购我的第一个创业公司,全部换成了股票</p><p>05:14 最糟糕的时候,马云找我聊,引荐了Joe Cai(蔡崇信),又引荐了王坚博士</p><p>08:05 我还有两个星期发薪水,账上只有4000块</p><p>15:55 在阿里当高管,吴妈(吴泳铭)提出想做AI,成立M lab</p><p>22:43 Rokid的融资、马云的建议</p><p>27:40 吴妈当年是我的顶头上司,对吴妈的comments</p><p>31:41 2019年重要决策:一星期内从AI切换AR赛道</p><p>48:00 像器官一样的硬件会从手机切换到智能眼镜?</p><p>59:17 重要决策之后裁员了一大半,清空了一幢楼</p><p>01:05:45 转型后第一次PMF</p><p>01:09:55 现在的智能眼镜在黑莓到iPhone 1的中间阶段</p><p>01:11:52 AI在硬件上的展开:具身智能、随身智能</p><p>01:13:05 在智能眼镜,明年上半年会是与巨头竞争的时间点</p><p>01:19:29 马云总结创业公司和巨头竞争的4个机会:4个不</p><p>01:23:38 中美定义智能眼镜产品的不同</p><p>01:41:35 公司价值观第一条是玩心,老板总是那个trouble maker</p><p>01:48:32 聊聊杭州创业者们</p><p>01:59:05 硬件创业的黑森林</p><p>02:27:00 最后的快问快答</p><p>【更多信息】</p><p>联络我们:微博<a href="https://weibo.com/u/6486678714">@张小珺-Benita</a></p><p>更多信息欢迎关注公众号:张小珺</p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
103. Lovart founder Chen Mian reviews the past two years of application entrepreneurship: This moment just feels so awesome!! HahahahahaFrom 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-06-08 23:00
Today's guest is another AI application entrepreneur: Chen Mian, founder of Lovart. In 2025 his product became, after Manus, another Agent to gain a measure of global recognition. The difference: Manus is a general-purpose Agent, while Lovart is a vertical Agent, built for designers. Rather than a product-building CEO, **his mental state is closer to that of a "fighting CEO."** This interview took place after Lovart took off. You can feel Chen Mian's overflowing joy at this moment, after two years of grievances — subsidy wars, his product being pulled from app stores, only 4,000 yuan left in the account, and fundraising that went nowhere. This is a snapshot of an Agent entrepreneur's state of mind in 2025. The curtain on this wave has only just risen. In 2025, let's look forward to progressing together with AI! Our podcast premieres on Tencent News; you can follow us there to get episode information and more news first :) 03:00 Opening quick Q&A **Wandering** 05:00 A post-90s founder's decade in the mobile internet, job-hopping constantly (Tencent, 360, Baidu, Didi, Mobike, Meituan, Missfresh, ByteDance Education, and Jianying) 07:02 I went through two battles; the peak was when the battle raged hottest, and afterwards it was a mess 13:58 Built Guagualong from 0 to 1, had just been promoted to ByteDance level 4-1, and then ran straight into the "Double Reduction" policy 15:18 Would a different choice have turned out better? 
**AI is here, and I felt rescued** 25:25 AI is at least the invention of the computer — a change on par with the information revolution (intelligentization vs. informatization) 28:58 The moment of redemption: "Hope is the antidote to all pain, and the meaning of all pain" 29:51 Avoid the main channel of large models and the main axis of language; choose multimodality and creation **2023: One second I had just won first place in China; the next second the app was pulled, we had layoffs, and the money was gone** 36:00 The first investor I met was Zhang Yutong 37:43 June-September 2023, I fought with everything I had! — burned $2 million in 3 months 39:03 One second I had just won first place in China; the next second the app was pulled, we had layoffs, and the money was gone 40:45 How do you view paid user acquisition? How do you view Kimi's ad spending? 42:35 How did it feel to have the app pulled? Devastating 44:09 The company had only 4,000 yuan left in its account 45:17 What is the current customer acquisition cost? How do you acquire users effectively? 49:38 Opportunities are fleeting; when the rhythm is good, you must press the advantage **2024: Investors piling in madly** 50:05 In 2024 investors poured in madly — one financing round per month, three rounds closed 52:21 We were very clear about the limitations of our first-generation product, Liblib, and began considering a second-generation product 55:58 How was the second-generation product, Lovart, pre-researched? **2025: Lovart takes off** 59:48 If this designer is called Lovart, and he/she also loves art — that's pretty cool 01:01:47 What does racing for "the world's first XX Agent" actually bring? 01:03:00 Why have invitation codes become standard? 01:03:56 After Lovart took off 01:07:30 The know-how of AI application entrepreneurship **This is the most!! awesome!! thing about my entrepreneurship!!** 01:19:57 I am a Gemini — sometimes manic, sometimes very soft 01:24:04 Coexist with anxiety — just do it! 01:25:26 This is the most!! awesome!! thing about my entrepreneurship!! 
01:28:00 But at this moment it just feels so awesome!!! I was happy for quite a while — it's my simple happiness hahahahaha 01:28:32 No amount of money or title could buy it 01:32:35 Innovation in an unfamiliar field is like striking a match against wet wood, over and over — igniting and going out — until one day you seize a gap, the wood catches, and the fire spreads through the whole cave 01:33:58 At the end of 2023 I went to the Hillhouse office and, in the sunlight, fell into a daze **Make a big scene, leave quietly** 01:35:00 Childhood: wandering, wuxia novels, and computer games 01:26:01 I don't know where my hometown is; I can only keep moving forward 01:38:46 Advice for other AI application entrepreneurs 01:42:29 Final quick Q&A [Agent Entrepreneurship Trilogy of the First Half of 2025] A 3-hour interview with Manus founder Xiao Hong: The world is not linear extrapolation — be an important variable in the game A 3-hour interview with YouWare founder Ming Chaoping: Today's Agent is like a gorilla that has just picked up a burning stick Lovart founder Chen Mian reviews the past two years of application entrepreneurship: This moment just feels so awesome!! Hahahahaha [More Information] Contact us: Weibo @张小珺-Benita For more information, please follow the official account: 张小珺
Original title: 103. Lovart创始人陈冕复盘应用创业这两年:这一刻就是好爽啊!!哈哈哈哈哈
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>今天的嘉宾又是一位AI应用创业者,Lovart创始人陈冕。</p><p>他的产品成为2025年既Manus之后,另一个在全球斩获一定知名度的Agent。不同的是,Manus是通用Agent,Lovart是垂直Agent,面向设计师使用。</p><p>与其说他是做产品的CEO,<strong>他的精神状态更贴近一名“战斗型CEO”。</strong></p><p>这次访谈发生Lovart火了之后,你能感受到陈冕在过去2年遭遇了补贴战争、产品下架、账上只剩4000块现金的绝境、怎么都融不到资等一系列愤懑之后——此时此刻,充斥着的要溢出的快乐。</p><p>这是2025年对一位Agent创业者精神状态的截取。浪潮的大幕才刚刚拉开。</p><p>2025,期待我们和AI共同进步!</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/Fhg0fHfEjPJE44bF_5suSeFO9TmJ.png" /></figure><blockquote>我们的播客节目在<a href="https://view.inews.qq.com/u/8QIf3n5c64Ucuzne7gI%3D?devid=FF4E49E6-9C89-4986-A413-04E856F31262&qimei=766696f2cd8f313d744bc2c9000012918102&uid=100161026780" rel="noopener noreferrer nofollow" target="_blank">腾讯新闻首发</a>,大家可以前往关注哦,这样可以第一时间获取节目信息和更多新闻资讯:)</blockquote><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><p>03:00 开始的快问快答</p><blockquote><strong>漂泊</strong></blockquote><p>05:00 一个90后的不断跳槽的10年移动互联网经历</p><p>(腾讯、360、百度、滴滴、摩拜、美团、每日优鲜、字节教育和剪映)</p><p>07:02 经历了两次战斗,战斗正酣的时候是顶点,后面一地鸡毛</p><p>13:58 从0到1做瓜瓜龙,刚升字节4-1,就撞上双减了</p><p>15:18 换一种选择,会更好吗?</p><blockquote><strong>AI来了,觉得自己被解救了</strong></blockquote><p>25:25 AI至少是电脑的发明,比肩信息革命的变革(智能化vs信息化)</p><p>28:58 被救赎的一刻:“希望是一切痛苦的解药,是一切痛苦的意义”</p><p>29:51 避开大模型主航道和语言主轴,选择多模态、创作</p><blockquote><strong>2023年:前一秒赢了中国第一,下一秒被下架了、裁员了、没钱了</strong></blockquote><p>36:00 见的第一个投资人是张予彤</p><p>37:43 2023年6月-9月,我全情的战斗!——3个月烧了200万美金</p><p>39:03 前一秒刚赢了中国第一,下一秒被下架了、裁员了、没钱了</p><p>40:45 怎么看投流?怎么看Kimi投流?</p><p>42:35 被下架什么心情?奔溃啊</p><p>44:09 公司账上只剩4000块</p><p>45:17 现在获客成本是多少?怎么有效获取用户?</p><p>49:38 时机稍纵即逝,好的节奏时一定要扩大战果</p><blockquote><strong>2024年:哐哐哐狂投</strong></blockquote><p>50:05 2024年哐哐狂投,一个月一轮融资,close了3轮</p><p>52:21 我们非常清楚第一代产品liblib的局限性,开始考虑第二代产品</p><p>55:58 第二代产品Lovart是怎么预研的?</p><blockquote><strong>2025年:Lovart火了</strong></blockquote><p>59:48 
如果这个设计师叫Lovart,他/她又Love art,还蛮酷的</p><p>01:01:47 争抢“全球第一个XX Agent”究竟带来什么?</p><p>01:03:00 为啥搞邀请码成了标配?</p><p>01:03:56 Lovart火了之后</p><p>01:07:30 AI应用创业的know-how</p><blockquote><strong>这是我创业最!!爽的!!东西!!</strong></blockquote><p>01:19:57 我是双子座,时而发狂,时而很软</p><p>01:24:04 与焦虑共生,就是干!</p><p>01:25:26 这是我创业最!!爽的!!东西!!</p><p>01:28:00 但在这一刻就是好爽啊!!!我爽了好一会儿——就是我朴实的快乐哈哈哈哈哈</p><p>01:28:32 给我多少钱、给我多少职级,都买不到</p><p>01:32:35 在陌生领域的创新,就像用火柴在潮湿的木头上反复地滑动,点燃又熄灭;直到有一天,你抓住了某一个缝隙,把木柴点燃,火势弥漫整个山洞</p><p>01:33:58 2023年底去高瓴办公室,阳光中,我恍惚了</p><blockquote><strong>大闹一场,悄然离去</strong></blockquote><p>01:35:00 童年:漂泊、武侠小说和电脑游戏</p><p>01:26:01 我不知道故乡是哪,只能一直往前走</p><p>01:38:46 给其他AI应用创业者的建议</p><p>01:42:29 最后的快问快答</p><figure><img src="https://image.xyzcdn.net/FvVbUNblF7FHIjfdp3MmmbAdLZ8G.png" /></figure><p>【2025上半年Agent创业三部曲】</p><p><a href="https://www.xiaoyuzhoufm.com/episodes/67c3d80fb0167b8db9e3ec0f">对Manus创始人肖弘的3小时访谈:世界不是线性外推,做博弈中的重要变量</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/68372c9631215eb5063bcdb1">对YouWare创始人明超平3小时访谈:今天Agent像大猩猩刚拿起一根烧火棍</a></p><p><a href="https://www.xiaoyuzhoufm.com/episodes/68455e0a6dbe9284e75c6fbf">Lovart创始人陈冕复盘应用创业这两年:这一刻就是好爽啊!!哈哈哈哈哈</a></p><p>【更多信息】</p><p>联络我们:微博<a href="https://weibo.com/u/6486678714" rel="noopener noreferrer nofollow" target="_blank">@张小珺-Benita</a></p><p>更多信息欢迎关注公众号:张小珺</p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>
-
102. A conversation with Zhang Xiangyu: the struggles of multimodal research and the two "GPT-4 moments" of the next two years. From 🇨🇳 张小珺Jùn|商业访谈录, published at 2025-06-02 23:00
Today's episode of "Business Interviews" welcomes its first co-host, the familiar Li Guangmi. Guangmi invited Zhang Xiangyu, Chief Scientist of the large-model company StepFun (阶跃星辰), to talk about the past, present, and future technological frontiers of multimodality. In this episode, Zhang Xiangyu elaborates on **his 10-year history of working on multimodality, his new thinking about multimodality, and his prediction of the next "GPT-4 moment."** He mentioned one detail: during training he once discovered a baffling phenomenon. The model's general conversational ability, emotional intelligence, and knowledge all grew stronger as the model got larger, but its reasoning ability (especially in mathematics) first rose, then plateaued, and then actually declined as the model was scaled further. This point has not yet triggered wide discussion in the industry, and he offers his own explanation for the **strange phenomenon**. Below is the conversation between Guangmi and Xiangyu. 2025, let's progress together with AI! Our podcast premieres on Tencent News; you can follow it there to get episode information and more news as soon as possible :)

> **A 10-Year History of Multimodal Research: Confusion and Turning Points**

02:00 Zhang Xiangyu's academic background and main personal research line

12:25 The history of CV (computer vision) learning from NLP (natural language processing)

17:14 In 2022 I became pessimistic about reaching a "GPT moment for CV" through vision alone

18:22 What is wrong with the pure-vision domain? **With a generative model like GPT you can have generation, understanding, and human alignment all at once, while for static images the three are split apart.**

24:23 I stopped researching static-image representations and conceived a new research topic: in the short term, exploit the alignment between vision and language

29:10 After trying, I still could not unify image understanding, generation, and alignment. I got an ever-stronger generation model and an ever-stronger understanding model, with no compounding effect. Why is fusion so hard?

38:45 **I was deeply confused for more than half a year, but at that moment a turning point appeared.**

> **Strange Findings, Clues, and Remedies from Training Large Models**

41:11 A baffling discovery during training: **the larger the model, the stronger its general conversational ability, emotional intelligence, and knowledge, yet its reasoning performance (especially in mathematics) first rises, then plateaus, and actually declines when scaled further**

43:10 Some clues: larger models tend to skip steps on math problems and are not honest

44:33 After analysis, **this is an essential defect of next-token prediction**

45:42 A higher compression rate does not necessarily mean higher computational accuracy; let's do a thought experiment

47:27 The "feature collapse" phenomenon of generative models

50:48 The remedy is to introduce RL (reinforcement learning)

53:28 **The core of o1 is the chain-of-thought pattern**: "for thinking models, pattern is all you need"

01:01:52 When the model reaches a certain step, two branches lie ahead. Go left, or go right? Can it be resolved within a single token (a critical decision)? No, so introduce a reflection pattern

01:10:16 The essence of the o1 paradigm is a Meta-CoT, a CoT of CoTs

> **New Thinking and New Progress in Multimodal Research**

01:10:57 After studying o1, I went back to study why visual generation is so poorly controllable, and I got some leads

01:15:13 Simply bolting generation and understanding together is very hard; a key link, CoT, is missing

01:15:54 **Started a new project in the middle of last year: visual understanding (Long CoT in visual space)**

01:19:06 After half a year of trying, let me share the results with you!

01:21:30 The o series generalizes not only across domains but, more attractively, across patterns

01:22:16 Game-like problems are a hard-to-generalize area, with much wasted thinking and many low-level errors

01:24:07 The reflection patterns that o1 elicits are already present in the pre-training corpus

01:31:31 Two theories about adding multimodal data to pre-training: does it hurt text IQ, or does it strengthen the scaling law?

01:36:43 Going forward, walk on two legs: expand the pre-training corpus and expand the action space

01:45:42 How far away is the "GPT-4 moment" of multimodality?

> **Predicting the Next "GPT-4 Moment"**

01:46:56 Long context and multi-model collaboration

02:07:09 Architecture is not important; architecture serves algorithms and systems (why I say the Linear Transformer is not essential)

02:08:30 **The next "GPT-4 moment"? The model's online learning / autonomous learning**

02:21:22 Clarifying some views about Agents

02:25:00 Humans have no generative organs, yet humans have world models

02:26:34 Our level of intelligence is still struggling with vision, while the robotics field is racing ahead

【More Information】 Contact us: Weibo @张小珺-Benita For more information, follow the official account: 张小珺
Original title: 102. 和张祥雨聊,多模态研究的挣扎史和未来两年的2个“GPT-4时刻”
Original description: <figure><img src="https://image.xyzcdn.net/Flo18nNUSP7OUNlTf8UgCdHxio6O.jpg" /></figure><p>今天这集,《商业访谈录》第一次迎来一位co-host,是大家熟悉的李广密。</p><p>广密邀请了大模型公司阶跃星辰的首席科学家张祥雨,来聊聊,多模态的前世今生和未来技术的前沿走向。</p><p>张祥雨在这集节目详细阐述了:<strong>他参与的多模态的10年历史,对多模态的全新思考,以及所预见的下一个“GPT-4时刻”。</strong></p><p>他提到一个细节:在训练过程中他曾经发现一件百思不得其解的现象——模型的通用对话能力、情商和知识量都是随着模型变大变得更强,但模型的推理能力(尤其是数学)表现却是先上升后平缓,再扩大反而是下降——这点在业界还未引发广泛讨论。关于这个<strong>怪现象</strong>,他也给出了自己的解答。</p><p>下面是广密和祥雨的聊天。</p><p>2025,我们和AI共同进步!</p><figure><img src="https://image.xyzcdn.net/Fm2F9n8vMm_n-xafqMi98xs3T3K8.png" /></figure><figure><img src="https://image.xyzcdn.net/FiSVQGPuUlWbTkbF5UYXQXUufs8Q.png" /></figure><blockquote>我们的播客节目在<a href="https://view.inews.qq.com/u/8QIf3n5c64Ucuzne7gI%3D?devid=FF4E49E6-9C89-4986-A413-04E856F31262&qimei=766696f2cd8f313d744bc2c9000012918102&uid=100161026780" rel="noopener noreferrer nofollow" target="_blank">腾讯新闻首发</a>,大家可以前往关注哦,这样可以第一时间获取节目信息和更多新闻资讯:)</blockquote><figure><img src="https://image.xyzcdn.net/FvV-R5FBydYHGZAMyXAV1K1A9iJT.png" /></figure><blockquote><strong>多模态研究的10年史:迷茫和转机</strong></blockquote><p>02:00 张祥雨的学术经历和个人研究主线</p><p>12:25 CV(计算机视觉)向NLP(自然语言处理)的学习历史</p><p>17:14 2022年我开始对单纯靠视觉学出“CV领域的GPT时刻”比较悲观</p><p>18:22 纯视觉这个domain有什么问题?<strong>GPT这样的生成模型你可以同时拥有生成、理解和人类对齐,而静态图像这三者是割裂的</strong></p><p>24:23 我停止了对静态图像表征的研究,构思新的研究主题:短期内利用视觉和语言的对齐关系</p><p>29:10 经过尝试还是没做到图像的理解、生成和对齐一体化,我得到一个越来越强的生成模型,和一个越来越强的理解模型,没有起到叠加效果——为什么如此难以融合?</p><p>38:45 <strong>做了大半年十分迷茫,但在此刻出现了转机</strong></p><blockquote><strong>训练大模型发现的怪事、蛛丝马迹与办法</strong></blockquote><p>41:11 训练过程中发现了一件百思不得其解的怪事:<strong>模型的通用对话能力、情商、知识量确实模型越大越强,但模型的推理能力(尤其是数学)表现是先上升后平缓,再扩大反而是下降</strong></p><p>43:10 一些蛛丝马迹:更大的模型做数学题倾向于跳步,不老实</p><p>44:33 经过分析,<strong>这是next token prediction的本质缺陷</strong></p><p>45:42 更大的压缩率未必对应更高的计算精度,我们来做一个思想实验</p><p>47:27 生成模型的“特征坍缩现象”</p><p>50:48 解决方案就是引入RL(强化学习)</p><p>53:28 <strong>o1的核心是思维链的pattern</strong>——“做思考模型,pattern is all you need”</p><p>01:01:52 当模型走到某一步,摆在面前有两个分支——走左边?还是走右边?——一个token之内到底能不能解决?(critical 
decision)——不能,所以引入反思pattern</p><p>01:10:16 o1范式的本质是一种Meta-CoT ,是CoT的CoT</p><blockquote><strong>对多模态研究的新思考和新进展</strong></blockquote><p>01:10:57 研究完o1,返回研究为什么视觉生成可控性这么差,就有了眉目</p><p>01:15:13 简单把生成和理解做到一起,难度非常大,缺失了重要一环CoT</p><p>01:15:54 <strong>去年中开启新的project:视觉理解(视觉空间的Long CoT)</strong></p><p>01:19:06 尝试了半年,结果给大家透露一下吧!</p><p>01:21:30 o系列不仅泛化了domain,更吸引人的是泛化了pattern</p><p>01:22:16 博弈类问题是难以泛化的领域,有很多无效思考和低级错误</p><p>01:24:07 o1激发的反思pattern,在预训练语料中都有分布了</p><p>01:31:31 关于预训练加多模态数据有两种说法:影响了text智商?还是增强了scaling law?</p><p>01:36:43 往后两条腿走:扩充预训练语料和扩展动作空间</p><p>01:45:42 多模态的“GPT-4时刻”还有多久</p><blockquote><strong>预见下一个“GPT-4时刻”</strong></blockquote><p>01:46:56 long context和多模型协作</p><p>02:07:09 架构不重要,架构是服务算法和系统的(为什么我说Linear Transformer不本质)</p><p>02:08:30<strong> 下一个“GPT-4时刻”?模型的在线学习/自主学习</strong></p><p>02:21:22 澄清一些有关Agent的观点</p><p>02:25:00 人虽然没有生成器官,但人有世界模型</p><p>02:26:34 我们的智能水平还在为视觉挣扎,机器人领域在抢跑</p><p>【更多信息】</p><p>联络我们:微博<a href="https://weibo.com/u/6486678714" rel="noopener noreferrer nofollow" target="_blank">@张小珺-Benita</a></p><p>更多信息欢迎关注公众号:张小珺</p><figure><img src="https://image.xyzcdn.net/Fn7o36NtUYpCM_rQiFj1LW-TIwk8.JPG" /></figure>