Today's guest is Yang Zhilin, founder and CEO of Moonshot AI, who last appeared on this show ("Business Interview") a year and a half ago.
This past July, the Kimi K2 model was released, drawing widespread attention. K2 is an open-source coding and Agentic large language model built on a MoE (mixture-of-experts) architecture. Figuratively speaking, the model uses programming to escape the closed "brain in a vat," growing "hands" with which to manipulate the external digital world.
**Today, I talked with Yang Zhilin about K2's development, his current technical understanding, and his technical judgments.**
And, **as a founder, his feelings and reflections amid the past year's storms of public opinion and the ups and downs of entrepreneurship.**
01:49 **An Infinite Mountain**
It's like a book I'm reading: The Beginning of Infinity.
Maybe one day we'll find this snow mountain has no end, I hope it never ends.
But it's still a "brain in a vat": Imagine a fish tank, you put a brain in it, with no connection to the outside world.
Whether it's reinforcement learning based on long thinking (long chain-of-thought) or reinforcement learning for Agents, both point to the same thing: test-time scaling.
Another interesting trend is that more model companies are now building first-party Agent products.
L1 to L5 (the five staged levels toward AGI) are not necessarily sequential stages. Claude bets on this: it doesn't invest much in Reasoning, but it does Agents very well.
Only when the model participates in the development process can the real Innovator (L4) stage be unlocked.
24:58 **K2 is K2 (Mount Godwin-Austen)**
K2's key points: First, we want it to be a very good base model.
We want to maximize the use of every piece of data, the so-called token efficiency: fed the same amount of data, the "brain" grows more.
We will do a lot of Rephrase operations on the data.
We pay a lot of attention to the Muon optimizer, which greatly improves token efficiency.
Second, we want K2 to have good Agentic capabilities. For Agentic models, the biggest challenge is model generalization.
It may be a transformation from a "brain in a vat" into something that can interact with the world, because the most important feature of an Agent is that it can use tools over multiple turns.
Humans are the so-called universal constructor.
There is a potential idea that AI needs to be trained in a more AI native way.
Muon will explode when you train it.
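For readers who haven't seen it, Muon is a publicly described optimizer rather than something introduced in this interview: for 2D weight matrices it applies a momentum update that is first pushed toward an orthogonal matrix via a Newton-Schulz iteration, which is where the token-efficiency claim above comes from; the instability ("explosion") mentioned here is, as I understand it, what K2's MuonClip variant addresses, and that fix is not shown. A minimal sketch, assuming the commonly published quintic coefficients and one of several shape-based scaling conventions:

```python
import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Push a 2D matrix toward the nearest (semi-)orthogonal matrix.

    Quintic Newton-Schulz iteration with the commonly published coefficients.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (g.norm() + 1e-7)            # normalize so the iteration converges
    transposed = x.shape[0] > x.shape[1]
    if transposed:                        # iterate on the "wide" orientation
        x = x.T
    for _ in range(steps):
        gram = x @ x.T
        x = a * x + (b * gram + c * gram @ gram) @ x
    return x.T if transposed else x

def muon_step(weight, grad, momentum_buf, lr=0.02, momentum=0.95):
    """One illustrative Muon update for a single 2D weight matrix.

    Heavy-ball momentum, then the orthogonalized momentum is applied as the
    update. The shape-based scale below is one convention among several;
    real implementations differ in details (e.g. Nesterov momentum).
    """
    momentum_buf.mul_(momentum).add_(grad)
    update = newton_schulz_orthogonalize(momentum_buf)
    scale = max(1.0, weight.shape[0] / weight.shape[1]) ** 0.5
    weight.add_(update, alpha=-lr * scale)
    return weight

# toy usage on a hypothetical weight matrix
w = torch.randn(256, 128)
buf = torch.zeros_like(w)
g = torch.randn_like(w)
muon_step(w, g, buf)
```

The intuition is that orthogonalizing the update gives every direction a comparable scale, so more of each batch's gradient signal makes it into the weights per token of data.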
54:08 **A Simple and Complex System**
Why did Kimi switch from closed source to open source?
Once model training is complete, the product is basically complete. Improving the interaction is certainly valuable, but it's the icing on the cake.
It's already good if multi-modality doesn't damage the "brain."
The multi-modality you learn may be a "dumb multi-modality"; we want it to be a "smart multi-modality."
Scaling Law has encountered a data wall, which is an objective fact.
The data flywheel depends heavily on external feedback. We don't want the feedback to carry a lot of noise, but we haven't solved this problem well yet.
It now seems that scaling based on FLOPs is a more effective path, but when will this balance change?
Many Long Context architectures affect "intelligence."
Pure Linear Attention may affect intelligence, because this architecture carries certain inductive biases.
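For context on the trade-off being described: standard attention computes softmax(QKᵀ)V, which is exact but quadratic in sequence length, while linear attention replaces the softmax with a feature map φ so that φ(Q)(φ(K)ᵀV) can be computed in linear time; the cheaper form gives up softmax's sharp, content-dependent selection, which is one way to read the "bias" remark. A minimal, non-causal sketch (the ELU+1 feature map and the shapes are illustrative choices, not anything from the interview):

```python
import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    """Standard attention: O(N^2) in sequence length N."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized 'linear' attention: O(N) in sequence length (non-causal form).

    Uses phi(x) = elu(x) + 1 as the feature map, one common choice.
    The small (d x d) summary phi(K)^T V is formed once, so no N x N matrix
    appears, but softmax's sharp token selection is lost.
    """
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = k.transpose(-2, -1) @ v                              # (d, d) summary
    z = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1)     # normalizer, (N, 1)
    return (q @ kv) / (z + eps)

# shapes: (sequence_length N, head_dim d); batch and head dims omitted
q, k, v = (torch.randn(128, 64) for _ in range(3))
out_quadratic = softmax_attention(q, k, v)   # exact softmax attention
out_linear = linear_attention(q, k, v)       # cheaper, different inductive bias
```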
Where are the long-term boundaries between base model companies and application companies that make Agent products?
How to think about the business model today? Is API a good business?
Can Kimi make money?
01:25:05 **In Your Own Story**
Tim (Zhou Xinyu) tells me every day: manage with RL, not SFT.
The biggest problem with managing a team with RL is that you are easily hacked.
A lot of the complexity is artificially added; in reality it isn't that complicated.
You can only say that you are in your own story: you constantly feel out what kind of person you are and why you want to do this thing.
I also asked Kimi this question, and it said that AI is the "amplifier of human civilization."
This is also what Kimi told me: any intermediate state may become the object of criticism.
There is definitely fear, but it's more important to focus on what you can do at the current step; thinking about that question matters more.
2024 Interview with Yang Zhilin:
《Chatting with Yang Zhilin about the past year of large model entrepreneurship: the increment of human ideals, probabilistic non-consensus, and Sora》
[More Information]
Text and video versions are launched simultaneously.
For the text version, please go to the official account: language is world
For the video version, please go to Bilibili: Zhang Xiaojun Business Interview
Original title:
113. 和杨植麟时隔1年的对话:K2、Agentic LLM、缸中之脑和“站在无限的开端” (113. A conversation with Yang Zhilin after a year: K2, Agentic LLM, the brain in a vat, and "standing at the beginning of infinity")