The All-Star Chinese AI Conversation of 2026
Zhipu, Moonshot, Qwen, and Tencent founders and top researchers talk US-China, open vs closed, business models, and where AI is headed in 2026
On January 10, Tsinghua University and Zhipu (the Beijing-based foundation model startup that recently went public) co-hosted AGI-Next, a summit for frontier AI, in Beijing.
The event included a series of keynotes by Tang Jie 唐杰 (Zhipu’s founder), Yang Zhilin 杨植麟 (CEO of Moonshot AI, which is behind the Kimi models), Lin Junyang 林俊旸 (tech lead for Qwen at Alibaba), and Yao Shunyu 姚顺雨 (current Principal AI Researcher at Tencent, formerly of OpenAI), followed by a panel.
Cyber Zen Heart 赛博禅心, a well-known tech influencer account (which we previously covered on ChinaTalk), released a transcript of the conversation online, and we’ve translated an abbreviated version into English here (we edited their discussion down to half of what was originally a 40-page Chinese transcript). This is a fascinating conversation on the AI landscape in China, covering the technical side, corporate dynamics, and the future as envisioned by China’s most important industry titans. The conversation includes:
An honest look at whether China’s open-source leadership has actually narrowed the technology gap with the US;
China’s emerging AI-for-business paradigm and why Palantir is an inspiration;
And what it will take for Chinese researchers to take riskier bets.
A taste of the conversation, from Tencent’s Yao Shunyu:
So, I think there are several key points. One is whether China can break through on lithography machines. If compute ultimately becomes the bottleneck, can we solve the compute problem? At the moment, we have strong advantages in electricity and infrastructure. The main bottlenecks are production capacity — especially lithography — and the software ecosystem. If these are solved, it would be a huge help.
Another question is whether, beyond the consumer side, China can develop a more mature and robust To-B market — or whether Chinese companies can really compete in international commercial environments. Today, many productivity-oriented or enterprise-focused models and applications are still born in the U.S., largely because willingness to pay is higher and the business culture is more supportive. Doing this purely within China is very difficult, so many teams choose to go overseas or pursue international markets. These are two major structural constraints.
More important are subjective factors. Recently, when talking with many people, our shared feeling is that China has an enormous number of very strong talents. Once something is proven doable, many people enthusiastically try it and want to do it even better.
What China may still lack is enough people willing to break new paradigms or take very risky bets. This is due to the economic environment, business environment, and culture. If we could increase the number of people with entrepreneurial or risk-taking spirit — people who truly want to do frontier exploration or paradigm-shifting work — that would help a lot. Right now, once a paradigm emerges, we can use very few GPUs and very high efficiency to do better locally. Whether we can lead a new paradigm may be the core issue China still needs to solve, because in almost everything else — business, industrial design, engineering — we are already, in some respects, doing better than the U.S.
…In China, people still prefer to work on safer problems. For example, pretraining has already been proven to be doable. It’s actually very hard and involves many technical challenges, but once it’s proven doable, we’re confident that within a few months or some period of time, we can basically figure it out. But if today you ask someone to explore long-term memory or continual learning, people don’t know how to do it or whether it can even be done, which is still a tough situation.
And Lin Junyang, who works on Qwen at Alibaba:
U.S. compute may overall exceed ours by one to two orders of magnitude. What I see is that whether it’s OpenAI or others, a huge amount of their compute is invested into next-generation research. For us, by contrast, we’re relatively constrained — just fulfilling delivery requirements already consumes the vast majority of our compute. This is a major difference.
Perhaps this is a long-standing question throughout history: is innovation spurred by the hands of the rich or the poor? The poor are not without opportunities. We sometimes feel that the rich waste GPUs, training many things that turn out not to be useful. But when you’re poor, things like algorithm-infrastructure co-optimization become necessary. If you’re very rich, there’s little incentive to do that.
Going one step further, as Shunyu mentioned with lithography machines, there may be another opportunity in the future. From a hardware-software co-design perspective, is it possible to truly build something new? For example, could the next-generation model and chip be designed together?
Americans naturally have a very strong risk-taking spirit. A classic example is early electric vehicles — despite leaking roofs and even fatal accidents, many wealthy people were still willing to invest. In China, I believe wealthy people would not do this; they prefer safe things. But today, people’s risk-taking spirit is improving, and as China’s business environment improves, innovation may emerge. The probability isn’t very large, but it is real.
Comments in brackets [ ] are our clarifying notes.
Three of the “Four Heavenly Kings” of open source were present [a Buddhist reference]—DeepSeek couldn’t attend for reasons everyone knows [they’re grinding to drop a new model].
One roundtable, with participants including: Yang Qiang, Tang Jie, Lin Junyang, Yao Shunyu (joining remotely).
The closing remarks came from the highly respected Academician Zhang Bo 张钹.

The AGI-Next event was convened by Professor Tang Jie—his ability to bring people together is in a league of its own.
Making Machines Think Like Humans
Speaker: Tang Jie (Chief Scientist at Zhipu, Professor at Tsinghua University)
[Note: Zhipu AI/智谱 is one of China’s leading AI companies, which focuses on serving state customers. They’ve had an executive appear on ChinaTalk and their flagship model is GLM.]
...
Starting in 2019, we began thinking: can we make machines truly think, even just a little bit, like humans? So in 2019, we spun off from Tsinghua’s research achievements [成果转化 - “achievement transformation,” is the formal Chinese term for university tech transfer/commercialization]. With strong support from the university at the time, we founded this company called Zhipu. I’m now Chief Scientist there. We’ve also open-sourced a lot — you can see many open-source projects here, and on the left there are various things related to large model API calls.
I’ve been at Tsinghua for about 20 years — I graduated in 2006, so this year marks exactly 20 years. Looking back at what I’ve actually been doing, I’d summarize it as just two things: First, I built the AMiner system back in the day [AMiner is an influential academic search and mining platform]; second, the large models I’m working on now.
I’ve always held a view that has influenced me quite a bit — I call it “doing things with the spirit of coffee.” This actually relates closely to one of our guests here today: Professor Yang Qiang. One time, after we met at a café, I said I’d been drinking way too much coffee lately; maybe I should quit, since it can’t be good for my health. Professor Yang’s first response was “Right, you should cut back.” Then he said, actually no—if we could be as addicted to research as you are to coffee, wouldn’t our research be excellent?
This idea of being “addicted to coffee” [喝咖啡上瘾] really struck me at the time, and it’s influenced me from 2008 until now — the idea that doing things well probably means being focused, and just keeping at it. This time I happened to encounter AGI, which is exactly the kind of thing that requires long-term investment and sustained effort. It’s not quick wins — you don’t do it today, see results tomorrow, and wrap up the day after. It’s very long-term, which makes it precisely worth investing in.
In 2019, our lab was actually doing quite well internationally in graph neural networks and knowledge graphs. But at that time, we firmly paused both of those directions — temporarily stopped working on them. Everyone pivoted to large models, everyone started launching research related to large models. And as of today we’ve had some real accomplishments.

…
Everyone still remembers earlier this year, I think there were two main directions: one was simple programming — doing Coding, doing Agents; the second was using AI to help us do research, similar to DeepResearch, even writing complex research reports. These two paths are probably quite different, and this is also a result of making choices. On one hand, you do Thinking and add some coding scenarios; on the other hand, you might want to interact with the environment, making the model more interactive, more dynamic — how do you do that?
In the end, we chose the path on the left — we gave it Thinking capability. But we didn’t abandon the right side either. On July 28th we did something that was relatively successful: we integrated coding, agentic, and reasoning capabilities together and released GLM 4.5, which got pretty good results in agents, reasoning, and code. All the models — domestically, including today’s Qwen and Kimi — are really chasing each other [a fun idiom 你追我赶 — “you chase me, I chase you”]. Sometimes one is ahead, sometimes another is. On that particular day, we were in front.
We opened up this 4.5 for everyone to use — go ahead and code with it, our capabilities are pretty good now. Since we chose Coding and Agent, it could handle many programming tasks, so we let it write these very complex scenarios. Then users came back and told us: for example, if we want to code a Plants vs. Zombies game, this model can’t do it.
Real environments are often very complex. This game is automatically generated from a single prompt — including the whole game being playable, users can click to score, choose which plants, how to fight the zombies, zombies walking in from the right, including the interface, including the backend logic, all automatically written from one sentence by this program. At this point, 4.5 couldn’t do this scenario — lots of bugs appeared. What’s going on?
Later we discovered that real programming environments contain many problems. For example, in editing environments like the one above, there are many issues that need solving. This is exactly where RLVR [Reinforcement Learning with Verifiable Rewards] comes in — reinforcement learning with verifiable environments. So we collected a large number of programming environments, used the programming environment itself as the reinforcement signal, plus some SFT data, enabling two-way interaction to improve the model’s effectiveness. Overall, it’s exploring through verification. At that time we got very good scores on SWE-bench, and recently we’ve gotten very good scores as well.
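[A minimal sketch from us of the RLVR idea described here: let the programming environment’s own test suite, rather than a human rater, decide the reward. The helper below is hypothetical and not Zhipu’s actual pipeline; it only illustrates “exploring through verification.”]

```python
import subprocess

def verifiable_reward(repo_dir: str, patch: str, test_cmd: str = "pytest -q") -> float:
    """Apply a model-generated patch to a repo; reward 1.0 only if the repo's
    own tests pass. The environment itself verifies success, which is the
    'verifiable reward' in RLVR."""
    applied = subprocess.run(["git", "apply", "-"], input=patch, text=True, cwd=repo_dir)
    if applied.returncode != 0:
        return 0.0  # a patch that does not even apply earns nothing
    tests = subprocess.run(test_cmd.split(), cwd=repo_dir, capture_output=True)
    return 1.0 if tests.returncode == 0 else 0.0
```

[During RL training, rollouts whose patches make the tests pass get reward 1 and everything else gets 0, so the policy is pushed toward code that actually works in a real environment rather than code that merely looks plausible.]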
…
Next question: can we continue scaling going forward? What’s our next AGI paradigm? We face more challenges ahead.
We just did some open-sourcing, and some people might feel excited, thinking China’s large models seem to have surpassed America’s. Actually, the real answer is probably that our gap might still be widening, because American large models are mostly still closed-source. We’re playing in open source to make ourselves feel good, but our gap hasn’t narrowed the way we imagined. In some areas we might be doing pretty well, but we still need to acknowledge the challenges and gaps we face.
What should we do next? I think from the entire development history of large models, it’s really referencing the human brain’s cognitive learning process. From the earliest large models — you had to memorize all the world’s long-term knowledge, just like children who first read books from a young age, memorize all the knowledge first, then gradually learn to reason, learn math problems, learn more deduction and abstraction.
For the future, it’s the same principle. Looking at human cognitive learning, which capabilities do current large models lack that humans far exceed them in:
First, 2025 was the year of multimodal adaptation. Many multimodal models, including ours, haven’t drawn much attention, as most people are working on improving text intelligence. For large models, how do we collect multimodal information and unify perception — what we often call “native multimodal models”? Later I thought about it, and native multimodal models are quite similar to human “sensory integration” [感统 - short for 感觉统合, sensory integration]. Human sensory integration is: I collect some visual information here, also collect some audio information, also collect some tactile information — how do I integrate all this information together to perceive something? Sometimes when humans have brain issues, it’s often insufficient sensory integration — problems from sensory integration dysfunction. For models, how do we build this next level of multimodal sensory integration capability?
Second, current model memory capability and continuous learning capability are still insufficient. Humans have several levels of memory systems — we have short-term memory, working memory, long-term memory. I even chatted with our students and lab members before, and I said it seems like a person’s long-term memory doesn’t actually represent knowledge. Why? Because we humans only really preserve knowledge when we record it — for example, for me, if my knowledge can’t be recorded on Wikipedia, maybe 100 years later I’ll be gone too; I won’t have contributed anything to this world, and it doesn’t seem to count as knowledge. It seems like when training future large models, my knowledge won’t be useful either; it’ll all become noise. How do we take our entire memory system from an individual’s three levels to humanity’s fourth level of recording? This whole memory system is what we humans need to build for large models in the future.
Finally, reflection and self-awareness. Actually, models already have some reflection capability now, but self-awareness in the future is a very difficult problem. Many people question whether large models can have self-awareness capability. Among us there are also many experts from foundational model labs — some support this, some oppose it. I’m somewhat supportive — I think it’s possible and worth exploring.
…
We’re teaching machines the capacity for self-reflection and self-learning — through the machine being able to continuously self-critique, to learn which things it should do, which things it could do more optimally.
Looking to the future, we still need to teach machines to learn even more. For instance, learning self-awareness [自我认知] — letting machines explain their own behavior. Say AI generates massive amounts of content: it can self-explain why it generated this content, what it is, what its goals are. At the ultimate level, perhaps one day AI will also have consciousness.
We’ve roughly defined these five layers of thinking.
From a computer science angle, computers wouldn’t frame things this abstractly. In my view, computers have three fundamental capabilities:
First, representation and computation. You represent data, then you can compute on it.
Second, programming. Programming is the only way computers interact with the outside world.
Third, at its core, search.
But when you stack these capabilities together: First, with representation and computation, storage capacity can far exceed humans. Second, programming can produce logic more complex than what humans can handle. Third, search can be done faster than humans. Stack these three computer capabilities together, and you might get so-called “superintelligence” [超级智能] — perhaps exceeding human capabilities in certain areas.
...
For 2026, what’s more important to me is staying focused and doing some genuinely new things.
First, we’ll probably keep scaling. But scaling the known means constantly adding data, constantly probing the ceiling. There’s also scaling the unknown — new paradigms we haven’t discovered yet.
Second, technical innovation. We’re going to do genuinely new model architecture innovation — solving ultra-long context, more efficient knowledge compression. And we’re going to achieve knowledge memory and continuous learning. Put these two together, and it might be an opportunity to make machines just a little bit stronger than humans.
Third, multimodal sensory integration [多模态感统] — this is a hot topic and key priority this year. Because only with this capability can AI enter into long tasks inside machines, time-extended tasks within our human work environments — inside our phones, inside our computers — completing our long tasks. Once it can complete our long tasks, AI will have achieved an occupation [工种, literally “job type” or “trade” — the implication is AI becomes a worker capable of doing a full job, not just discrete tasks]. AI becomes like us, able to help us get things done. Only then can AI achieve embodiment [具身], only then can it enter the physical world.
I believe this year might be an explosive year for AI for Science, because so many capabilities have dramatically improved — we can do so much more.
That concludes my presentation. Thank you, everyone!
Scaling Law, Model Architecture, and Agent Intelligence
Speaker: Yang Zhilin 杨植麟 (Founder of Moonshot AI & Kimi)
Yang Zhilin’s talk was packed with technical details and formulas; here’s a brief summary:
Optimizing along two dimensions — token efficiency and long context — will lead to achieving stronger agent intelligence.
Yang argued that the key reason Transformers outperform LSTMs isn’t in short sequences, but in long-context settings where the loss is significantly lower — which is exactly the core demand in the agent era. The team used the Muon second-order optimizer to achieve a 2× improvement in token efficiency, and addressed training instability with QK-Clip, successfully completing stable training on the trillion-parameter Kimi K2.
Their next-generation architecture, Kimi Linear, uses Delta Attention (a linear attention mechanism). It outperforms full attention for the first time on long-horizon tasks, while delivering a 6–10× speedup. K2 has become China’s first agent model, capable of two to three hundred steps of tool calls, and it surpasses OpenAI on core benchmarks such as Humanity’s Last Exam (HLE).
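[An aside from us, not from Yang’s slides: delta-rule linear attention replaces full attention’s ever-growing KV cache with a fixed-size state matrix that is updated recurrently. A generic form of the update (not necessarily Kimi Linear’s exact formulation) is

$$S_t = S_{t-1}\left(I - \beta_t k_t k_t^\top\right) + \beta_t v_t k_t^\top, \qquad o_t = S_t q_t,$$

where $S_t$ is the constant-size state, $q_t$, $k_t$, $v_t$ are the query, key, and value at step $t$, and $\beta_t$ is a learned write strength. Because the state does not grow with sequence length, per-token cost stays flat over the long tool-call horizons described above.]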
Yang emphasized that upcoming models will need more “taste”, because intelligence isn’t like electricity that can be exchanged equivalently — tokens produced by different models are inherently not the same. He quoted a conversation with Kimi: the reason to keep developing AGI is that giving it up would mean giving up the upper bound of human civilization — and we cannot allow fear to bring progress to a halt.
Towards a Generalist Agent
Speaker: Lin Junyang (Alibaba Qwen)
Open Source and Products
We’ve been doing open source for quite a while, starting on August 3, 2023. A lot of people ask us: why do open source at all? A lot of things came together through chance and circumstance. In any case, after sticking with open source all the way through, we ended up doing a lot of work that was, at the very least, fairly industrial in nature. There isn’t a lot of “stuff” in the repo — basically just some scripts that people can look at directly. But we do have a lot of models. Why so many, relatively? In the past, a lot of people didn’t understand why we built small models, but today everyone understands that small models are still quite valuable.
Small models ultimately originated from an internal 1.8B model we used for experiments. We were doing pretraining, and resources were limited — you can’t run every experiment on 7B, so we used 1.8B for validation. At the time, a junior labmate told me we should open-source this model, and I really didn’t understand. I said: in 2023 this model is almost unusable — why would we open-source it? He told me 7B consumes too much compute, and many master’s and PhD students don’t have the resources to run experiments. If we open-source 1.8B, a lot of students would finally be able to graduate on time. That was a really good original motivation.
Then as we kept working, phone manufacturers came to us and said 7B is too big and 1.8B is too small — could you make a 3-4B model for us? That’s easy; it’s not a hard thing to do. As we went along, we ended up with more and more variants and types. To some extent, it has to do with serving the needs of users.

Qwen3: Our Biggest Improvements This Year
The biggest progress this year is Qwen3. This is the mascot — kind of looks like a bear, but it’s actually a capybara.
When we were building it, I felt our teammates were working too hard; I didn’t want them to suffer so much. In an era that’s this competitive, being a bit more laid-back isn’t necessarily a bad thing. We’re working across relatively more directions, but you can see that each direction has its own internally consistent logic. For example, we work on Text and VL, and Omni; we’ve also spent relatively longer on vision, text, and speech generation. In the process, one thing that’s special about us is that we’re backed by Alibaba Cloud, and a lot of our business is closely related to Alibaba Cloud’s customers. Cloud customers are very diverse, and we also provide services to everyone such as embeddings and guardrails.
Today, we’ll introduce the main line around Text and VL, including Omni; Coder will be included under Text and discussed accordingly.
Text: Qwen3 Series
This year, for text models, it’s mainly the Qwen3 series, and we’ve already reached 3.5. We spent longer on 3, because the previous generation, 2.5, took a very long time, and one of its biggest characteristics was overall capability improvement. What’s more interesting this year is that reasoning capability needed to improve. If I were to add a bit of my personal understanding, I’d say that reasoning is somewhat different from the current straightforward Instruct models.
Second is the languages and dialects we support. The number of languages alone isn’t that large, but including dialects, it totals 119. Why did we do multilingual support? There were also some coincidences. In 2023, we felt that as long as we did Chinese and English well, we could serve the people we needed to serve. But one time I ran into Korean friends and asked them why, when they were working on the Solar model, they didn’t use our model. They said, “your model doesn’t understand any Korean at all.” I felt really hurt, so I went and checked, and later found that [solving this issue] was actually very simple, so I just went ahead and did it. Later we found that our global users were increasing. I remember some friends in Pakistan kept telling me, “hurry up and support Urdu — we really don’t have any large models we can use.” I thought that was indeed a good thing, so we supported more languages.
We still haven’t finished this. Data from Africa is indeed hard to collect, [so] African languages aren’t covered yet. Today I chatted with some phone manufacturers, and there are still many people in Africa using “dumb” feature phones. We’ve already entered the smartphone era, but they’re still dealing with that, so if you want to help all of humanity, the road ahead is truly long and the responsibility is heavy. If your goal isn’t to help all of humanity, I think it might be better not to do it at all. That’s why we will keep going.
Third is long context: today’s long text and long video may be one example of this. But I find it really interesting: if you truly want to build a model with self-awareness, first your context has to be long enough. Some people previously debated whether there’s any need to stuff lots of junk into a long context, but only after you have that can you achieve the deeper understanding that comes next. So now we’ve pushed it to over 1M; internally we’ve actually reached several million, and it still might not be enough. That’s why today I still want to say this is a very, very long-term undertaking.
…
Coding: From Olympiad Problems to Software Engineering
Today’s “coder” is different from what we had in the past. For example, last year and the year before, we were mostly solving straightforward competition problems: you’re given a problem and you see whether you can produce the answer. What are we doing today? Software engineering. Back in 2024, people were really surprised by the idea of whether AI could be like a programmer. Today, the task is: maintaining a project is actually pretty hard — if you can just do that, that’s already great.
In actual practice, doing this involves some quite complicated steps for humans. The simplest thing is at least I can open these folders, look at the file names, and know which one I should click into — this is really a multi-turn interaction process. One very important point in building agents today is why everyone talks about multi-turn environment interaction: put plainly, opening a folder and taking a look is itself a way of interacting with the environment. This is important and also very interesting, and it makes us really excited — it can genuinely generate productivity. We want today’s coding models to be productive; the fact that they can write a lot of code is really surprising.
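[A minimal sketch from us of the multi-turn environment interaction described here, where “opening a folder and taking a look” is itself a tool call. The tool set and the `call_model` function are hypothetical placeholders, not Qwen’s actual agent framework.]

```python
import json
import pathlib
import subprocess

# Hypothetical tools: each one lets the model observe or act on the repository,
# which is the "interaction with the environment" in question.
TOOLS = {
    "list_dir": lambda path=".": "\n".join(p.name for p in pathlib.Path(path).iterdir()),
    "read_file": lambda path: pathlib.Path(path).read_text()[:4000],
    "run_tests": lambda: subprocess.run(["pytest", "-q"], capture_output=True, text=True).stdout,
}

def agent_loop(task: str, call_model, max_turns: int = 50) -> str:
    """Multi-turn loop: the model sees the task plus all prior observations,
    picks a tool, and the tool's output becomes the next observation."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        # call_model is assumed to return, e.g., {"tool": "list_dir", "args": {"path": "src"}}
        action = call_model(history)
        if action.get("tool") == "finish":
            return action["args"]["answer"]
        observation = TOOLS[action["tool"]](**action.get("args", {}))
        history.append({"role": "tool", "content": json.dumps({"observation": observation})})
    return "max turns reached"
```

[Each turn is one round of “look, then act”; stringing hundreds of such turns together is what turns a code model into a coding agent.]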
Of course, China and the U.S. are different. I just got back from the Bay Area, and I could feel that the two sides aren’t quite the same. [The difference] is pretty dramatic. Is it that the models aren’t good enough, or that vibe coding still isn’t popular enough? I think the difference is really in how people perceive it. What we want to do is reach the same destination by different paths; everyone wants it to generate productivity.
At the time we paid especially close attention to two benchmarks. One was SWE-bench — can you submit a PR that solves the issue? A score of 70 is a pretty high bar; of course now you can see scores above 75. That was in July; back then, we felt that getting 67 and 69 was already pretty good. Terminal-Bench is also quite hard. Today everyone is using this series of products, and you’ll find that it really does connect directly to your productivity—unlike before. What we’re doing today is tasks that are close to real-world practice. Maybe today it’s only one or two benchmarks, but making it fit real environments and real production tasks better is what we want to do.
When it first came out it was quite popular, but now the competition is too intense. At one point our token consumption even made it to second place on OpenRouter — just to brag a little bit.
…
Visual Understanding: Equipping Models with Eyes
When you build language models, you also have to think about one question: can it have “eyes” to see the world? For example, we just mentioned wanting to build a coding agent to improve productivity: I have to let it operate a computer and see the computer screen. Without eyes it can’t see, so we worked on this with no hesitation. That’s a huge difference: just go and build visual understanding, don’t question it.
But today, many models can actually see things more clearly than humans. For example, I’m nearsighted and I have astigmatism, so my eyesight basically isn’t that great and there’s a lot that I can’t see clearly. But at least I can distinguish up, down, left, and right very easily. AI is interesting: it can see very fine details very clearly, yet when you ask it about front/back/left/right, it for some reason can’t tell. For a long time we evaluated a case called “live subject orientation.” I even asked our evaluators what “live subject” meant. It couldn’t tell whether something was on the left or the right — I found that pretty strange, but that’s exactly the problem we need to solve.
And it’s not just that. Another thing we need to do is make sure its intelligence doesn’t drop. We don’t expect it to dramatically raise its IQ, but at the very least it shouldn’t get dumber, because a lot of the time when you build VL models, they get dumber. This time, we finally made it stop getting dumber — it’s roughly on par with our 235B language model.
…
I want to share a more interesting case. People also ask me these days: how exactly did the open-source community help your team develop this model? If the open-source community hadn’t told us, we would never have thought of this issue in our daily lives. There was an image where we basically wanted to remove the person on the right side of the picture. You’d find that after [the model] removed them, when you overlaid the two images, the result looked blurry. The image had shifted a bit; it was no longer in the original position, but misaligned. For a lot of people who do Photoshop work, this needs to be extremely precise. You can’t just move things around arbitrarily. So the key focus of version 2511 was solving this problem. In version 2511, when I overlay the two images, the person is basically still in the original position. I think developers gave us a really good use case—showing that we can actually build things that genuinely help them.

Agent: Towards Simulated and Physical Worlds
Agents can actually move toward both the virtual world and the physical world, which is why there’s an approach like embodied reasoning. Internally we discussed a path: even if you’re building VLA models or coding models, when you strip it down, you’re still converting language into an embodied model. From this perspective it’s extremely encouraging, so we felt like going all-in and seeing whether we can move toward a digital agent. Being able to do GUI operations while also using APIs: that would be a truly perfect digital agent.
And if we move toward the physical world, could it pick up a microphone, and could it pour tea and water today? That’s something we really want to do.
Thank you all very much!
Panel: The Next Step for Chinese AI
Moderator: Li Guangmi
Panel Members: Yang Qiang (HKUST), Tang Jie (Zhipu), Lin Junyang (Qwen), Yao Shunyu (Tencent)
Opening Remarks:
Li Guangmi (Moderator): I am the moderator for the next panel, Li Guangmi. … Let’s start with the first — rather interesting — point: the clear fragmentation (分化) of Silicon Valley companies. Let’s start our conversation around this topic of “fragmentation.”
Anthropic’s model has actually been a great source of inspiration for China; in the face of such intense Silicon Valley competition, they didn’t entirely follow the rest and try to do everything. Instead, they focused on enterprise, coding, and agents. I’m also wondering: in what directions will Chinese models end up fragmenting? I think this topic of fragmentation is really interesting.
… Shunyu, could you expand your views on this topic of model fragmentation? …
Yao Shunyu (Tencent): I think I have two major impressions: one is the clear divergence between “to consumer” and “to business” models, and the other is divergence between the path of vertical integration and the path of separating the model and application layers [模型和应用分层].
I’ll start with the first point. I think when people think of AI, the two biggest names are ChatGPT and Claude Code. They are the canonical examples of “to consumer” and “to business,” respectively. What’s really interesting is that if you compare ChatGPT today with ChatGPT from last year, there really isn’t a difference in feeling. On the other hand, Coding — to exaggerate slightly — has already reshaped how the entire coding industry works. People already don’t write code anymore; they instead talk with their computer in plain English.
The core point is that with respect to “to consumer” models, the majority of people, the majority of the time, just don’t need to use that strong an AI. Maybe compared to last year, today’s ChatGPT is stronger at abstract writing and Galois Theory [abstract mathematics], but most people most of the time can’t feel it. The majority of people, especially in China, use it as an enhanced search engine. Most of the time, they don’t know how to properly use it to elicit its “intelligence.”
But for business-facing models, it’s clear that higher intelligence represents higher productivity, which is more and more valuable. These things are all correlated.
There’s also another obvious point about business-facing models: most of the time, people want to use the strongest model. One model might cost $200 a month, and the second-best or slightly weaker model might be $50 or $20 a month. Today, we find that many Americans are willing to pay a premium for the best model. [Suppose] your salary is $200,000, and you have 10 tasks you have to do daily. A really good model can do eight or nine of those, while the weaker one can [only] do five or six. The problem is when you don’t know which five or six tasks they are, you have to spend extra effort monitoring it.
I think regardless of whether it’s people or models, in the “to business” market we’ve realized a really interesting phenomenon: the divergence between strong models and somewhat weaker models will become more and more pronounced. I think that’s the first observation.
The second observation is about the difference between vertically-integrated models and ones that separate the model and application layers. I think a good example is the difference between ChatGPT Agent and Claude or Gemini with an application-layer product like Manus. In the past, everyone thought that vertically-integrated paths would definitely be better, but at least today that’s not certain. First, the capabilities needed at the model layer versus the application layer are rather different. Especially in the case of business-facing or productivity scenarios, larger pre-training is still a key factor, and that’s really difficult for product companies (产品公司) to do. But if you want to use such a good model well, or if this sort of model has overflow capacity (溢出能力), you still need to do a lot of work on the application or environment side.
We also realize that for consumer-facing applications, vertical integration, whether it’s ChatGPT or Doubao (豆包), still holds; models and products are tightly coupled and iterate together. But for business-facing cases, this trend is almost flipped: the models themselves keep getting stronger and better, yet a lot of application-layer work is still done separately to apply them to different productivity workloads.
Li Guangmi (Moderator): Because Shunyu has a new role, what are you thinking about doing next in the Chinese market? Do you have any distinctive characteristics or keywords? Can you share anything with us right now?
Yao Shunyu (Tencent): I think Tencent is definitely a company with stronger consumer-facing genetics. I think we will think deeply about how today’s large models, and AI development more broadly, can deliver greater value to users. A core consideration is that we realize that most of the time, with respect to our environment or to stronger models, what we need is additional context.

Being business-facing in China is truly difficult. The productivity revolution, including many Chinese companies doing coding agents, requires breaking into foreign markets. We will think deeply about how to serve ourselves well first. The difference between a start-up and a big company doing coding [agents] is that the big company already has many kinds of application scenarios, many places where we need to improve productivity. If our models can do well in those areas, not only will these models have their unique advantages, not only will our company develop well, but, importantly, we will be able to capture data from real-world scenarios, which is really interesting. For example, a startup like Anthropic, if it wants more Coding Agent data, needs to find data vendors to label that data, and needs all kinds of software engineers to think about what data to label. The thing is, there are only a few data vendors in total, and they’ve only hired so many people, so in the end they’re limited. But if you are a company with 100,000 people, there might be a few interesting attempts at trying to use real-world data well, rather than relying on data labellers or agreements.
…
Topic 2: The Next Paradigm 下一个范式
Li Guangmi (Moderator): Moving to the second interesting question. Today is a special moment in time [时间点特别特殊]. One reason is that pretraining has gone on for the past three years, and many people say we may now have captured 70-80% of the potential gains [“走到了七八成的收益” this is a fractional metaphor, not a literal statistic — the implication here is that the low hanging fruit has already been picked]. Reinforcement learning has also become a consensus, unlocking perhaps 40-50% of the remaining space, with huge room left in data and environment space. So the question of a new paradigm going forward is especially worth discussing. Professor Tang also mentioned autonomous learning and self-learning. Since the theme of today’s event is “Next” I think this is a topic particularly worth digging into.
Let’s start with Shunyu. You’ve worked at OpenAI, which is at the frontier. How do you think about the next paradigm? OpenAI is a company that has advanced humanity through the first two paradigms. Based on your observations, could you share some thoughts on what a third paradigm might look like?
Yao Shunyu (Tencent): Autonomous learning [自主学习] is a very hot term right now. In Silicon Valley — on every street corner and in every café [大街小巷咖啡馆里面] — people are talking about it, and it’s forming a kind of consensus. From my observations, though, everyone defines and understands it differently. I’ll make two points.
First, this is not really a methodology problem, but a data or task problem. When we talk about autonomous learning, the key question is: in what kind of scenario, and based on what kind of reward function, is it happening? When you’re chatting and the system becomes more and more personalized, that’s a kind of autonomous learning. When you’re writing code and it becomes increasingly familiar with each company’s unique environment or documentation, that’s another kind of autonomous learning. When it explores new science — like a PhD student going from not knowing what organic chemistry is to becoming an expert in the field — that’s also autonomous learning. Each type of autonomous learning involves different challenges and, in a sense, different methodologies.
Second — and I’m not sure if this is a non-consensus view — this is actually already happening. Very obviously, ChatGPT is using user data to continuously bridge the gap [the verb here is “弥合” — literally, “to prompt an open wound to heal” — which implies passivity/emergent behavior rather than active design] in understanding what human conversational styles are like, making it feel increasingly good to interact with. Isn’t that a form of self-learning?
Today, Claude has already written 95% of the code for the Claude project itself. It’s helping to make itself better. Isn’t that also a form of self-learning? Back in 2022 and 2023, when I was in Silicon Valley promoting this work, the very first slide I used said that the most important aspect of ASI was autonomous learning. Today’s AI systems essentially have two parts. First, there’s the model itself. Second, there’s a codebase. How you use the model — whether for reasoning or as an agent — depends on the corresponding codebase. If we look at the Claude system today, it essentially consists of two parts: one is a large amount of code related to the deployment environment, and the other is a large amount of code that governs how the system is used — whether that’s GPU-related, frontend-related, or environment-related. I think Claude Code is already doing this at scale today, though people may not fully realize it. These examples of autonomous learning are still confined to very specific scenarios, so they don’t yet feel overwhelmingly powerful.
This is already happening, but there are still efficiency constraints and other limitations — many different issues. Personally, I see this more as a gradual change rather than a sudden leap [更像是一个渐变,不是突变].
Li Guangmi (Moderator): Let me follow up on that. Some people are relatively optimistic about autonomous learning and think we might see signals as early as 2026. In your view, what practical problems still need to be solved before we see those signals? For example, long context, parallel model sampling, or other factors — what key conditions still need to fall into place before these signals really emerge?
Yao Shunyu (Tencent): A lot of people say we’ll see signals in 2026, but I think we already saw them in 2025. Take Cursor, for example: every few hours they retrain using the latest user data, including new models, and they’re already using real-world environment data to train. People might feel this isn’t yet a “shock to the system” simply because they don’t have pretraining capabilities, and their models are indeed not as strong as OpenAI’s. But clearly, this is already a signal.
The biggest issue is imagination. It’s relatively easy for us to imagine what a reinforcement learning or reasoning paradigm might look like once it’s implemented. We can imagine something like o1: originally scoring 10 points on math problems, then jumping to 80 points thanks to reinforcement learning and very strong chains of thought. But if in 2026 or 2027 a new paradigm emerges — if I announce that a new model or system has achieved self-learning — what kind of task should we use to evaluate it? What kind of performance should it have for you to believe it’s real? Is it a profitable trading system that makes a lot of money? Does it genuinely solve scientific problems that humans previously couldn’t? Or something else entirely? I think we first need to imagine what it would actually look like.
Li Guangmi (Moderator): Shunyu, OpenAI has already driven two paradigm shifts. If a new paradigm emerges in 2027, which company globally do you think has the highest probability of leading that paradigm innovation — if you had to name just one?
Yao Shunyu (Tencent): Probably still OpenAI. Although commercialization and various other changes have weakened its innovative DNA to some extent, I still think it’s the place most likely to give birth to a new paradigm [最有可能诞生新范式的地方].
…
Li Guangmi (Moderator): Junyang just mentioned initiative, including personalization. Do you think that if we really achieve memory, we’ll see a breakthrough-level technological leap by 2026?
Lin Junyang (Qwen): My personal view is that many so-called “breakthroughs” in technology are really issues of observation. Technologically, things are developing in a linear way; it’s just that humans experience them very intensely. Even the emergence of ChatGPT, for those of us working on large models, was linear growth. Right now everyone is working on “memory.” Is this technology right or wrong? Many solutions aren’t inherently right or wrong, but the results, at least in our own experience, are often disappointing [the word used here is 献丑, a self-deprecating term meaning “to present ugliness; to put one’s own artistic incompetence on display.” You might use this term to describe your poor karaoke abilities.] — our memory feature knows what I’ve done in the past, but it’s really just recalling past events. Calling my name every time doesn’t actually make you seem very smart. The question is whether memory can reach some critical point where the model, combined with memory, becomes like a person in real life. People used to say this about movies — that moment when it really feels human. Understanding memory might be that moment, when human perception suddenly bursts forth [人类的感受突然间迸发].
I think it will still take at least a year. Technology often doesn’t move that fast. Everyone feels very “involuted,” [比较卷] with something new every day, but technologically it’s still linear growth. It’s just that from an observational perspective, we’re in an exponential-feeling phase. For example, a small improvement in coding ability can generate a lot of productive value, so people feel AI is advancing very fast. From a technical standpoint, we’re just doing a bit more work. Every day when we look at what we’re building, it feels pretty crude [“挺土的” — literally, “quite rustic/earthy”] — those bugs are honestly embarrassing to talk about. But if we can achieve these results in this way, I think in the future, with better integration of algorithms and infrastructure, there may be much more potential.

Li Guangmi (Moderator): Let’s call on Professor Yang Qiang.
Yang Qiang (HKUST): I’ve always worked on federated learning. The core idea of federated learning is collaboration among multiple centers. What I’m seeing more and more now is that many scenarios lack sufficient local resources, yet local data comes with strong privacy and security requirements. So as large models become more powerful, we can imagine collaboration between general-purpose large models and locally specialized small models or domain-expert models. I think this kind of collaboration is becoming increasingly possible.
Take Zoom in the United States — Huang Xuedong 黄学东 and his team built an AI system with a large foundational base. Everyone can plug into this base, and in a decentralized state it can both protect privacy and communicate and collaborate effectively with general large models.
I think this open-source model is especially good: open sourcing knowledge, open sourcing code, and open sourcing at the model level.
In particular, in fields like healthcare and finance, I think we’ll see more and more of this phenomenon.
Tang Jie (Zhipu): I’m very confident that this year we’ll see major paradigm innovations. I won’t go into too much detail, but as I mentioned earlier — continual learning, memory, even multimodality — I think all of these could see new paradigm shifts.
There’s also a new trend I want to talk about: why would such a paradigm emerge? In the past, industry ran far ahead of academia. I remember going back to Tsinghua last year and the year before, talking with many professors about whether they could work on large models. The first issue wasn’t just a lack of GPUs — it was that the number of GPUs was almost zero. Industry had ten thousand GPUs; universities had zero or one. That’s a ten-thousand-fold difference. But now, many universities have a lot of GPUs, and many professors have begun doing large-model research. In Silicon Valley too, many professors are starting to work on model architectures and continual learning. We used to think industry dominated everything, but by late 2025 to early 2026, that gap won’t really exist anymore. Maybe there’s still a tenfold difference, but the seeds have been planted [孵化出种子]. Academia has the genes for innovation and the potential — this is the first point.
Second, innovation always emerges when there is massive investment in something and efficiency becomes a bottleneck. In large models, investment is already enormous, but efficiency isn’t high. If we keep scaling, there will still be gains — early 2025 maybe data went from 10 TB to 30 TB, and maybe we can scale to 100 TB. But once you scale to 100 TB, how much benefit do you get, and at what computational cost? That becomes the question. Without innovation, you might spend one or two billion and get very little return, which isn’t worth it.
On the other hand, consider new intelligence innovations: every time, we have to retrain a foundation model and then run lots of reinforcement learning on top. When RL came out in 2024, many people felt continuing training had returns. Today, continuing aggressive RL still has returns, but not that much. It’s an efficiency-of-returns problem. Maybe in the future we need to define two things: one is that if we want to scale up, the dumbest way is just scaling — scaling does bring gains and raises the upper bound of intelligence. The second is defining “intelligence efficiency”: how efficiently we gain intelligence, how much incremental intelligence we get per unit of investment. If we can get the same intelligence gains with less input, especially when we’re at a bottleneck, then that becomes a critical breakthrough.
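[One rough way to write down the “intelligence efficiency” Tang sketches here, as our gloss rather than a formula from his talk:

$$\eta = \frac{\Delta I}{\Delta C},$$

where $\Delta I$ is the measured capability gain, say a benchmark delta, and $\Delta C$ is the additional compute, data, or capital spent to obtain it. Plain scaling raises the ceiling on $I$; the paradigm innovations Tang hopes for raise $\eta$.]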
So I believe that in 2026, such a paradigm will definitely emerge. We’re working hard and hope it happens to us, but it might not.
Li Guangmi (Moderator): Like Professor Tang, I’m also very optimistic. For every leading model company, compute grows by about tenfold each year. With more compute and more talent flowing in, people have more GPUs, run more experiments, and it’s possible that some experimental engineering effort, some key point, will suddenly break through.
Topic Three: Agent Strategy
Li Guangmi (Moderator): Professor Tang just talked about how to measure intelligence. The third topic is Agent strategy. Recently I’ve talked with many researchers, and there’s another big expectation for 2026. Today, agents can reason in the background for 3–5 hours and do the equivalent of one to two days of human work. People expect that by 2026, agents could do one to two weeks of normal human work. This would be a huge change — it’s no longer just chat, but truly automating a full day or even a full week of workflows. 2026 may be a key year for agents to create economic value.
On the agent question, let’s open it up for discussion. Shunyu mentioned vertical integration earlier — having both models and agent products. We’ve seen several Silicon Valley companies doing end-to-end work from models to agents. Shunyu has spent a lot of time researching agents. From the perspective of 2026 — long agents really doing one to two weeks of human work — and from the standpoint of agent strategy and model companies, how do you think about this?
Yao Shunyu (Tencent): I think, as mentioned earlier, To B and To C are quite different. Right now, the To B side seems to be on a continuously rising curve, with no sign of slowing down.
What’s interesting is that there isn’t much radical innovation involved. It’s more about steadily making models larger through pretraining, and diligently doing post-training on real-world tasks. As long as pretraining keeps scaling up and post-training keeps grounding models in real tasks, they’ll get smarter and generate more value.
In a sense, for To B, all goals are more aligned: the higher the model’s intelligence, the more tasks it can solve; the more tasks it solves, the greater the returns in To B scenarios.
…
Also, I think education is extremely important. From what I observe, the gap between people today is enormous. More often than not, it’s not that AI is replacing human jobs; rather, people who know how to use these tools are replacing those who don’t. It’s like when computers first emerged — if you turned around and learned programming while someone else kept using a slide rule, the gap between you would be massive.
Today, the most meaningful thing China can do is to improve education — teaching people how to better use products like Claude or ChatGPT. Of course, Claude may not be accessible in China, but we can use domestic models like Kimi or Zhipu instead.
Li Guangmi (Moderator): Thank you, Shunyu. Next, we’d like Junyang to share his thoughts on agents. Qwen also has an ecosystem — Qwen builds its own agents and also supports a broader agent ecosystem. You can expand on that as well.
Lin Junyang (Qwen): This may touch on questions of product philosophy. Manus is indeed very successful, and whether “wrapper apps” [套壳] are the future is itself an interesting topic. At this stage, I actually agree with your view — that the model is the product [模型即产品]. When I talk with people at DeepMind, they call what they do “research,” and I really like that framing. Looking at OpenAI as well, there are many cases where research itself can become a product—researchers can effectively act as product managers and build things directly. Even internally, our own research teams can work on things that face the real world.
I’m willing to believe that the next generation of agents can do what we just discussed, and that this is closely tied to the idea of proactive or self-directed learning. If an agent is going to work for a long time, it has to evolve during that process. It also has to decide what to do, because the instructions it receives are very general tasks. Our agents have now become more like hosted or delegated agents, rather than something that requires constant back-and-forth iteration [来来回回交互].
From this perspective, the requirements on the model are very high. The model is the agent, and the agent is the product. If they are fully integrated, then building a foundation model is essentially the same as building a product. Seen this way, as long as you keep pushing up the upper bound of model capability — through scaling, for example — this vision is achievable.
Another important point is interaction with the environment. Right now, the environments we interact with aren’t very complex — they’re mostly computer-based environments. I have friends working on AI for Science. Take AlphaFold, for example: even if you achieve impressive results, it still hasn’t reached the stage where it can directly transform drug development. Even with today’s AI, it doesn’t necessarily help that much, because you still need to run experiments and perform physical processes to get feedback.
So the question is: could AI environments in the future become as complex as the real human world—where AI directs robots to run experiments and dramatically increase efficiency? Human efficiency today is extremely low. We still have to hire lots of outsourced labor to conduct experiments in lab environments. If we can reach that point, then that’s the kind of long-horizon work I imagine agents doing — not just writing files on a computer. Some of this could happen quite quickly this year, and over the next three to five years, it will become even more interesting. This will likely need to be combined with embodied intelligence.
Li Guangmi (Moderator): I want to follow up with a sharper question. From your perspective, is the opportunity for building general-purpose agents something for startups, or is it simply a matter of time before model companies inevitably build great general agents themselves?
Lin Junyang (Qwen): Just because I work on foundation models doesn’t mean I should act as a startup mentor — I won’t do that. I can only borrow a line from successful people: the most interesting thing about building general agents is that the long tail is actually where the real value lies. In fact, the greatest appeal of AI today is in the long tail.
If it were just a Matthew effect [that is, a “winners keep winning” dynamic], the head of the distribution [“头部” that is, high-frequency use cases] would be easy to solve. Back when we worked on recommendation systems, we saw how concentrated recommendations were — everything was at the head. We wanted to push items from the tail, but that was extremely difficult. As someone working in multimodality who tried to tackle the Matthew effect in recommendation systems, I was basically sprinting down a dead end [奔着死路去的].
What people now call AGI is really about solving this problem. When you build a general agent, can you solve long-tail problems? A user has a problem that they’ve searched for everywhere and simply cannot find anyone who can help — but at that moment, the AI can solve it. No matter where you look in the world, there’s no solution, yet the AI can help you. That’s the greatest charm of AI [这就是AI最大的魅力].
So should you build a general agent? I think it depends [“见仁见智” means something like, “reasonable people can disagree about this”]. If you’re exceptionally good at building wrapper applications and can do it better than model companies, then go for it. But if you don’t have that confidence, this may ultimately be left to model companies pursuing “model-as-product.” When they encounter a problem, they can just retrain the model or throw more compute at it [“烧卡” — literally, “to burn GPUs”], and the problem may be solved. So ultimately, it depends on the person.
Tang Jie (Zhipu): I think there are several considerations that determine the future trajectory of Agents.
First, does the Agent itself actually solve human problems, and are those problems valuable? How valuable? For example, when GPT first came out, many early Agents were built. But you later discovered that those Agents were extremely simple, and in the end a prompt alone could solve the problem. At that point, most Agents gradually died off. So the first issue is whether the problem an Agent solves is valuable and whether it actually helps people.
Second, how expensive is doing this? If the cost is extremely high, that’s also a problem. As Junyang just mentioned, perhaps calling an API can already solve the problem. But on the flip side, if calling an API can solve it, then when the API provider realizes the problem is very valuable, they might simply build it into the base model themselves. This is a contradiction — a very deep contradiction. The base model layer and the application layer are always in tension.
Finally, there’s the speed of application development. Suppose I have a six-month window and can quickly meet a real application need. Then, six months later, whether you can iterate, how you follow up, and how you keep moving forward all become critical.
Large model work today is largely a competition on speed and timing. Maybe our bet on coding is right and that lets us go a bit further — but if we fail, half a year may simply be gone. This year we’ve only done a little in coding and agents, but our coding API call volume is already quite good. I think this points to a new direction, just as agents will be another direction in the future.
Li Guangmi (Moderator): Thank you. In the past, model companies had to chase after general capabilities, so they may not have placed as much priority on exploration. Once general capabilities catch up, we increasingly expect that by 2026, Zhipu and Qwen will have their own “Claude moments” and “memory moments.” I think that’s worth anticipating.
Topic Four: The Future of Chinese AI
Li Guangmi (Moderator): The fourth and final question is quite interesting. Given the timing of this event, we need to look ahead. I’d like to ask everyone: three to five years from now, what is the probability that the world’s most advanced AI company will be a Chinese team? What key conditions are required for us to move from being followers today to leaders in the future? In short, over the next 3–5 years, what is the probability, and what key conditions still need to be fulfilled?
You’ve experienced both Silicon Valley and China — what is your judgment on the probability and on the key conditions?

Yao Shunyu (Tencent): I think the probability is actually quite high. I’m fairly optimistic. Right now, whenever something is discovered, China can replicate it very quickly and often does better in specific areas. This has happened repeatedly in manufacturing and electric vehicles.
So, I think there are several key points. One is whether China can break through on lithography machines. If compute ultimately becomes the bottleneck, can we solve the compute problem? At the moment, we have strong advantages in electricity and infrastructure. The main bottlenecks are production capacity — especially lithography — and the software ecosystem. If these are solved, it would be a huge help.
Another question is whether, beyond the consumer side, China can develop a more mature and robust To-B market — or whether Chinese companies can really compete in international commercial environments. Today, many productivity-oriented or enterprise-focused models and applications are still born in the U.S., largely because willingness to pay is higher and the business culture is more supportive. Doing this purely within China is very difficult, so many teams choose to go overseas or pursue international markets. These are two major structural constraints.
More important are subjective factors. Recently, when talking with many people, our shared feeling is that China has an enormous number of very strong talents. Once something is proven doable, many people enthusiastically try it and want to do it even better.
What China may still lack is enough people willing to break new paradigms or take very risky bets. This is due to the economic environment, business environment, and culture. If we could increase the number of people with entrepreneurial or risk-taking spirit — people who truly want to do frontier exploration or paradigm-shifting work — that would help a lot. Right now, once a paradigm emerges, we can use very few GPUs and very high efficiency to do better locally. Whether we can lead a new paradigm may be the core issue China still needs to solve, because in almost everything else — business, industrial design, engineering — we are already, in some respects, doing better than the U.S.
Li Guangmi (Moderator): Let me follow up with Shunyu on one question. Is there anything you’d like to highlight about research culture in Chinese labs? You’ve experienced OpenAI and also DeepMind in the Bay Area. What differences do you see between Chinese and U.S. research cultures, and how do these research cultures fundamentally affect AI-native companies? Do you have any observations or suggestions?
Yao Shunyu (Tencent): I think research culture varies a lot from place to place. The differences among the U.S. labs may actually be larger than the differences between Chinese and U.S. labs, and the same is true within China.
Personally, I think there are two main points. One is that in China, people still prefer to work on safer problems. For example, pretraining has already been proven to be doable. It’s actually very hard and involves many technical challenges, but once it’s proven doable, we’re confident that within a few months or some period of time, we can basically figure it out. But if today you ask someone to explore long-term memory or continual learning, people don’t know how to do it or whether it can even be done, which is still a tough situation.
This is not only about preferring certainty over innovation. A very important factor is the accumulation of culture and shared understanding, which takes time. OpenAI started working on these things in 2022, while domestic efforts began in 2023, so there are differences in understanding. The gap may not actually be that large — much of it may simply be a matter of time. When cultural depth and foundational understanding accumulate, they subtly influence how people work, but this influence is very hard to capture through rankings or leaderboards.
China tends to place a lot of weight on leaderboard rankings and numerical metrics. One thing DeepSeek has done particularly well is caring less about benchmark scores and more about two questions: first, what is actually the right thing to do; and second, what feels genuinely good or bad in real use. That’s interesting, because if you look at Claude, it may not rank highest on programming or software-engineering leaderboards, yet everyone knows it’s one of the most usable models. I think we need to move beyond the constraints of leaderboards and stick with processes we believe are truly correct.
Li Guangmi (Moderator): Thank you, Shunyu. Let’s now ask Junyang to talk about probability and challenges.
Lin Junyang (Qwen): This is a dangerous question. In theory, at an occasion like this, you’re not supposed to pour cold water on things. But if we talk in terms of probability, I want to share some differences I’ve felt between China and the U.S.
For example, U.S. compute may overall exceed ours by one to two orders of magnitude. What I see is that whether it’s OpenAI or others, a huge amount of their compute is invested into next-generation research. For us, by contrast, we’re relatively constrained — just fulfilling delivery requirements already consumes the vast majority of our compute. This is a major difference.
Perhaps this is a long-standing question throughout history: is innovation spurred by the hands of the rich or the poor? The poor are not without opportunities. We sometimes feel that the rich waste GPUs, training many things that turn out not to be useful. But when you’re poor, things like algorithm-infrastructure co-optimization become necessary. If you’re very rich, there’s little incentive to do that.
Going one step further, as Shunyu mentioned with lithography machines, there may be another opportunity in the future. From a hardware-software co-design perspective, is it possible to truly build something new? For example, could the next-generation model and chip be designed together?
In 2021, when I was working on large models, Alibaba’s chip team came to me and asked whether I could predict whether three years later the model would still be a Transformer, and whether it would still be multimodal. Why three years? Because they needed three years to roll out a chip. At the time, my answer was: I don’t even know whether I’ll still be at Alibaba in three years! But today I’m still at Alibaba, and indeed it’s still Transformers and still multimodal. I deeply regret that I didn’t push them harder back then.
At that time, our communication was completely misaligned. They explained many things to me that I couldn’t understand at all; when I explained things to them, they also didn’t understand what we were doing. So we missed this opportunity. Could such an opportunity come again? Even though we’re a group of “poor people,” perhaps poverty forces change. Might innovation happen here?
Today, education is improving. I’m from the earlier 1990s generation, Shunyu is from the later 1990s, and we have many post-2000s in our team. I feel that people’s willingness to take risks is getting stronger and stronger. Americans naturally have a very strong risk-taking spirit. A classic example is early electric vehicles — despite leaking roofs and even fatal accidents, many wealthy people were still willing to invest. In China, I believe wealthy people would not do this; they prefer safe things. But today, people’s risk-taking spirit is improving, and as China’s business environment improves, innovation may emerge. The probability isn’t very large, but it is real.
Li Guangmi (Moderator): If you had to give a number?
Lin Junyang (Qwen): You mean a percentage?
Li Guangmi (Moderator): Yes. Three to five years from now, what’s the probability that the leading AI company will be a Chinese one?
Lin Junyang (Qwen): I think it’s 20%. Twenty percent is already very optimistic, because there are truly many historical factors at play here.
Li Guangmi (Moderator): Thank you, Junyang. Let’s invite Professor Yang. You’ve experienced many AI cycles and watched Chinese AI companies grow to be among the strongest in the world. What is your judgment on this question?
Yang Qiang (HKUST): We can look back at how the internet developed. It also began in the United States, but China quickly caught up, and applications like WeChat became world-leading. I see AI as a technology rather than a finished end product. China has many talented people who can push this technology to its limits, whether in consumer or enterprise applications. Personally, I’m more optimistic about the consumer side, because it allows for many different ideas to flourish and for collective creativity to emerge. Enterprise applications may face some constraints—such as willingness to pay and corporate culture—but these factors are also evolving.
I’ve also recently been observing business trends and discussing them with some business school classmates. For example, there’s a U.S. company called Palantir. One of its ideas is that no matter what stage AI development is at, it can always find useful things within AI to apply to enterprises. There will inevitably be a gap, and they aim to bridge that gap. They use a method called ontology. I looked into it, and its core idea is similar to what we previously did with transfer learning — taking a general solution and applying it to a specific practice, using an ontology to transfer knowledge. This method is very clever. Of course, it’s implemented through an engineering approach, sometimes referred to as forward-deployed engineering (FDE).
In any case, I think this is something very much worth learning from. I believe Chinese enterprises — especially AI-native companies — should develop such To-B solutions, and I believe they will. So I think To-C will definitely see a hundred flowers bloom, and To-B will also quickly catch up.
Li Guangmi (Moderator): Thank you, Professor Yang. Let’s bring in Professor Tang.
Tang Jie (Zhipu): First, I think we do have to acknowledge that between China and the U.S., there is indeed a gap in research, especially in enterprise AI labs. That’s the first point.
But I think looking to the future, China is gradually getting better, especially the post-90s and post-2000s generations, who are far better than previous generations. Once, at a conference, I joked that our generation is the unluckiest: the previous generation is still working, we’re also still working, so we haven’t had our moment yet — and unfortunately, the next generation has already arrived, and the world has been handed over to them, skipping our generation entirely. That was a joke.
China may have the following opportunities.
First, there is now a group of smart people who truly dare to do very risky things. I think they exist now — among the post-2000s and post-90s generations — including Junyang, Kimi, and Shunyu, who are all very willing to take risks to do these things.
Second, the overall environment may be improving. This includes the broader national context, competition between large and small firms, challenges facing startups, and the business environment more generally. As Junyang mentioned earlier, he’s still tied up with delivery work. If we can further improve the environment so that smart, risk-taking people have more time to focus on real innovation—giving people like Junyang more space to do creative work—this is something the government and the country may be able to help with.
Third, it comes back to each of us personally: can we push through? Are we willing to stay on one path, dare to act, dare to take risks, and keep going even if the environment isn’t perfect? I think the environment will never be the best. But we are actually fortunate — we’re living through a period where the environment is gradually improving. We are participants in that process, and perhaps we’ll be the ones who gain the most from it. If we stubbornly persist, maybe the ones who make it to the end will be us.
Thank you, everyone.
Li Guangmi (Moderator): Thank you, Professor Tang. We also want to call on more resources and capital to be invested into China’s AGI industry — more compute, so that more young AI researchers can use GPUs, maybe for three to five years. It’s possible that in three to five years, China will have three to five of its own Ilyas [Ilya Sutskever]. That’s what we’re really looking forward to.
Thank you all very much!
AGI-Next: Outlook
Speaker: Zhang Bo 张钹 (Academician of the Chinese Academy of Sciences, Professor at Tsinghua University)
What is our goal?
In the past, artificial intelligence was simply a tool. Today, we are in a deeply contradictory situation: on the one hand, we want AI to take on more and more complex tasks; on the other, we fear that it may surpass us and become a new kind of subject in its own right. This creates widespread anxiety. In the past, we only had one subject—humanity—and even that was difficult to manage, because humanity is plural rather than singular: each subject has different demands. If non-human subjects emerge, what should we do? How should we coexist with artificial intelligence? And how should we address these concerns?
In fact, future subjects can be divided into three levels.
First, functional or action-oriented subjects.
This is a stage we have already reached — and one we actively welcome — because it can be genuinely helpful to us.
Second, normative or responsibility subjects.
We have not yet reached this stage. One of the greatest difficulties is how to make machines capable of bearing responsibility. This is something we hope to achieve, but from the current situation, it is quite difficult — the technical challenges are very high. But I believe everyone will continue striving toward this.
Third, experiential–conscious subjects.
This is what people fear most. Once machines have consciousness, what should humans do?
If we are people actually running companies, we may not need to think that far ahead — we can focus on the first and second levels. But there are two issues that must be considered: alignment and governance.
The question of alignment has been discussed a lot. Must machines align with humans? This is a question worth discussing. Humans do not only have virtues; humans are also greedy and deceptive — machines originally had none of these traits. If machines align with humans, are humans already the highest standard? Clearly not.
As for governance, I believe the most important governance is not the governance of machines, but the governance of humans — namely, researchers and users.
This involves the responsibilities that enterprises and entrepreneurs in the AI era should bear.
Before large language models appeared, I strongly opposed my students starting businesses. Some students’ parents even agreed with me. But after large language models, I believe the most outstanding students should start businesses because artificial intelligence has redefined what it means to be an entrepreneur. As I mentioned earlier, artificial intelligence will define everything, and it will also define the entrepreneurs of the future.
In the future, entrepreneurs will need to take on six kinds of responsibilities. Let me briefly talk about one of them: redefining how value is created. Artificial intelligence is not simply about delivering products or services. Instead, it transforms knowledge, ethics, and applications into reusable tools that can benefit humanity. This represents a fundamental shift. AI should be treated as a general-purpose technology—like water or electricity—made broadly available to society. That places very high demands on entrepreneurs. Beyond building companies, they must also take responsibility for governance and for advancing inclusive and sustainable growth.
Therefore, entrepreneurs in the AI era carry many new missions. And it is precisely these missions that make entrepreneurship — and entrepreneurs themselves — honorable, even sacred, professions.
Thank you, everyone.