An anon start-up conducting cutting-edge open-source research on China’s science, technology, and industrial ecosystems is looking for part-time China research analysts. You’ll be saving America with a firm run by someone I [Jordan] can vouch for being the literal best in the business. Apply here.
Today’s post is brought to you by 80,000 Hours, a nonprofit that helps people find fulfilling careers that do good. 80,000 Hours — named for the average length of a career — has been doing in-depth research on AI issues for over a decade, producing reports on how the US and China can manage existential risk, mapping out scenarios for potential AI catastrophe, and examining the concrete steps you can take to help ensure AI development goes well.
Their research suggests that working to reduce risks from advanced AI could be one of the most impactful ways to make a positive difference in the world. They provide free resources to help you contribute, including:
A job board with hundreds of high-impact opportunities;
A podcast featuring deep conversations with experts like Carl Shulman and Ajeya Cotra;
And free, one-on-one career advising to help you find your path.
To learn more and access their research-backed career guides, visit 80000hours.org/ChinaTalk.
To read their report about AI coordination between the US and China, visit http://80000hours.org/chinatalkcoord.
A big welcome to , who joined ChinaTalk this week to cover AI full time after three years of nights-and-weekends contributions!
Beijing-based Moonshot AI (月之暗面, literally “dark side of the moon,” a Pink Floyd reference) released Kimi K2 on July 11. K2 is a non-reasoning, open-source large language model built on the Mixture-of-Experts (MoE) technique, and it achieved benchmark scores competitive with many leading models, including DeepSeek V3. At 1 trillion parameters, it is an impressive feat. Per Nathan Lambert of Interconnects:
It is a "non-thinking" model with leading performance numbers in coding and related agentic tasks (earning it many comparisons to Claude 3.5 Sonnet), which means it doesn't generate a long reasoning chain before answering, but it was still trained extensively with reinforcement learning. It clearly outperforms DeepSeek V3 on a variety of benchmarks, including SWE-Bench, LiveCodeBench, AIME, or GPQA, and comes with a base model released as well. It is the new best-available open model by a clear margin.
ChinaTalk last covered Moonshot AI in March, when we translated an expansive interview CEO Yang Zhilin (杨植麟) gave to the online tech news platform Overseas Unicorn (海外独角兽). In the conversation, Yang portrayed himself and his company as stubborn AGI purists who focus on “tech visions” rather than product design or short-term revenue generation. K2 is a step towards many aspects of this vision, but its story so far also reflects the jagged reality of cutting-edge model research in China. In this piece we discuss:
What distinguishes Moonshot in China’s landscape — and what the DeepSeek and Kimi moments should tell Westerners about the future of Chinese AI labs;
How DeepSeek paved the way for K2, and why this is about open-source culture;
Why “the model is the agent” for Kimi;
And what we might expect next from Chinese AI startups.

Shooting for the Moon
Yang Zhilin, born in coastal Guangdong in 1992, earned his bachelor’s degree from Tsinghua University and went on to a PhD at Carnegie Mellon. He worked at Meta AI and Google Brain before returning to China to launch his entrepreneurial career. Unlike fellow Guangdong native Liang Wenfeng (梁文锋), CEO of DeepSeek, Yang has deep connections in both China and the US and does not focus exclusively on hiring domestically educated talent. While Tsinghua is heavily represented in the résumés of Moonshot’s founding team, others come from more diverse global educational backgrounds.
Moonshot has no B2B offerings and does not build wrapper tools for corporate users, instead focusing directly on individual customers. From the beginning, Kimi’s selling point to Chinese users was its long context window, allowing users to upload dozens of documents and analyze long articles. But it’s not just about an awesome user experience; long-context is central to Yang Zhilin’s AI worldview. Per his comments in the Overseas Unicorn interview:
To achieve AGI, long-context will be a crucial factor. Every problem is essentially a long-context problem — the evolution of architectures throughout history has fundamentally been about increasing effective context length. Recently, word2vec won the NeurIPS Test of Time award. Ten years ago, it predicted surrounding words using only a single word, meaning its context length was about 5. RNNs extended the effective context length to about 20, LSTMs increased it to several dozen, and transformers pushed it to several thousand. Now, we can reach hundreds of thousands.
Yang’s previous venture Recurrent AI (循环智能) was funded by seven venture capital firms, two of which also invested in Moonshot. Alibaba became Moonshot’s biggest backer in 2024. As Moonshot’s valuation rose rapidly, five of Recurrent AI’s investors (those who did not join Moonshot’s funding rounds) filed an arbitration case against Yang, alleging that Moonshot was launched without obtaining the necessary waivers from previous investors. Recall that High-Flyer, the parent company of DeepSeek, is a hedge fund, and that as of March, Liang Wenfeng had rejected outside investment. Moonshot, by contrast, operates under much more typical tech-startup constraints and faces investor pressure. Still, with just $1 billion raised (far less than the likes of Anthropic and OpenAI) and pressure to ultimately deliver value to shareholders, Moonshot produced a leading open model without the hedge fund-funded cocoon that DeepSeek researchers enjoy.
What we are starting to observe here is rather obvious in hindsight as we move farther away from the DeepSeek moment: there is no single path to success for Chinese frontier labs. A lab does not have to replicate the DeepSeek recipe, whether in hiring, funding, or labor practices, to create world-class models. Compute constraints, which apply across China, continue to incentivize a diverse range of research teams to pursue novel algorithmic research.
From DeepSeek V3 to K2
The team behind Kimi is very active on Zhihu (知乎), China’s Quora equivalent. According to a post by engineer Liu Shaowei (刘少伟), K2 essentially copied the combination of Expert Parallelism (EP) and Data Parallelism (DP) outlined by DeepSeek in V3’s technical report, with four notable changes (sketched in the configuration example after this list):
Raising the number of experts from 256 to 384, as their pretraining team found that scaling laws are valid for sparsity;
Reducing the number of attention heads to compensate for a higher number of experts;
Keeping only the first layer as dense and using MoE for all the rest, to maximize the benefits of MoE;
Keeping all experts in one group.
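To make these deltas concrete, here is a minimal, hypothetical configuration sketch in Python. It is not Moonshot’s actual config format: the field names are illustrative placeholders, and the only numbers used are the ones quoted in this piece.

```python
# A hedged sketch (not Moonshot's configuration) of the four changes Liu Shaowei
# lists above, written as a hypothetical config dict. Field names are illustrative.
kimi_k2_moe_sketch = {
    "total_routed_experts": 384,    # 1. raised from DeepSeek V3's 256
    "active_experts_per_token": 8,  # unchanged: "384 choose 8," per Su Jianlin below
    "attention_heads": "reduced",   # 2. fewer heads than V3 to offset the extra experts
    "dense_layers": 1,              # 3. only the first layer stays dense; the rest use MoE
    "expert_groups": 1,             # 4. all experts kept in a single group
}
```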

Recall Yang Zhilin’s Overseas Unicorn interview, where he argued that “AI is essentially a pile of scaling laws laid on top of each other”. Raising the number of experts seems to reflect that. As for why they used DeepSeek’s architecture, Liu says there was no point reinventing the wheel:
Before starting to train K2, we conducted a large number of scaling experiments related to model architecture. The result was that none of the proposed architectures at the time were truly able to outperform DeepSeek V3. … The reason is simple: the V3 architecture has been validated and remains effective at large scale, whereas our “new architectures” haven’t yet undergone sufficient large-scale validation. Given the presence of two massive variables — Muon optimizer and a much larger model size — we didn’t want to introduce additional unproven variables just for the sake of “being new.”
Another Zhihu comment, by fellow Moonshot engineer Su Jianlin (苏剑林), highlights other ways K2 learned from DeepSeek (italics added by editor):
Internally, we were also constantly exploring better alternatives to MLA [multi-latent attention, an architectural idea refined and scaled by DeepSeek], but since this was our first open-source large-scale model, we ultimately chose to pay tribute to DeepSeek by replicating its MLA design.
As for the MoE (Mixture of Experts) component, we adopted DeepSeek-V3’s shared expert, high sparsity, and loss-free load balancing. … A special note on the Sparsity part: for quite some time, we were running experiments with Sparsity = 8 (i.e., selecting 8 out of 64 experts). It wasn’t until we resolved some infrastructure issues one day that we began trying higher sparsity levels and found the gains to be significant. So we started exploring the Sparsity Scaling Law and gradually leaned into configurations similar to DeepSeek’s projections (DeepSeek-V3 already uses 256 choose 8, 32 Sparsity; while K2 uses 384 choose 8, 48 Sparsity). It felt like fulfilling a prophecy that the DeepSeek team had already made.
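To unpack the shorthand in that parenthetical: “sparsity” here is simply the ratio of total experts to experts activated per token. A quick sanity check (a sketch, not Moonshot’s code):

```python
def moe_sparsity(total_experts: int, active_experts: int) -> float:
    """Sparsity as Su Jianlin uses the term: total experts / active experts."""
    return total_experts / active_experts

assert moe_sparsity(256, 8) == 32  # DeepSeek-V3: "256 choose 8"
assert moe_sparsity(384, 8) == 48  # Kimi K2: "384 choose 8"
```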
K2 would not exist without DeepSeek — and without an open-source culture and a free flow of research. While Moonshot and DeepSeek are certainly competitors at the corporate level, Kimi’s engineers express deep respect for their DeepSeek colleagues. Back in February 2024, Yang Zhilin told Tencent News that he didn’t believe open-source models could catch up to closed-source ones any time soon, because “many open-source contributions may not have been validated through compute-intensive testing,” while closed-source projects “attract concentrated talent and capital.” Moreover, he remarked that if he had a leading model today, “open-sourcing it would most likely be unreasonable … it's usually the laggards who might do that—or they might open-source a small model just to stir things up.”

So what changed between then and now? The success of DeepSeek probably served as an important proof of concept of open source’s value for fellow Chinese AI entrepreneurs. While Yang himself has yet to offer updated comments on his open-source views, Moonshot engineer Justin Wong shared his “why open source” take on Zhihu:
First of all, we obviously wanted to gain some recognition. If K2 were just a closed-source service, it wouldn’t be getting nearly as much attention and discussion as it is now. …
Next, open-sourcing allows us to leverage the power of the developer community to improve the technical ecosystem. Within 24 hours of our release, the community had already implemented K2 in MLX, with 4-bit quantization and more—things we truly don’t have the manpower to accomplish ourselves at this stage.
But more importantly: open-sourcing means holding ourselves to a higher technical standard, which in turn pushes us to build better models—aligned with our goal of AGI.
This might seem counterintuitive—if we’re just releasing model weights, why would that force the model to progress?
The logic is actually very simple: open source means performance comes first. You can no longer rely on superficial tricks or hacks to dazzle users. Anyone who gets the same weights should be able to easily reproduce your performance—only then is it truly valid.
With the success of DeepSeek, Chinese frontier labs now have ample, proven justification for the value of open source, both for marketing and in terms of research relevance.
What Makes K2 Agentic?
Moonshot calls K2 “open agentic intelligence.” It’s not just a pun on OpenAI; the K2 team is trying to make its mark by refining definitions of agentic capabilities. Justin Wong frames the importance of agents as follows:
A while ago, when agent products were all the rage, I saw a lot of comments saying that Kimi should stop competing in the arena of large models and pivot to making agent products. I just want to say that most such agent products are nothing without Claude. Windsurf being cut off from Claude further proved my point.
Fellow Moonshot engineer and Zhihu user Flood Sung created the following diagram to visualize K2’s agent vision:
Sung explains:
We first let the model generate hundreds of scenarios such as food delivery, Weibo, WeChat... Then based on the scenarios, we generate thousands of tools, such as ordering food delivery, sending Weibo posts, [and] finding contacts. Then based on different tool combinations, we generate hundreds of thousands of different agents (different system prompts + different tool sets). Next, we design tasks ranging from simple to complex with scoring points for each different agent. Then we begin large-scale agent simulation: first assign tasks to a simulated user, let the user interact with the corresponding agent based on the task, and the user is also generated with different personalities and language styles. Next, the tools called by the agent are also simulated through a tool simulator, which is actually equivalent to a world model to execute and return feedback to the agent for continued processing. After executing the entire task, we use the scoring points from the judgment criteria to evaluate whether the agent's interactions meet the requirements. Good data is then stored!
In other words, K2 was trained on massive amounts of synthetic data, generated specifically to resemble real-life applications and optimized for tool-calling.
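For readers who prefer pseudocode to prose, below is a hedged sketch of the loop Sung describes. Every name is a hypothetical placeholder and every LLM-driven step is reduced to a toy stub; only the control flow (scenarios, then tools, then agents, then simulated conversations scored against a rubric) mirrors his description.

```python
"""Hypothetical sketch of the synthetic agentic-data pipeline described above.
All names are placeholders and each LLM/simulation step is a toy stub; this is
not Moonshot's code, only the control flow implied by Flood Sung's comment."""
import itertools
import random
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    rubric: list          # scoring points the judge checks against
    max_turns: int = 4

def generate_scenarios(n):                          # "hundreds of scenarios": food delivery, Weibo, WeChat...
    return [f"scenario_{i}" for i in range(n)]

def generate_tools(scenarios):                      # "thousands of tools": order delivery, post to Weibo...
    return {s: [f"{s}_tool_{j}" for j in range(3)] for s in scenarios}

def combine_into_agents(tools, tools_per_agent=2):  # agent = system prompt + tool subset
    for scenario, tool_list in tools.items():
        for subset in itertools.combinations(tool_list, tools_per_agent):
            yield {"system_prompt": f"You handle {scenario}.", "tools": subset}

def simulate_turn(agent, turn):                     # stands in for simulated user + agent + tool simulator
    tool = random.choice(agent["tools"])
    return {"turn": turn, "tool_called": tool, "result": f"{tool} executed"}

def judge_score(trajectory, rubric):                # rubric-based judging, here just a random score
    return random.random()

def build_dataset(n_scenarios=5, threshold=0.5):
    dataset = []
    tools = generate_tools(generate_scenarios(n_scenarios))
    for agent in combine_into_agents(tools):
        task = Task("complete the scenario end to end", rubric=["called the right tool"])
        trajectory = [simulate_turn(agent, t) for t in range(task.max_turns)]
        if judge_score(trajectory, task.rubric) >= threshold:  # "good data is then stored!"
            dataset.append({"agent": agent, "task": task, "trajectory": trajectory})
    return dataset

if __name__ == "__main__":
    print(len(build_dataset()), "synthetic agent trajectories kept")
```

In the real pipeline, each of these stubs would presumably be an LLM call or a simulated environment; the simulated user and the tool simulator would be models in their own right, which is what Sung’s “world model” framing suggests.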
What Next?
On the technical front, the Kimi team wants to expand on their current vision for agentic capabilities and build “large scale agentic RL.” Of course, they also want to move out of the shadow of DeepSeek’s infrastructure and eventually pioneer their own scalable approaches. We might expect a reasoning model based on K2 in the near future, as well as future foundational models.
Reading through Yang Zhilin and Moonshot engineers’ comments, one might get the impression that they are dismissive of wrappers and applications. But that’s not the whole truth. To understand their approach to real-life application, one has to take into account both the company’s AGI-centric worldview and their theory of change. Yang Zhilin, in his Overseas Unicorn interview, said that the secret ingredient to transformative AI was trust:
I think the ultimate measure of an AI product’s long-term value is how much personalized information users are willing to input into it, and then lossless long-context and personalization are responsible for turning these inputs into valuable outputs. New hardware forms may also be needed — but I think models and software are still bottlenecks. To dig deeper, the prerequisite for users to input a lot of information is trust — you need a sufficiently engaging and human-like AI. You can’t say, “I’m setting up product features specifically to get your information.” The end result should be that users and AI become friends, so users can tell the AI anything.
Justin Wong riffs on why this means moving away from the chatbot experience in his Zhihu comment:
Your interaction process with AI is not about directly outputting a piece of content, but rather understanding the user's needs and immediately starting a small project [or] delivering a frontend application. Users can continue to ask questions, modify, and iterate, but all of this revolves around a deliverable.
Sharp-eyed friends may have already noticed — isn't this just Cursor/Aider/OpenHands? Wrong. From an implementation perspective, this is indeed AI programming. But if designed cleverly in the product, you can hide the code-writing process. Then, for users who don't understand programming, this becomes 'I talked to AI, and it actually made me a PowerPoint/drew a flowchart/wrote a mini-game'. … This is a completely superior experience to traditional chatbots. … I believe Kimi products will soon have some surprising changes.
“AGI,” according to the team behind Kimi, has to feel like AGI. We should probably not expect Moonshot to put out wrapper products, especially since K2 is not multimodal, but the company certainly hopes to change how regular users experience AI in the near future.
What does K2 mean for competitors in the West? To quote Nathan of Interconnects again, Western labs like OpenAI, which has delayed release of its own open-weights model, are losing narrative control in open source.
We need leaders at the closed AI laboratories in the U.S. to rethink some of the long-term dynamics they're battling with R&D adoption. We need to mobilize funding for great, open science projects in the U.S. and Europe. Until then, this is what losing looks like if you want The West to be the long-term foundation of AI research and development.
K2 shows that DeepSeek wasn’t a wild exception to the rule or entirely a product of unique circumstances. Here is a startup with founders from different backgrounds, a more traditional VC funding structure, and a different philosophy on AI that was nonetheless able to produce a highly competitive open model, likely with strong uptake by users.
As for what’s next for Moonshot the company: Yang Zhilin gives few media interviews and eschews politics, and state media has yet to comment on K2. It would be a surprise if he immediately got the State Council treatment Liang Wenfeng did. Both Baidu and Tencent quickly integrated DeepSeek’s models into their products, and K2 offers yet another cost-efficient option for big internet platforms looking for AI integration; its apparently strong creative-writing capabilities might especially appeal to platforms like Weibo (similar to Twitter/X) and Xiaohongshu/Rednote (similar to Instagram).
Like DeepSeek, however, Moonshot is unlikely to try to broaden its user base through partnerships with internet platforms in China. Playing the platform-capitalism game was a point of frustration for earlier iterations of Kimi: Moonshot was the top spender on ads among AI firms in China in 2024, according to ad analytics firm AppGrowing.
The aggressive strategy caused controversy online, with users on Bilibili (similar to YouTube) and Kuaishou (similar to TikTok/Douyin) complaining about the endless volumes of Kimi ads they saw in feeds.
In February 2025, Moonshot eliminated most of its ad placements on third-party platforms to focus resources on developing its foundational model. Justin Wong says that what DeepSeek proved to the rest of the industry was that “capability was the best form of advertisement”: “[DeepSeek] showed that the path we all believed in was not just viable, but actually a wonderful route to real success. Our only regret was that we were not the one to open up that route.” From the DeepSeek moment to K2, foundation-model-focused startups have carved out a modus operandi in China’s AI landscape.
Finally, Chinese users will obviously cheer for yet another competitive domestic model, but will others around the world take up K2? The model is, of course, compliant with Chinese laws, and you won’t be able to learn about Tiananmen from the chat interface. This is me trying to have a conversation with K2 about 1989 — note how it pulls sources, and then hesitates:
Perplexity has announced that it will soon begin post-training K2 to make it available to its users. This means that, as with DeepSeek R1, an uncensored version of K2 will be available through Perplexity. For now, platforms like Perplexity have patched over the political-censorship challenge facing global adoption of Chinese LLM chatbots.
Poems and a Short Story from Kimi
I played around with Kimi intermittently for a day. Its “tone” strikes me as factual, reserved, and extremely careful, a subtly different experience from using “glazey” ChatGPT or the ever-cheerful Claude. At various points, I was even annoyed by how curtly it replied to inquiries. I tried having the same silly conversation about vestigial organs (“has anyone out there ever had every useless part removed from their body”) with Claude, DeepSeek, and Kimi, and Kimi was the only model that did not proactively try to engage me further, displayed no inkling of speculation, and did not even use an exclamation mark to signal pleasantness!
Perhaps being a model that chooses words carefully has its benefits. K2 topped benchmarks for creative writing, with commentators noting its refreshing style, command of vocabulary, and narrative cohesion. Here are two poems from Kimi, one in English and the other in Classical Chinese.
Prompt: Write me a 15-line poem in the style of Mary Oliver about open source software
Response:
You do not have to be rich or famous,
only willing.
The repository waits like a meadow at dawn—
fork it, and the grasses part.
Lines of code rise, thin and green,
each commit a small heartbeat
shared with strangers who become,
overnight, companions.
Watch: a bug is caught,
its wings still trembling,
then released, lighter,
into the widening sky.
No one owns the sky.
We simply pass the telescope,
saying: look, look farther.
Prompt: 写一首有关旧金山市景的唐诗 [Write me a Tang Dynasty poem about scenery in the city of San Francisco]
Response:
《舊金山曉望》
(七言排律)
金門曉霧濕旌旄,
電纜穿空挾海濤。
赤桁橫波藏翡翠,
丹樓映日擁葡萄。
風翻酒旆千旗動,
雨過花階五色高。
誰向坡頭聽暮鼓?
遙鐘一杵落雲袍。
Translation (also from K2):
Dawn over San Francisco
(regulated heptasyllabic verse)
At daybreak, Golden Gate fog dampens the streaming flags,
power cables stitch the sky, carrying the ocean’s roar.
Scarlet trestles span the waves, hiding kingfishers like jade,
vermilion towers catch the sun, cradling clusters of grapes.
Wind flips tavern pennants—thousand banners alive,
after rain the flowered terraces blaze in five bright hues.
Who stands atop the hill to hear the dusk drum?
A distant bell—one single stroke—falls through a robe of cloud.
[Jordan] It wrote me a spectacular short story about the US trading H20s for rare earths. The prompt was “write me a 500 word story in the style of david foster wallace about the us and china doing a deal to exchange rare earths for H20 chips”
Picture, if you will, one of those late-summer Beltway afternoons when the sky over D.C. looks like a migraine—white, swollen, vaguely nauseous. Inside the EEOB, the air-conditioning is thrumming at that pitch that makes your fillings vibrate. Deputy Undersecretary for Strategic Stuff Lyle “Chip” Ketchem—who insists the nickname predates the semiconductor industry by decades and is therefore ironic—stands before a wall of LCDs displaying supply-chain heatmaps in colors not found in nature. The maps are supposed to convey urgency, but they mostly convey the sense that someone has spilled lava on a Lite-Brite.
Across the table: Minister-Counselor Wen Jieru, whose business card identifies him, in bilingual serif, as “Senior Coordinator for Reciprocal Resource Harmonization.” Wen has the kind of face that looks polite even while calculating the exact cubic meters of your personal failure. Between them sits a single legal pad upon which someone has written, in bureaucratic uncial, “MOU: H₂O ⇄ REE.” Below that, a subordinate has added, in smaller letters, “(no jokes about water boarding, please).”
The deal itself is almost comically symmetrical, like a grade-school barter of a peanut-butter sandwich for a pudding cup, except the sandwich is a wafer of circuitry etched at 1.4 nanometers and the pudding cup contains dysprosium, terbium, and enough praseodymium to make every iPhone in Shenzhen vibrate with joy. The U.S. will ship, over 36 months, one hundred thousand “H₂O” neuromorphic chips—so named because their architecture supposedly mimics the lateral geniculate nucleus of a freshwater trout, or so claims the marketing deck, which no one has read past page 4. In exchange, China will release, in quarterly allotments, thirty-seven metric tons of rare-earth oxides that currently sit in bonded warehouses in Guangxi like extremely expensive decorative gravel.
Chip tries to frame the swap as a triumph of interdependent resilience. Wen listens with the expression of a man auditing a children’s recorder recital. When Chip invokes “win-win,” Wen replies, in perfect MBA English, “Wins are scalar, not vector. Direction matters.” No one laughs, because laughter is not on the agenda. Instead, they proceed to Attachment 7-B, subsection (iv): “Protocols for Geopolitical Optics.” This is the part where each side agrees to describe the other as a “valued partner” while simultaneously briefing their respective media that they have “held firm on core equities.” The linguistic gymnastics make Chip feel like he’s chewing aluminum.
They break at 19:27 for individually wrapped sandwiches whose ingredient lists read like passwords. Wen peels the sticker off his turkey-swiss and folds it into a tiny paper crane, which he places on the edge of his plate like a punctuation mark. Chip stares at it, suddenly aware that the crane is the only object in the room with no hidden agenda. He experiences a moment of vertiginous clarity: the chips will be obsolete before the last shipment clears customs; the rare earths will be stockpiled in a Nevada salt cavern until some future administration rediscovers them and tweets about strategic surprise. The swap is not about materials at all; it is about the narrative that materials can be swapped. The story is the commodity.
Back in the conference room, the LCDs refresh. A new heatmap blooms—magenta, viridian, arterial red. Chip signs. Wen signs. Cameras flash like paparazzi at a funeral. Outside, the sky has ripened to a shade that Pantone has not yet named, but will eventually market as “Imminent Dusk.”
ChinaTalk is a reader-supported publication. To receive new posts and support our work, consider becoming a free or paid subscriber.