Helen Toner Takes the Reins at CSET
Today our conversation covers…
What it means to run CSET in 2025, and how to keep think tank work rigorous and relevant in the age of AI,
The “good-faith” vs “dark arts” actors shaping Washington’s AI policy debate,
What her recent trip to China revealed about how Beijing is thinking (and not thinking) about AI,
Why AI progress might stay “jagged,” and what that means for AI policy,
Plus: why Jordan can’t fall in love with AI.
Listen now on your favorite podcast app.
The 2026 Tarbell Fellowship is now open. You could come work with us at ChinaTalk! Apply here.
Don’t just take it from me. Take it from our current Tarbell Fellow on his experience so far:
“Tarbell placed me at ChinaTalk for a year, fully funded! It’s been a dream setup to report seriously on China, tech, and AI. The fellowship’s training covers both journalism and the fundamentals of AI, which makes it one of the best on-ramps for people who didn’t come up through traditional reporting or AI pathways.
I always thought about tech journalism but assumed I missed my chance after college. Tarbell gave me another shot. ChinaTalk gives me the freedom to chase questions I’m genuinely curious about in the China–AI space, paired with a team that constantly reads each other’s work, shares articles, and brainstorms ideas. You’ll be producing impactful work for a large audience, but you’ll also be learning every day.
At ChinaTalk, I spend my time digging into the semiconductor supply chain, Chinese AI models, U.S.–China relations, and whatever else I get excited by. If that sounds like your idea of fun, apply!”
Think Tanks in the Age of AI
Jordan Schneider: As the new interim Executive Director of CSET, are you excited to rip up everything they’ve created and remake it in the image of Helen Toner? What is your vision for the future of CSET?
Helen Toner: If there’s one thing that I have learned from the many friends and colleagues who’ve rotated in and out of government, it’s that your day-one mission should be reorganization. Step in, tear everything up, and change the org structure.
No, I’m kidding. It’s exciting and an honor to be in this position. After Jason and Dewey, I’m stepping into big shoes. I’ve been at CSET since its founding in 2019, so it’s exciting to shepherd the organization into a new phase.
CSET’s success is built on a foundation of excellent work, and I want to continue that. The core of our mission is to produce intellectually independent research that is driven by evidence and data. Our data science team is unique in the think tank world — their data powers our analysis. On every project, we make sure our analysis is rigorous and driven by the best evidence we can find. We care that our work is technically informed.
One of our founding goals at CSET was to show a different way for think tanks to operate, and ideally, inspire others to follow us. I think we’ve been really successful there. You now see RAND with a huge emerging tech and national security effort, and CSIS doing more translations and data visualizations — things that were core to the CSET model and are now much more common in Washington.
That’s great, because it proves our model works. Of course, it also means we have competition, so we have to show what makes CSET unique and where we provide particular value. Our deep expertise on China is a perfect example. We have a whole range of China specialists woven throughout our team, covering everything from language to specific subject matter. I’m excited to lean into that and to keep evolving. Emerging tech never stands still, so we have to keep figuring out where we can add the most value.
Jordan Schneider: I agree that CSET has raised the bar for discourse in Washington — it’s why I gave CSET ChinaTalk’s only “Think Tank of the Year” award back in 2022. It’s been heartwarming to see your standard of using real evidence on thorny topics like chip controls, immigration policy, or the PLA’s use of AI resonate so strongly in the broader debate in Washington.
But at the same time, we’re seeing a paradox. Since 2019, it feels like facts matter less than ever. Arguments get reduced to tweet-shouting matches, and remarkably, those shouting matches are now becoming central to the actual policy debate on AI. What’s your take on these two trends happening in parallel? What’s the synthesis?
Helen Toner: I think there are multiple layers here. You have the headlines in the New York Times or the Wall Street Journal, but there is also work happening beneath the surface. The U.S. government has millions of employees, and the subject matter experts doing the work are interested in details and evidence. There’s a steady demand from them for the kind of support we provide, and they are very responsive to facts.
Another example is the discourse around recent AI legislation. Take California: last year, the discourse around the SB 1047 bill was awful. Then this year, they convened a governor’s panel, published a report, adopted its recommendations, and passed a less controversial bill. It’s a crazy turnaround. We saw something similar with the EU Code of Practice — it looked like it was going to fall apart, but then it came together. I don’t want to sound too pollyannaish; there’s a lot to be concerned about. But it’s important to remember that sensible work is still getting done.
Jordan Schneider: I started ChinaTalk in 2017, and CSET started in 2019. Back then, the intersection of U.S.-China relations, emerging technology, and national security was not a front-page topic.
Helen Toner: When we said we wanted to have a whole organization focused on emerging tech and national security, people were like, “A whole organization? Like, four people?”
Jordan Schneider: It’s been a wild adjustment for this space to go from an idea funders would laugh at to something presidents tweet about all the time. But that shift has also brought in layers of bad faith. Back when this community was smaller, there weren’t many people playing dirty.
I think CSET has its heart in the right place and is doing earnest, yeoman’s work. But there are snakes in the grass everywhere now. There’s so much money riding on this research, and that wasn’t true a few years ago. I admire your pollyannaishness — I think it’s good for your mental health. But is the most effective option to put out good research and facts? Or are “dark arts” needed to have that research shape policy?
Helen Toner: I don’t think the only options are “put a white paper on your website” or “go full political dark arts.” There’s a lot of space in between. From the beginning, we’ve done more than publish research — we actively seek out the relevant policymakers, brief them, and work with their teams on legislation. Now, we’re also thinking about how the internet has changed what that means for us. Should we be doing videos? I’m not sure, but we should at least consider it.
Another big shift, which I know you follow, is the trend toward individual brands over institutional ones. Some of our people are eager to give that a go, while others — especially those from the intelligence community — are like, “Oh God, shoot me before you make me go on Twitter.” We’re exploring that space — finding ways to keep doing good-faith, fact-based work while operating effectively in today’s ecosystem.
Jordan Schneider: I worry that an organization where good-faith, facts-focused people are comfortable is fundamentally different from one with “dark arts” specialists. The cultures and incentives don’t mix.
Helen Toner: Will there be a ChinaTalk “Dark Arts Think Tank Award”? Who would that go to?
Jordan Schneider: Wow, I don’t know. I can’t give any names here — this is for public consumption. But I agree that there will always be an audience for grounded data, and someone needs to provide the facts.
Helen Toner: When we talk about “the Facts,” it’s not about some ideas being more virtuous than others. But if you want to accomplish something and care about results, then you need to know what the world looks like.
We worked closely with the Biden administration when they were considering outbound investment controls — asking them, “How will you implement these controls? Do you have the necessary information to do it effectively?” This isn’t about taking a holier-than-thou position. It’s about the reality that if you don’t know what’s going on, you’re going to try things that backfire — and most people want to avoid that.
Jordan Schneider: A better framing might be that it’s better to have data in the discussion.
It’s remarkable what a single researcher with an “individual brand” can do these days. It’s wild to think that CSET was around before we could ask ChatGPT what The PLA Daily 解放军报 says. I can now code data visualizations in two hours, which I used to assume would require a CSET-level team and budget. How do you think these new tools change what a solo researcher or a small team can accomplish? Does this change how CSET operates?
Helen Toner: We’re looking at it from the opposite side — what unique things can our larger team do that an individual still can’t? Our data team is tackling a huge data science problem called “entity resolution”. That’s figuring out that “Google London” and “DeepMind” are both “Google” in a massive dataset of text. It’s a huge, messy problem, and using language models in a carefully designed and validated pipeline, we blew past previous results.
We also analyzed ~3,000 AI-related procurement contracts from the PLA, using language models to parse that data. As a larger team, we can do things an individual researcher can’t. We can validate our results and test different models — like when to use an expensive, frontier model versus a lighter one that’s faster and can handle high volumes. We’re doing tons of experimentation there, and the team is coming up with some really cool stuff.
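For readers who haven’t seen entity resolution before, here is a minimal illustrative sketch of the kind of two-stage pipeline Helen describes: a cheap string-similarity pass shortlists candidate matches, and a language-model judgment (stubbed out below) handles the hard cases, such as knowing that DeepMind belongs to Google. This is not CSET’s actual code; the alias list, the 0.6 threshold, and the llm_same_entity stub are assumptions for illustration, and a real pipeline would validate the model’s judgments against hand-labeled data.

```python
# Minimal illustrative sketch of LLM-assisted entity resolution (not CSET's pipeline).
# Stage 1: a cheap lexical "blocking" pass shortlists candidate canonical names.
# Stage 2: a language-model judgment (stubbed out here) adjudicates hard cases,
# e.g., knowing that DeepMind belongs to Google even though the strings don't match.
from difflib import SequenceMatcher

CANONICAL = ["Google", "Alphabet"]  # canonical entities we resolve aliases to
ALIASES = ["Google LLC", "Google London", "DeepMind", "Alphabet Inc.", "Googel"]

def string_score(a: str, b: str) -> float:
    """Rough lexical similarity, used only to rank candidates."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def llm_same_entity(alias: str, canonical: str) -> bool:
    """Placeholder for a language-model call asking whether `alias` refers to
    `canonical`. In a real pipeline this would be a prompted model whose
    accuracy is checked against a hand-labeled sample."""
    known = {("DeepMind", "Google"): True, ("Alphabet Inc.", "Alphabet"): True}
    return known.get((alias, canonical), string_score(alias, canonical) > 0.6)

def resolve(alias: str) -> str | None:
    """Map an alias to a canonical entity, or return None to flag for human review."""
    candidates = sorted(CANONICAL, key=lambda c: string_score(alias, c), reverse=True)
    for canonical in candidates:
        if llm_same_entity(alias, canonical):
            return canonical
    return None

for name in ALIASES:
    print(f"{name!r} -> {resolve(name)!r}")
```

The two-stage design mirrors the trade-off Helen mentions: cheap heuristics or lighter models absorb the easy volume, while the more expensive frontier model is reserved for the ambiguous pairs.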
Jordan Schneider: CSET was early on important AI topics, but has remained ideologically neutral — you were writing about semiconductor export controls in 2020, but have not published an “AGI will arrive in 2027”-style analysis. Is now the time? Are the odds of those radical changes high enough that you need to start spending your team’s time and budget exploring them?
Helen Toner: I have a unique perspective because I have my feet in two worlds. I’m from the AI safety community, which is in that mindset, but most of my team at CSET is not “AGI-pilled.” We’ve done a lot of work on scaling and red-teaming, but not the “OMG AGI” work. We are currently hiring someone to work on frontier AI issues, and I’m hoping to increase our work in that space.
Jordan Schneider: What are you excited for this person to do?
Helen Toner: I’m excited about the “Frontier AI” framing. I’m glad RAND is now researching AGI, but the concept of AGI is messy and contested — it’s not clear what, if anything, is there. There has been a giant gulf between the AI systems we have — those we can touch and test — and hypothetical concerns about future AGI. But in two years, that gulf has gotten smaller. Now we can look at current systems and extrapolate future ones — which makes this topic amenable to CSET’s evidence-based methods.
I’m psyched for this research. It won’t be “CSET predicts AGI in 2027,” but it’s important to consider the possibility of AGI or superintelligence on timescales soon enough to matter for policy. Watch this space.
The Jagged Frontier
Jordan Schneider: You’ve expressed the view that AI progress could stay jagged. Can you elaborate on that?
Helen Toner: The idea comes from Prof. Ethan Mollick, and possibly also Andrej Karpathy. I highly recommend following Mollick’s AI work on Substack, LinkedIn, or Twitter. His idea was a “jagged frontier” — that AI is good at some tasks and surprisingly bad at others.
I recently gave a talk on this, arguing we should take seriously the idea that AI’s progress might remain jagged. Right now, most people fall into one of two camps: either they think AI is all hype and a “nothing burger,” or they’re in the “AGI by 2027” camp. That group believes powerful AI will become a drop-in remote worker or an automated AI engineer. Both are non-jagged views of the future. The question of what persistent jaggedness would look like is underexplored.
Jordan Schneider: The “jagged frontier” idea is more nuanced than mainstream discourse on AI — the Twitter brain, swinging wildly between “it’s over” and “we’re so back.” Why do you think people resist the possibility of uneven AI development — that the next model won’t solve everything? Why does the jagged idea struggle to gain traction, even though it is our current reality?
Helen Toner: Most people agree today’s AI is jagged, but they believe the future will be different. I think that’s because we use humans as a reference point — we believe that what’s difficult for us must be universally difficult, instead of seeing it as a product of our own evolution. Since the 1950s, we’ve debated — are we recreating the human mind, or building useful machines?
We’re currently far down the “build useful machines” path, but the idea of recreating the human mind is built into the field. I think this is why people expect AI to be more human-like than it is.
Jordan Schneider: There’s money involved now — the AI hype is backed by enormous financial incentives.
Helen Toner: Jaggedness doesn’t only refer to the troughs where AI struggles — there are high peaks as well. In the next 5-10 years, I expect us to exploit the heck out of those peaks. But the way we do so must account for the troughs.
Jordan Schneider: Is jagged AI more tractable for policy research? Is CSET’s work more relevant in that scenario?
Helen Toner: If jaggedness persists, fast takeoff scenarios are less likely — scenarios like an automated AI researcher that makes ten years of progress in six months. That would be a hard world for policy to operate in — there isn’t time for the government to form a commission, write a nice report, and debate it in the next legislative session. Jaggedness leads to slower AI progression, which gives us time to reflect, experiment, and adapt.
I’m not certain jaggedness will persist, but the idea is underrated in the AI community. At the same time, we should consider the possibility of non-jagged, rapid AI progress. It could still happen, although it’s not my best guess.
Jordan Schneider: There’s a resource allocation problem in AI policy research. Should we focus on a tangible, near-term jagged frontier — like AI’s impact on cybersecurity — or on the sci-fi futures of self-improving AI? People are drawn to speculative, sci-fi scenarios — a cybersecurity paper won’t go viral like “AGI by 2027” did. But there is value in working on a more probable future.
Helen Toner: There is a lot of low-hanging fruit in research on jagged development, and a lot of possible futures. What will AI be good at? What tasks will it struggle with? What does that mean for adoption and integration?
A jagged frontier means we are unlikely to fully automate complex jobs or goals. Instead, we will get powerful AI advisors and a “centaur” model of human-AI teaming, which you mentioned in the AI girlfriends podcast. Future human-AI collaboration scenarios are underexplored because predictions of super-powerful AI assume everything will be automated. They focus on abstract problems like alignment, not the messy, practical details of human-machine teaming that a jagged world would demand.
Jordan Schneider: After writing a paper on AI honeypot espionage, I decided to do some experimenting. Over the past few days, I’ve tried to fall in love with an AI, and it’s not lovable in the slightest.
When it comes to personal comfort and consolation, AI jaggedness is very apparent. There has been a lot of recent reporting about people who’ve developed close, intimate relationships with AI, but it’s not doing it for me. What should I make of that, Helen?
Helen Toner: Have you tried the Grok anime goth girl? You need to find the right one for you.
Jordan Schneider: It was bad — really repulsive. Even if I’m not the target audience for these AIs, if they were smart, they would have figured me out after 10 minutes of conversation — the way TikTok figured me out after 45 seconds of swiping. These models cannot do that — that’s an important detail.
The lack of personalized learning is a huge hurdle for AI in the workplace. Instead of learning from user input, models are trained and dropped into organizations, leaving people to figure them out. If the future of this technology depends on personalization that fits like a glove — professionally and personally — then we need to solve this.
Helen Toner: There’s a long way to go. We held a workshop in July about automating AI R&D and the potential for an “intelligence explosion” takeoff. We need to question underlying assumptions — what does progress look like? What are the gaps? How soon can we fill them? We’ll examine this in an upcoming CSET paper.
Jordan Schneider: There’s tension in our view of AI’s capabilities. It’s easy to overlook its limitations in work you do not do yourself, but in your own work, you can feel the jaggedness firsthand. You have an intuitive sense of where AI is exceptional and where it’s uneven.
Ironically, AI engineers are the most optimistic about AI’s capabilities — maybe a little high on their own supply. But the proof is in their paychecks — companies are hiring them in droves because AI cannot do their jobs.
Helen Toner: There are many sources of jaggedness.
A key source of AI’s jaggedness is the context window — how easy is it to input the organizational or practical context of a task? Some professions, like software engineering or marketing, are easily digestible for an AI because you can copy-paste the relevant code or creative brief. But most jobs can’t be reduced to a text file — their context is messy and organizational. We haven’t fully grasped how this single limitation will shape what AI can do and how quickly it can do it.
AI Debates in China
Jordan Schneider: Helen, you were in China recently. How was that trip?
Helen Toner: It was great to be back in China. In 2018, I was in Beijing for 9 months, studying Mandarin and learning about China’s AI ecosystem. But between my green card, the pandemic, and having kids, it had been ages since I was there. I went for a quick five-day trip to Shanghai for the World AI Conference, which was gigantic. You know what Chinese conferences are like — the huge stage, the flashing lights. Robots were walking around everywhere, something you couldn’t get away with in the U.S. Kids were petting little quadruped robots that were roaming the floor. It was a good time.

Jordan Schneider: Were you recognized?
Helen Toner: No, definitely not. Not that anyone told me.
Jordan Schneider: What’s your sense of the U.S.-China AI dialogue and opportunities for discourse or cooperation?
Helen Toner: People in the AI safety community often ask why there isn’t a U.S.-China dialogue on avoiding a race to superintelligence. The answer is that there is no agreement on what the problem is, or what the U.S. and China’s interests are. At a Chatham House discussion I recently attended, the Chinese organizers were divided on whether to focus only on superintelligence or broader development questions as well. Within their team, there was no consensus on the core issues. These conversations are a good start, but we still have a long way to go.
Jordan Schneider: A core AI policy question is how the U.S. and Chinese ecosystems will relate to each other. What are the other key questions that will define the field for years to come?
Helen Toner: On the national security side, the U.S.-China dynamic is a big one, covering both competition and potential cooperation on AI. Military integration is another huge question. The focus is shifting from developing advanced AI to how it changes a military’s operational concepts and the way it fights. This is an adoption challenge.
There are also serious risks around cyber and biosecurity, but we might get lucky and find that the threats are manageable. I’m personally more concerned about cyber, but I know well-informed people with access to classified information who are deeply worried about the bio risks.
Outside of national security, we’ll see more community-level issues, particularly around data centers. A narrative about their water use is gaining traction, and while the data may not show an unusual amount of consumption, the community perception is strong enough to create backlash. There are also social questions. We do not have a framework for dealing with AI companions, especially for children, and the impact of AI on labor and jobs is not going away.
AI Parenting Advice
Jordan Schneider: Do you have any AI parenting takes, Helen?
Helen Toner: I have a three-year-old and a one-year-old, so thankfully, we’re not there yet. But I worry the “engineer-brained” approach to parenting reduces child-rearing to a set of tasks. The idea that if an AI can entertain or teach a child “better” than a human, then it’s a net win, misses the point. The relationship between a parent or teacher and a child is a huge part of what it means to grow up and learn. AI should be a tool to enhance connection, not replace it. If an AI generates a story, read it to your child, but do not be too utilitarian. What are your thoughts?
Jordan Schneider: Abstracting love is a high bar for AI.
Kids are wired by billions of years of evolution to trust a warm, sweaty mammal. An AI can certainly teach them physics or math better than I can, and outsourcing that is one thing. But the biological need for connection is another. Primate studies show the same thing — the monkeys want to be held. Trying to engineer that need away is playing with fire. Maybe a robot will get there in 20 years, but you’re running hard against evolution. No offense to anyone using Midjourney for children’s books — I have that tab open right now.

Helen Toner: I think there are good ways to do it.
Jordan Schneider: Absolutely. But the sci-fi future where kids don’t need loving parents for connection or as models of how to relate to other humans seems a long way off.
Helen Toner: There is a This American Life story that sticks with me, about a single dad and his daughter. He was a physicist, and she would ask him astronomy questions like, “Why do stars...?” or “Where did the Earth come from?”. Kids love to ask “why” questions. He found answering them stressful, so one day, he asked her to write down all of her questions. He locked himself in his office and wrote up a gigantic set of answers for her. The interviewer on the show asked the girl what she thought, and she said, “I wanted to hang out with my dad.” It’s so tragic. Don’t do that with AI.
On Calling Timeout
Jordan Schneider: My theory is that CSET only exists because of Jason Matheny. The national security risks of China’s rapid AI growth were completely off the radar for funders at the time. It took an exceptional person they trusted, like Jason, to convince them to build a community around this idea.
Before CSET, there was no tech team with deep China expertise. I spent years trying to make the case that competition with China mattered, that AI was more than one small piece of a larger puzzle, but people were unconvinced — that idea was “too spicy” or too far out.
There was a brief moment during the first Trump administration when it became a mainstream concern. Many corporate blogs, including that famous OpenAI document, were suddenly about beating China. But that moment has passed, and it feels like the issue is becoming less relevant again.
Helen Toner: It’s an interesting time for China+AI policy. When I started in this space around 2017, people in AI would ask, “Why talk about an AI race with China?” and then give AI-specific reasons why it wasn’t a race. I had to explain that they were missing the bigger picture. The U.S. national security apparatus was orienting towards strategic competition with China. For them, AI was only one small manifestation of that competition, and the AI community’s arguments were seen as irrelevant noise.
Jordan Schneider: I remember people telling me, “Oh, but if we say this, will it accelerate the race?” Bro, come on.
Helen Toner: Is strategic competition with China still the main goal of the U.S. national security apparatus? People outside the tech world are not sure — the current U.S. policy toward China is unclear. That’s disconcerting in some ways, but it also creates potentially productive space.
Jordan Schneider: In the first Trump administration and then under Biden, U.S.-China competition was a central pillar — Jake Sullivan wanted “as large a lead as possible.” But in Trump 2.0, focus on China has waned, and now we’re mobilizing against Venezuela while ignoring Chinese vessels in Philippine waters.
For years, I’ve thought that the U.S.-China AI race was inevitable. In many ways, it has already happened — we now have bifurcated ecosystems for AI chips, models, and hyperscalers. These dynamics seem resilient to the day-to-day whims of policymakers. How durable is this rivalry? If the American president is not focused on this issue, do the competitive dynamics of the last decade have enough momentum to continue on their own?
Helen Toner: I don’t know how resilient the rivalry will be. The competition was never about AI — competition with China was the organizing principle of the U.S. national security apparatus, and AI was one part of that. The U.S. AI sector now uses that narrative as justification for everything from faster data center permits to avoiding AI regulation. If those arguments lose force, I’m not sure what will happen.
My own prediction has always been that China’s internal demographic and economic challenges would eventually cool the rivalry, though I thought it would take longer, maybe until the 2030s. With a president who is hard to predict, an increasingly isolationist MAGA base, and a new focus on the Western Hemisphere, disengagement from China could prove stickier.
Jordan Schneider: The TikTok story is a…
Helen Toner: Hilarious episode in the sitcom that we live in.
Jordan Schneider: Exactly! This was a bill Congress passed almost unanimously, and then the president decided he was not concerned with an issue Congress had a bipartisan consensus on — that is an interesting detail. I’m not sure how illustrative that is.
Helen Toner: Another source of tension in the Trump coalition right now is between the “tech right” and MAGA. They have disagreements about whether to charge ahead with AI — whether AI is the best thing since sliced bread, or the devil, or the Antichrist. There is a lot of division, but both sides are less concerned about competition with China. The tech right wants to sell to China, and the MAGA world would prefer to slow the rate of development.
Jordan Schneider: Corporate self-interest could be a reinforcing driver. U.S. firms do not want to compete with Chinese companies on their home turf. Once Chinese EVs start taking market share, or Huawei chips threaten Nvidia, the game changes. The question will become, is access to China’s market worth giving up our own? The likely answer is no. U.S. companies will demand the same protected home base that Chinese firms have used to their advantage, which only accelerates the competition.
Helen Toner: Wait, they can operate here? I thought it was a one-way street.
Jordan Schneider: Bill Bishop often invokes a Xi Jinping quote that essentially says, “Our goal is to become more self-reliant at home and make the world more dependent on us.” This mindset was on display in the rare earths saga, where China’s escalation was a self-inflicted wound. It showed a willingness to compete in a way that alienates American elites. You can admire their ambition, but the U.S. will not accept Chinese competitors dominating key verticals — especially in the tech sector that underpins the U.S. stock market.
Helen Toner: We’ll see. I do not know how we can ban Chinese open-source models, which I think is one of the biggest threats to U.S. market share. Using open-source Chinese models presumably displaces API market share for OpenAI, Anthropic, or Google.
Jordan Schneider: They are not un-bannable — stopping individual downloads of Chinese software is a fool’s errand, but that’s not the real game. The real game is preventing billion-dollar companies from being built on Chinese open-source models, and the government has plenty of ways to do that. They can block Chinese models from government contracts, tie it up in FCC compliance issues, or make it a mandatory risk disclosure. If the U.S. government really puts its back into it, it can find a way.
Helen Toner: The government procurement restriction is a good point. Public company disclosures — that’s interesting. I agree, these policies can make it harder.
Jordan Schneider: Or change the incentives. Switching tangents, can you pitch some of the best recent work from CSET — what do you admire and plan to build on?
Helen Toner: One of our most exciting new papers analyzes 2,800 PLA AI contracts. The initial piece focuses on who is buying, and the key finding is that while the largest contracts go to state-owned enterprises, the bulk are awarded to “non-traditional” private companies and universities. More research is coming on what they’re buying.
Our work on DoD AI integration has also been impactful. Interestingly, our research has been valuable to government officials because it is public. Internal reports are often classified and hard to share, so a URL they can circulate is a game-changer. Our paper “Building the Tech Coalition,” which analyzes the DoD’s use of Project Maven and the internal talent it required, is a great example of this.
Number three is our work on AI and biorisks. The debate has been narrowly focused on controlling AI models, so our “Toolkit for Managing Biorisks from AI” broadens the conversation by outlining a full range of policy options, which has been helpful for policymakers.
Jordan Schneider: Let’s do two more, oldies but goodies.
Helen Toner: For oldies but goodies, I’d point to our outbound investment work, where we asked the Biden administration, “If you want to control outgoing investment, do you know how to do that? What data do you have, and what data would you need?” That implementation was a classic example of our work.
Our explainers have been surprisingly impactful. We published one about the differences between generative AI, large language models, and foundation models. A government agency was trying to decide which terminology to use in an influential policy document, and told us the explainer directly influenced their decision. Straightforward research like that has a good track record.
Jordan Schneider: Would you like to recommend some mood music to end the episode?
Helen Toner: The great China and AI scholar, Matt Sheehan, told me instrumental playlists are the best way to focus, so I’ve been listening to a lot of instrumental music. There’s a great James Brown instrumental album. Why not some instrumental James Brown?

