RAND's Matheny on the Science & Art of Public Policy

What does Moneyball for policy look like?

Dec 18, 2023

Jason Matheny recently passed his first anniversary as CEO of the RAND Corporation. This is part three of my conversation with him.

In today’s excerpt, we discuss:

Takeaways from his time in the White House on the drawbacks of the methodologies currently driving decision-making at the NSC.
The upsides and downsides to tech moonshots, and why OpenAI leads the pack.
How art informs our thinking on tech and public policy.

The Science of Policy

Jordan Schneider: Past ChinaTalk guest Dan Spokojny of fp21 gets really frustrated when folks in government say foreign policy is an art, not a science. How can we get more scientific approaches into national security and technological decision-making?

Jason Matheny: For me, it was just exposure to the work in judgment and decision-making that had been done over the last forty or so years.

There’s just such a wealth of knowledge that we’ve generated over that time period about how human beings actually make decisions, what influences their judgments, and the kinds of practices that can lead to better judgments and better decisions.

Very little of that research has actually penetrated public policy decision-making.

I don’t know quite how to explain it. You would think that policymakers might be incentivized to make better decisions and that then they would use whatever empirical findings we had on how to make better decisions.

But there aren’t necessarily super strong incentives for improving our decision-making processes. We see the same kinds of failures in businesses that even have very strong financial incentives to make better decisions. Yet they also are not adopting some of the lessons learned from research on human judgment.

I asked Phil Tetlock and Barbara Mellers this question recently. They just pointed out that, in many cases, the incentives are not aligned for individual managers to make decisions that are objectively better.

Instead, they might be motivated to make decisions that appear better or safer, in some ways, to whoever determines their salaries or their professional futures. They’re thinking about something they can defend to a board of directors.

If you say to a board of directors, “Hey, I want more of our decisions to be made on the basis of betting markets, pre-mortem analyses, and crux maps,” the board of directors might be pretty mystified by all that. It’s not clear that they would necessarily want to endorse it.

There is just generally a challenge in overcoming the lack of awareness of some of these methods that test strongly when we evaluate them.

There’s also a challenge in that some of the highest-stakes decisions that we make in policy are not necessarily ones that have strong incentives for making the right decision or an accurate judgment.

I’m really interested in this. What does Moneyball for policy look like? How do we use science to help us make better judgments and better decisions?

We have spent a lot of time in cognitive psychology studying how humans actually make decisions under time pressure and under uncertainty. We know a lot more today than we did fifty years ago about this topic.

Let’s start using some of what we’ve learned. Let’s actually see whether these methods that appear to work well in other contexts can help us in policymaking.

Giant Steps Are What We Take

Jordan Schneider: Let’s talk about moonshot projects. Can the government do these anymore? Should they? How can RAND help build or rebuild that institutional capacity?

Jason Matheny: We still do moonshots. A project like the F-22 cost more than the Apollo program. We still have these mega-projects or giga-projects. Maybe all told, some of the particular defense programs are tera-projects.

The value of moonshots, however, depends on a few things.

A moonshot in the wrong direction can be worse than a small project in the right direction. Small projects tend to allow more rapid error correction or course correction.

There’s increasing evidence that we underestimate the risks and overestimate the benefits of gain-of-function research. You wouldn’t necessarily want a moonshot for gain-of-function research.

A moonshot for AI capability might also end up being a net negative if you don’t already have a deep foundation in AI security and AI safety.

I agree with Richard Danzig who had a great paper called “Technology Roulette.” One insight from that paper is you want to make sure that you’re considering the risks of the technologies that you create.

You don’t want to jump too far in the direction of developing a technology that’s going to create asymmetric risk for yourself. You might want to focus on defensive technologies or ones that asymmetrically favor defense or safety.

You can think about differential technology development that’s focused on embedding safety and security from the start. We do need significant investments in things like AI safety, biosecurity, lab security, judgment, and decision-making.

Some of those might deserve moonshots, but I’d be happy with something that gets us to low Earth orbit on some of those topics before getting to the moon.

OpenAI as Moonshot

Jordan Schneider: OpenAI, it’s fair to say, might be the most successful moonshot project we’ve seen over the past few decades.

What has made them so successful? What are your hopes and fears as OpenAI and other labs, both in the US and China, continue to explore AI’s capabilities?

Jason Matheny: OpenAI did a really admirable job with a lot of safety testing and red teaming. They published this thing they called the “system card,” alongside their main GPT-4 paper, that documented their safety and security work.

I’ve overall been impressed by the analysis they do on safety and security risks. Sam Altman, Miles Brundage, Jade Leung, and others on the team have been really thoughtful.

It’s also an interesting organizational design. You’ve got around 500 employees at OpenAI and it’s able to leverage the computing infrastructure at Microsoft. It links back up to what we were talking about earlier.

Sometimes when you’ve got a mid-sized organization able to leverage the infrastructure of a much larger organization, there can be some real benefits.

OpenAI didn’t need to spend a lot of time building its own computing infrastructure. It could leverage that elsewhere. That’s a big part of what makes these large language models practical now.

The White House Years

Jordan Schneider: What did you learn from your experience in the White House, both at the National Security Council and the Office of Science and Technology Policy?

Jason Matheny: The people I worked with were some of the smartest, hardest working, most compassionate people I’ve ever worked with. The human capital there is phenomenal.

There are a few challenges working in the White House. One of them is the tyranny of the immediate. There are lots of urgent problems. Some are both urgent and important, and some are just urgent.

The Eisenhower matrix of urgent and important is really something we felt acutely all the time. We know there’s a problem that’s even more important, but carving out the time to work on it can be really challenging.

If you look at the amount of time spent in different policy processes as a proportion of the total, the fractions would not align with what you might think of as being the most important problems for the country to tackle. And it’s not because the people at the White House think, “Oh, well, the things that we’re spending the most time on are the most important.” It’s just that there are timelines for each of these.

If the President has a meeting with the ambassador from country X — even if country X is maybe not the most important thing that’s going to affect the United States in the next fifty years — he’s still got to have that meeting and you’ve got to prep him for it. A lot of that kind of timing influences how attention is budgeted in the White House.

I’d read about this in histories of the National Security Council, but I hadn’t appreciated it. It’s a really hard thing to figure out how to balance.

There have been various calls for restructuring the National Security Council so that it has a warning function and can do longer-range analysis. It’s just really hard to do that within the White House because there are so many other time pressures.

I worked in both the National Security Council and the Office of Science and Technology Policy. OSTP had the luxury of being able to work on longer-range analysis. OSTP had sufficient room for analysis on things like supply chains or moves and counter-moves in technology competition. We could think a lot about AI safety and biosecurity on longer timelines than we typically would in the NSC.

Finding ways to leverage other parts of the White House — those that are able to have a little bit more room to think — is helpful.

I also think there are some problems that are likely to be thousands, maybe even millions of times more important than others.

I used to work on this thing called the Disease Control Priorities Project about twenty years ago when I worked in global health. One insight from that is there are interventions that are 10,000 times more cost-effective than other interventions. Malaria bed nets can be 10,000 times more cost-effective than building a new hospital in the middle of Egypt.

We don’t often appreciate just how cost-effective something can be. Doing the math to get even a rough order of magnitude for the consequence of policy decisions is something we don’t often do. But we could do more of it.

Another thing is incorporating more insights from people like Daniel Kahneman, Amos Tversky, Phil Tetlock, and Barbara Mellers on human judgment and decision-making and using these in the policy process itself.

We don’t do much pre-mortem analysis. We don’t do much adversarial collaboration, which is a really interesting approach to settling certain kinds of disagreement. We don’t do crux maps. We tend not to use probabilities.

A lot of the policy process could probably be improved if we used a little bit more of what we’ve learned from judgment and decision-making research over the last few decades.

We need more red team analysis. Picture having a group whose permanent responsibility is to imagine how China would react to certain kinds of policies. Have that group thoroughly embedded in Xi Jinping’s thoughts.

It would be living day in and day out immersed in PRC media, doing its best to simulate the Politburo’s thinking. That’s probably really useful. It’s at least worth testing. Running these kinds of experiments on analytic and policy processes would be worth trying.

Jordan Schneider: I assume you were involved in one of the more consequential decisions of this first term of the Biden administration — the October 7 export controls on China. That must have been an enormous institutional lift to form that policy.

Jason Matheny: [We were] doing the analysis in a really detailed way — really understanding moves and counter-moves, costs and benefits. [We had] a team that was really deep on the technical details of chips — what they can do, what thresholds matter, how they could be used in the future.

[We had] enough technical expertise — both within government and from the national labs and elsewhere — that can help advise on that.

We were fortunate too in that some of the work had been done for us at CSET and other think tanks in advance. It’s really hard to do a lot of the thinking while you’re in the White House because there are so many demands on your time.

One of the advantages of places like RAND and CSET is that you sort of have the luxury of being able to do the math and work out the analysis. We were able to draw on that.

The other big lesson was that something like that doesn’t pass unless a significant number of departments and department heads agree with the approach.

So, that policy process was very inclusive. It involved a lot of agency heads. Their agencies reached the same conclusion for the same reasons on why the controls were warranted.

Semiconductors are pretty unusual in that they’re a real choke point with wide and deep moats in the global supply chain. They’re incredibly consequential for so many strategic issues.

In thinking about China’s ambitions in military modernization, cyber operations, and human rights abuses, chips play a major role in all three of those categories.


The Art of Policy

Jordan Schneider: You were an art history major? How did that come about?

Jason Matheny: I wanted to be an architect, and I went to a college that didn’t have an architecture school. I realized I wanted to be an architect after I had already started college. I worked as a social worker in Cabrini-Green, a housing project in Chicago.

As a social worker, I saw that building design can really impact the health, happiness, and security of families. I just wanted to design better public housing, social housing, or affordable housing.

Because the University of Chicago didn’t have an architecture program, art history was the way I could write a thesis on the history of social housing. Then I went to architecture school for a year.

In the library, I saw an orphaned copy of the “World Development Report” from the World Bank. It was the 1993 report, and it included these tables of statistics on preventable deaths due to infectious diseases.

I had never really had any exposure to that — just the millions of deaths caused each year that were completely preventable, especially childhood deaths. I was stunned. I couldn’t believe that these tables were right.

I asked some epidemiologists, “Is this true, really?” They were like, “Yeah, how did you not know about these basic facts about the world? We have about 4 million plus deaths for children under five due to these preventable diseases.”

Then I moved to start working in public health and infectious disease control and worked on that for several years.

Jordan Schneider: So, how did you end up working in defense and intelligence?

Jason Matheny: What brought me to national security was that in 2002, I was working in India on a global health project on malaria and tuberculosis and HIV. While I was working there, a DARPA project synthesized a virus from scratch just to see if it could be done.

That was an “oh, crap” moment for the public health community. It brought into focus that pathogens could be developed and could be much worse than existing pathogens. A virus that we had eradicated and controlled, like smallpox, could be recreated.

Some of the people I was working with in India were veterans of the smallpox eradication campaign. They were like, “Oh, man. Some sophisticated misanthrope is just going to recreate the smallpox virus. We’re going to have to go through this all over again.” That’s what shifted me to work in national security.

I cold-called Andy Marshall because he was the only person I really knew of from the national security world. I had read a couple of papers in college about RAND and its early work. I wrote every Andrew “something” Marshall at pentagon.mil.

I wrote twenty-six of those variants. Four or five wrote back, and one of them was Andy W. Marshall. I lucked out, and he encouraged me to come talk about the future of bio risks and other risks. That got me started in national security.

Jordan Schneider: Could you make the case for going to an art museum?

Jason Matheny: Have you been talking to Richard Danzig? When I was at IARPA (and he offered the same thing to DARPA), he’d say, “Why don’t we go on a tour of an art museum, and we can just point out the connections between depictions of technology in art: how it was viewed in its time and how we think about technology today.”

Art is a set of artifacts we can use to understand how society has changed.

It’s not a perfect sample. Only a tiny percentage of people are responsible for producing the art that sits in museums today. Only a tiny percentage of people were even able to see much of the art that exists in museums today.

But it is one instrument we have for recording how people valued different things within society over different periods. Taking an art tour with Richard Danzig should probably be on all of our itineraries.

The other case to be made is that you use a different part of your brain, so you can give some parts of your brain a rest. I spend an awful lot of time thinking about and working on depressing topics.

When I want to take a break, I often will walk down to the Hirshhorn or the National Gallery and just try not to think about nuclear war or pathogens or cyberattacks. I just think about these beautiful pieces of art around me to put myself to rest for a bit.

I hope everybody has something like that that they can use to activate a different part of their brain.

Jordan Schneider: In another interview of yours, you were talking to a group of young people and told them that they should start working on catastrophic risk as soon as they could, because you wasted ten years of your life not working on catastrophic risk.

But look at you. You had all this runway ahead of you. The conversation we just had on organizational design and incentives, that ties into your early interest in designing better public housing.

I would not encourage everyone to have a laser-focused optimization function on whatever they think is the most impactful thing they can do today.

There's a lot of uncertainty. There's a lot you don't know as a young person. Bringing different perspectives to these important questions is the way we’ll probably end up having the most differentiated impact.

So, here’s my final question and maybe a suggestion. For your six-month pausing of the world, I'd love to see you pick up that drafting pen and redesign the White House and EEOB to help with better decision making.

I want to see what blueprints you have planned for the 2070 renovation!

Jason Matheny: Great. Okay, I'll get to work on an outline.
