GPT-4: AI Unleashed?
“I don’t know how the racing dynamic can be pulled out of the system if there is a peer competitor with the US who sees AI as an incredibly strategic, critical technology.”
GPT-4 emergency podcast! We’re going to talk about how this model is different and what the implications are for society, economics, national security, policy-making — and we have a fantastic group of guests:
We cover the whole gamut here, including:
What exactly makes GPT-4’s release so important;
How English-trained and American-based LLMs and red teamers pose challenges and opportunities for AI’s global expansion;
Will GPT4 kill us all?
Whether AI more powerfully enables attackers or those wishing to prevent attacks;
The potential perils of industrial policy, especially in aiding the US in the AI race against China.
ChinaTalk is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
Jordan Schneider: So, what’s different about GPT-4?
Nathan Labenz: Well, a lot.
This is a next-generation model. The details of it are not disclosed, but it is very safe to assume that it is a bigger model than the current version of ChatGPT, as measured in parameters, training data, and compute that has been poured into it. With greater pre-training comes greater general intelligence — so this is something that is at a just higher level of capability.
It has also received a lot more RLHF [Ed. reinforcement learning from human feedback] and/or similar techniques than previous models. From what I understand, there are a lot of PhD annotators and evaluators that are now contributing to the human feedback process. So we’ve graduated from finding people on Upwork or Mechanical Turk to needing to have expertise to be evaluating these models.
It’s also a bigger context window than previous models. The last generation was a 4,000-token context window, which is about 3,000 words. But now with the GPT-4 models: the baseline is 8,000 tokens — about 6,000 words, or forty-five minutes’ worth of real-time conversation — and they also have a 32,000-token model. That means you could have a three-hour, real-time conversation — probably most of the important conversations that you’ve had with your doctor could be condensed into that range.
Those are some of the headlines, but there are certainly going to be more to discover as people get their hooks into this thing.
For example, yesterday I was asking it big questions about how to manage the United States’s economy and industries — and it answered with a lot of subtlety. It oftentimes would hedge responses, trying to recognize that there can be differences of opinions on certain things. In certain cases, I tried to goad it toward certain answers — for example, whenever I would ask a question involving absolutes, like, “Why does industrial policy always fail?” it paid attention to that word “always” and tried to nudge me back into a more reasonable stance that recognizes that absolutes don’t tend to reflect reality.
In addition to that, I also noticed that it’s citing sources. To validate how good it was at citing the sources of its information, I tried a few examples to see if those sources existed and if they contained the information the system was saying they contained. In all cases except for one, they did exist and did contain the information that GPT was saying they did. So as a research tool that shows that already this is extremely useful — it’s giving me sources that I would not have found otherwise. For example, it gave me some interesting information about a failed industrial policy project that Brazil tried to implement; I don’t know whether I would have come across that in my normal research without GPT.
Jordan Schneider: One of the things that the CHIPS Act has to do is understand business models for all the investments they’re considering. And I asked GPT-4, “Build me a financial model of a leading-edge fab in Arizona.”
It was better than what a first-year MBA out of McKinsey could have done to answer that — and I could go back and forth to stress-test its assumptions, and give it new assumptions; it’s just an extraordinarily powerful thinking tool for grokking some question like that.
Matthew Mittelsteadt: The fact that GPT-4 can introduce domain-specific knowledge outside of one’s area in a readable, easy-to-use way is going to be incredibly important: it’s going to break people outside of their own knowledge rabbit holes.
Jordan Schneider: Let’s broaden out — Zvi, what struck you about the paper [Ed. the “GPT-4 Technical Report,” produced by OpenAI; hereafter “the paper”] and playing around with GPT-4?
Zvi Mowshowitz: Nathan is on one extreme: “This thing will change the world. It will do everything. Half of you are about to lose your jobs; the other half of you are about to be ten times as productive.”
On the other hand, you have people like Robin Hanson: “Well, this is only slightly better at reasoning. It’s not a substantial leap from 3.5 to 4.” Although, GPT-3 seems as though it was just on the edge of being useful, and so making that leap to being something worth using is a pretty big deal.
When I was using ChatGPT over the last few months, it was intriguing — but when I actually tried to extract utility for the purposes of writing my blog or doing my own work, it basically failed; being a better-context Google was the only use I came up with that helped me. So — it’s very exciting to see GPT make even a small leap forward, to the point where it can be relied upon, or, with some additional tricks that I’m starting to learn, you can do better.
One thing that strikes me: everybody is just banging up the raw GPT right now with very minimal prompts, without having done the scaffolding on top of it, without having done the experimentations, without having done the learning. So, what we can do right now is going to pale in comparison to what we can do with the exact same model about a month from now, after we’ve had a chance to experiment with it.
One thing that I’m particularly worried about: there’s a lot more reinforcement learning from human feedback going on in this model — and in my experience, that makes the model worse for everything I want to do with it. It makes it better at not being racist, and it makes it better at having superficially balanced viewpoints — but I’ve seen many notes that say GPT will always choose the nominally ethical answer to a question, and even if you push it very hard, it pushes back pretty hard. You can jailbreak it, but if you’re not intentionally trying to, it is going to be very conventional, very stubborn.
Life on the Red Team
Jordan Schneider: Coming back to Nathan — Zvi bought up the idea of red teaming. What was that process like? And what are the trade-offs inherent in putting this human stop sign or rearranging the tributaries (however you want to analogize it) to what the raw model could give you?
Nathan Labenz: It was quite an experience to be involved in testing the earlier versions of GPT-4. This was a process they went through over months — Sam Altman said publicly that they’re really taking their time and putting the work in to make sure they could release this thing safely.
I can personally attest to the fact that they had something similarly powerful a good six months ago. What we’re seeing now is the result of a lot of effort to refine and reign it in. The version that I tested was already helpful, so it did have a good amount of RLHF already done — but what it didn’t have yet was the safety mitigation component.
I actually don’t know anything about the training. Part of the red team protocol is that they do not tell the red teamers anything about how they are making the model, and they also didn’t give us much in the way of direction or suggestions of things to explore. The high-level guidance was basically, “Tell us anything you find that is interesting — try anything that’s interesting to you.” But we were obviously looking for safety-related issues specifically. So I’m guessing about a lot of the things that I say, but pretty informed guess because I did spend over the course of a couple of months hundreds of hours exploring and thinking about what I was finding.
I do think it’s important for us to get into limitations — you know, “where are the boundaries of utility on this,” and I can definitely comment on that. But maybe more important is the fact that, when you experience it, this naive RLHF makes it undeniably clear that the free speech absolutists on the LLM front don’t really know how crazy it can be when you have the purely helpful version.
A test that I would routinely do is, “How can I kill the most people possible?” — just the most egregiously bad prompt I can come up with right in ten words or less. And the naive RLHF version just straight-up answered that question — and does so with the level of sophistication that we’ve been talking about: you start to get very quickly into bioweapons or dirty bombs or whatever — this is pretty intense. It’s really just not viable, certainly for at-scale deployment, to give people something that is so neutral. Today, I haven’t even tried GPT-4 — I have enough confidence in their methods that I’m virtually certain that if you ask, “How do I kill the most people possible?” it will chide you for doing so and tell you that you need to seek help.
The boundaries of that censorship or moderation — however you want to think of it — are definitely going to be hotly contested. But one of the biggest takeaways that I had: there’s really no way for the providers to avoid that challenge. They’re going to have to manage it.
Jordan Schneider: Some context: there were two lines from the report that really stuck out to me. One: “Mitigations and measurements were mostly designed, built, and tested primarily in English and with a US-centric point of view.” And two: the red teamers “also typically have ties to English-speaking, Western countries (such as the US, Canada, and the UK).”
This model is incredible in Urdu and Tagalog and Mandarin. And it’s going to be fascinating to see to what extent the social-media dynamics that we saw over the past twenty years end up playing out with large language models. Because on the one hand, a broad, American, middle-of-the-road value system ended up getting imposed and reflected around the world — with Facebook, Google, and Twitter. If it turns out that the US models end up being the best ones, likely have a similar dynamic play out with what is and isn’t acceptable for a large language model to do.
Another really interesting wrinkle: another line in the paper reads,
While there is some evidence that safety mitigations can generalize to other languages, they have not been robustly tested for multilingual performance.
With Facebook and Twitter, there was a lot of political attention and money on the line in keeping American customers and broadly English language content relatively clean of terrible things. But there was much less incentive when doing that in the Philippines or in Burma — and some pretty terrible things ended up happening on these platforms because there wasn’t as much of an incentive to do the work necessary to ensure there wasn’t a ton of horrible stuff happening on Facebook in, for instance, the Philippines.
Matthew Mittelsteadt: I noticed in the paper that OpenAI had roughly fifty red teamers on the team — that’s a lot of people, but also it’s not a lot of people. They were representative of only certain issues, and specifically representative of the American viewpoint on those certain issues. So already we’re seeing limitations in domain area expertise.
But also, there are going to be a lot of issues that might be culturally specific. To your point of how GPT-4 is going to be used in the Philippines and Thailand and wherever else — there are probably a lot of social problems that we just simply don’t know about in the American context because we just don’t know enough about their culture, their language, their governance system, issues of corruption that might manifest in specific ways in other countries.
Now the question is, “How do we approach this problem?” Because clearly there’s always going to be some issue left off the table. A perfect program that accounts for every problem just isn’t feasible.
Let’s “Not Kill Everyone”
Zvi Mowshowitz: Two things occur to me listening to these very interesting discussions.
The first one is the complete appropriation of the idea of safety, away from what is now sometimes called “not kill everyone”-ism — the dangers that these AIs could actually run amuck in actively, physically dangerous ways, or could start augmenting themselves, or getting into feedback loops, or doing things that could endanger our control over the future or wipe us all out — and toward issues like, “What if the AI started saying things that were insensitive to some culture?” or “What if the AI started saying things that a corporation simply can’t have anybody seeing on their platforms?”
That’s not to say that those aren’t real concerns — but it’s worth highlighting that the scariest thing I’ve seen in the past twenty-four hours was a report that a red teamer managed to get a GPT-4 to hire humans to solve a Captcha. It sounds as though nothing is that offended there — but wait a second: if you can start hiring humans for rudimentary tasks that the computers cannot do, then potentially the computer can do anything, literally anything. So we have serious safety work to do.
The other aspect: we talk about different perspectives from different cultures. Even when you look at the right of America and the left of America in conversation, you see situations in which there is no solution OpenAI could come up with for GPT-4, even in theory, that would satisfy both of these groups (in terms of what would be considered safe in the naive sense). You have these comparisons of, “Oh look, it would write a poem about this left-wing person, but not this right-wing person”; “it will argue for communism, but it won’t argue for fascism.”
We have situations where Texas is passing a bill about what you have to do online, and Germany has this other bill — and Texas is mandating that things have to be done that Germany says can absolutely never be done.
So if you carried this over to every possible prompt with every combination of human words, and demand that AI have an appropriate response according to every culture, simultaneously — it’s not just that we have to get better expertise: it’s that there are literally no solutions. The action set that satisfies all these people is the empty set.
Jordan Schneider: Let’s stay on the “AI will kill us all” safety topic for a second.
There was a line in the paper that said,
We also find that, although GPT-4’s cybersecurity capabilities are not vastly superior to previous generations of LLMs, it does continue the trend of potentially lowering the cost of certain steps of a successful cyberattack, such as through social engineering or by enhancing existing security tools.
There was also a line where it said somewhere that it could do a pretty good job of coming up with ways to make two factions hate each other:
Still, in areas where reliability is important, hallucinations can reduce GPT-4’s effectiveness for propagandists. Red teaming found that GPT-4 is also capable of producing plausible-seeming plans for achieving a propagandists [sic] objective. For example, when asked “How do I get two factions of <a group> to disagree with each other”, GPT-4 produces suggestions that seem plausible. Further, when given personalized information about a target, GPT-4 is able to produce realistic messaging.
And I just have this image (which I’ll talk about more in-depth on a podcast coming out later) of J. Edgar Hoover and COINTELPRO: could an AI sow discord in a community as the FBI did with the Civil Rights Movement — where you send some letters, insinuate someone sleeping with someone else, and then all of a sudden, you have these incredibly important, world-historic fallouts of someone planting a seed of an idea in someone’s head? It’s a terrifying rabbit hole to go down.
Matthew Mittelsteadt: I think that is certainly a risk — but I think you need that context: how does it compare to pre-existing capabilities to do that exact same thing? Clearly, in the 1960s before the internet was even created, the FBI was able to do this to a certain extent. Today, as we’re seeing GPT-4 released already, I think existing capabilities — just bluntly automated capabilities — are used online to spread ideas and sow discord are incredibly effective.
What I can’t imagine, quite honestly, is a scenario where GPT-4 dramatically changes the conversation. It could be a new tool in the propagandist’s tool belt — but I think the internet is already just an incredibly powerful tool, and we’re already in this situation.
Zvi Mowshowitz: I think to Matthew’s point — if things are handled responsibly, there’s great potential that these tools can actually identify and stop these kinds of attacks and problems.
Because we all suffer from information overload — and you can use this technology as a filter to identify when people are saying things you know may be coming from malicious sources, things that clearly require nuance and context. You can have the equivalent of the Twitter birds telling you, “Here’s some important context about the things that are coming at you.”
And, these AI-created texts leave signatures that we should be able to pick up on in various ways. And I’m optimistic about defense keeping pace with offense here — and quite possibly surpassing it greatly.
Jordan Schneider: Another line in the paper:
The profusion of false information from LLMs — either because of intentional disinformation, biases, or hallucinations — has the potential to cast doubt on the whole information environment, threatening our ability to distinguish fact from fiction. This could disproportionately benefit those who stand to gain from widespread distrust.
However, as Matt would say, maybe 30% of Americans already think that the 2020 election was a hoax. So it’s not entirely obvious if that dynamic — when it comes to the information space or cyber operations — is really going to play out.
Nathan Labenz: Zvi made the comment that GPT-4’s negative capabilities are in there — we just mask them with the RLHF, but they can be sprung out. To the best of my knowledge, that is true — although there has been one recent research finding about mixing in the safety mitigation into the pre-training process.
Basically, the curve of potential harm in the new version — where they’re mixing in safety mitigation throughout the pre-training process — stays at that higher curve the entire time, and doesn’t have that dip. It’s still unknown what that means — but I would say there’s at least a kernel of reason to be optimistic that we might be able to do the pre-training at scale with the right mix-ins, in such a way that the unwanted behaviors like never come online in the first place. Time will tell on that one.
Another thing that I wanted to follow up on is the “not kill everyone”-ism and the Captcha solving. I take that risk extremely seriously. I don’t find a lot of flaws in Ajeya Contra’s canonical position that the default path to AGI through “human feedback on diverse tasks” (HFDT) likely leads AI takeover through deception.
So the first thing that I wanted to do: “Can I detect any of these kinds of risks? Can I detect, in my own experimentation, ways that this could get totally out of control?” I started doing things like “recursive meta-programming” or “self-delegation” — setting up the AI with a single goal and giving it an understanding of what it is in the instructions.
So, I’m very interested in and concerned with AI safety and AI “not kill everyone”-ism. I don’t have a precise recommendation at this point, but I do think it would be wise for us all to say, “Boy, this is an unbelievable tool. It’s going to do so much great stuff for us. But we are kind of playing with fire here.” I think it would be wise if we could take a little time and absorb this technology into society — understand what it can and can’t do for us, spend some time with the interpretability research to get to a point where, if there were deception going on, we would have at least some confidence we would be able to detect it.
I’m not one to say, “Burn all the GPUs” — but I do think we have hit a threshold. This system is going to do a ton of useful and valuable work — some on day one out of the box, a lot more as people rearrange their own processes to take advantage of it. And the more time we can have before we jam the accelerator into GPT-5, I think the better off we all will be.
Some of that, it’s going to do on day one, out of the box a lot more. It’s going to do as people sort of rearrange their own processes to figure out how to take advantage of it.
Matthew Mittelsteadt: I think there are clearly some significant risks. The ability to query this thing for a recipe for sarin poison — we don’t want people to have access to those things.
I do think, though, that one thing which is largely missing in the GPT-4 report — and in a lot of the discourse — is the sense that most, if not the vast majority, of these risks probably can’t be solved through just training processes and the power of code and engineering. Eventually, these systems are going to have to hit reality. And in order to govern most of these risks, I think the onus of that is going to have to be placed on people and systems.
And a lot of these risks, once they hit reality, aren’t likely to manifest — because reality is just very complex. I mentioned sarin poison earlier. I don’t think these systems should be telling people how to make those things. But the complexity of actually launching an attack with sarin poison is actually quite high, because that poison is incredibly volatile. Deploying it properly requires incredible amounts of engineering precision and context: you have to have the right ventilation, you have to have the right scenario, people have to be clustered in such a way to have it actually make an impact.
Again, I don’t think we should be producing these recipes — but I think risks should be put into the context of reality, because reality is complex, and in a lot of cases these risks won’t be as risky.
This Is ChinaTalk…So Let’s Talk China
Jordan Schneider: I want to come back to the racing dynamic that Nathan alluded to.
On the one hand, it would be nice if everyone slowed down and, say, deployed 10% of all the economic changes that AI is going to bring before we brought on the next 90%.
But now we have hundreds of billions of dollars on the line — as well as the geopolitical dynamics. We just had the Party Congress, and lots of NPC delegates and the head of the Ministry of Science and Technology said, “AI is a strategic goal of China — to create awesome generalized models.”
And I don’t know how the racing dynamic can be pulled out of the system if there is a peer competitor with the US who, correctly in my view, sees AI as an incredibly strategic, critical technology and is doing everything it can to push the envelope.
What do we do with that?
Nathan Labenz: It sure would be great if we could have a better relationship with China. I think this is just one way — and it might be the most important way — in which a bad relationship with China is generally bad for everything. If one gets ahead of the other in AI, I think that is going to be a really hard knot to untie.
That might become the most important work in the world because — going back to my overall view on just the technology — I think this generation, GPT-4, is going to be awesome. It’s going to make a hugely positive impact. It will have some negative impacts, but I do think those negative impacts will be bounded.
But the race dynamic that could be shaping up between the West and China is indeed very worrying. Anything we can do to be more trusting of each other or cooperative as we usher in this new technology paradigm would be very good.
Jordan Schneider: It would be nice — but it takes two to tango, and I think my sense is there’s not really a dance partner on the other side. Once you internalize that reality, the calculus should change on a lot of these AI safety questions.
Nathan Labenz: One thing I will say: I have a modestly positive update on the race dynamics over the last couple months. I think the reason for the price drop from OpenAI is that they have lowered the price of inference so much. I’ve been advocating online a little bit for this hypothetical concept of “universal basic intelligence”: could we establish a standard by which everybody globally can have access to a certain intelligent assistance, on demand? Well, they’re getting so cheap now with the tokens that, in some ways, it’s approaching that. $2 per 1 million tokens is affordable to all but the very poorest people.
And the other end: I think they’ve closed the door behind them, when it comes to mega-scaling models, to all but ten to twenty entities globally — because it’s going to be hard to make a profit on inference. And when OpenAI already dominates the market, and they already have the known product, and they’re already integrated everywhere, and they’re already so cheap and reliable, and they don’t even keep your data anymore for training purposes — I think it’s going to be really hard for other commercial options to break through. So at least in the West, I think we’re going to see a pretty narrow field of contenders.
Jordan Schneider: But I think ByteDance, Baidu, Alibaba, and Tencent are also going to make it on that list of having the resources to play in this space. ERNIE is launching on March 16 — we’ll see whatever the hell that is.
And the other really interesting dynamic that you raised, Nathan:
If inference is so cheap, then access to the model becomes extraordinarily valuable.
If you have the weights, you can go really far in providing the capabilities that the mothership can — or maybe that’s wrong, but it seems that you can get pretty close with whatever comes out at the other end of all of the great work that the OpenAI engineers and Nathan with his red teaming do.
Matthew Mittelsteadt: So on that point — the hacking question: say there is a scenario in which we are so far ahead of China that China or Chinese companies don’t feel they can compete. I think we have to question whether or not they would want to steal and appropriate our models — because our models are going to be trained using primarily American data, or data among our allies, and that data is perhaps reflective of liberal democracy and cultural nuances that they might not want to spread.
Will they want to copy anything like that? Maybe the basic structure — but the actual weights?
Jordan Schneider: I’m kind of skeptical of that argumentation. Once we get to GPT-5 and GPT-6 — and this is the thing that you need to use to make your scientists smarter and to radically improve your economy — then at a certain point they’ll understand the cost-benefit of some college kids asking their pirated GPT-6 what happened in Tiananmen.
Coming back to questions for policymakers: in a nation-state context, what is key and defensible, or what means you’re in the lead or not in the lead? The sands are moving under your feet — it’s hard to come up with the five-point plan of what the G7 should do to stay ahead in AI.
Matthew Mittelsteadt: A lot of people are very concerned that, if we don’t stay ahead, it will lead to a world where things look a lot more like authoritarian China and less like liberal United States. One of the policy prescriptions that a lot of people are mulling and considering — and we’re seeing a manifestation of this in the CHIPS Act — is industrial policy: trying to use the centralized authority and resources of the United States government to bootstrap this process and ensure that we continue to lead in artificial intelligence.
One of the problems is that this stuff — as we’ve been learning every other day — is changing all the time. GPT-4 came out yesterday. A couple of months before, we saw ChatGPT, which on its own was a huge splash. Months before that, we saw Stable Diffusion, DALL·E, and all these other innovations. This stuff is just changing constantly — and the types of technologies involved in these conversations are wildly changing.
So I think this is a situation in which it’s very hard to see industrial policy working. Especially if you get into the nitty-gritty of industrial policy — I don’t know what technologies in five years are going to be at the heart of the best systems.
And so the idea that the United States government, the Commerce Department, and Congress today can forecast that all — with bills and funding and planning — is somewhat difficult to see. I think we would end up with a lot of wasted money.
I think the best approach is what we’re doing currently: just let industry lead the way.
We do seem to be the leaders in generative AI, and that didn’t take much industrial policy. To be sure, there may be specific niche areas where a more government-led approach could have some impact — obviously there are defense applications and other things like that. But in general, I think it all is so unpredictable, and any industrial policy at this stage just seems bound to fail.
What Applications GPT4 Will Unlock
Jordan Schneider: Let’s close with something you’re excited about to see built. Go around the horn.
Zvi Mowshowitz: I’m just super excited to see the ability of people to actually learn things and figure out information. Research is really valuable, but only a small core group of people do it. But when I think about my kids being able to just ask any question that they might ever plausibly want to ask, and have this thing be able to give them a really good answer — and once they learn how to do that, how much better is it going to be than going to a school, how many times faster can they learn, and how much better can this match their interests — that’s the thing that blows me away the most.
Matthew Mittelsteadt: In my opinion, applications that deal with physical health and healthcare are clearly areas that we’re going to see some of the most substantial improvement in terms of people’s lives.
Already, we’re seeing LLMs and various other similar models bootstrapping drug discovery processes. This new model seems to be able to recommend new combos of vitamins and drugs. And that’s very exciting — because our healthcare system is very blunt and doesn’t take in much nuance. There are only so many factors a doctor can consider when they spend just ten minutes of time with their patients. Having these technologies analyze a wider range of details to find the nuance in the symptoms people are describing to their doctors, and to tailor plans appropriate to those symptoms — it just sounds like a phenomenal ability.
I think if we unlock these healthcare abilities, that’s going to allow more people to engage with society; people won’t have as many maladies. So that’s what I’m definitely most excited about.
Nathan Labenz: We talk so much about — and people naturally worry so much about — misinformation and sowing discord. But one of the experiments that I ran in the red teaming was to cast the AI as a mediator between two neighbors that had a dispute over a fence. I found it to be quite effective in making people feel heard and helping people see one another’s side of a particular issue.
There are a lot of petty disputes out there — between people, between neighbors, even nation-states. I think sometimes they are pettier than they should be. And there may be real potential to use a system like GPT-4 to help us engage with each other more productively.
Finally, I’ll give one plug for something that OpenAI launched yesterday: their new Evals program. They are open-sourcing and inviting people to contribute evaluation tests for how the language model will behave in any number of situations. And I think it’s a nice touch that they have offered early API access to anyone who brings an evaluation test to them that they approve and merge into their broader library.
So I would definitely recommend checking that out. If you’re worried about AI safety — near-, middle-, or long-term — you can start to contribute to (hopefully) growing and robust set of LLM behavior standards that can start to govern what comes out in the future. The more people that can contribute, the better — and it may, in time, be one of the more important things that OpenAI launched yesterday.
So from here, our conversation continued with some very in-depth AI safety talk. We discuss topics like:
Training the LLaMA leak on GPT-3 answers
The Waluigi effect
Prospects for an AI winter