Computer Says Kill: The AI Safety Circus w/ Heidy Khlaaf

Transcript

This is an autogenerated transcript and may contain errors.

Hey there, I'm Alix Dunn, and this is our second to last episode of a series we've been running on AI and militarization called Computer Says Kill. In past episodes, we've covered everything from how the US military buys AI, how weapons have evolved in ways that have shaped war, and how AI companies are jostling to be the technology, the platform of modern war.

In this episode, we have Heidy Khlaaf, the chief AI scientist at AI Now Institute, to talk specifically about AI safety, or rather how AI companies' standards of safety aren't really about safety at all. Heidy is a globally recognized expert in the safety of critical systems. Think energy grids, military technology, nuclear power plants, basically the stuff that would cause huge harm if it fails.

And in this conversation, we explore how and why AI companies engage in the lingo of safety when what they mean has very little to do with the rigorous [00:01:00] standards and processes that have been developed over the past century. So we're gonna start with Heidy kicking us off by giving a little bit of background on the field of safety engineering.

My name is Heidy Khlaaf. I'm the chief AI scientist at the AI Now Institute.

Safety means something as a word, um, and I feel like it comes up in lots of different contexts, and I feel like people may be surprised that safety comes up in a military context as well when thinking about systems, 'cause usually I think people associate militaries with trying to kill people, not necessarily, uh, do things with precision and kind of incorporate safety.

I'd love to hear a little bit on, like, how historically in military contexts safety has been defined, has been used, has shown up when militaries buy technologies in particular. Can you just give us a little bit of that context?

Yeah, I think it surprises most people to find that the field of safety [00:02:00] engineering, right, which is the field that all of our safety-critical systems kind of abide by, was actually born out of the nuclear arms race in the defense sector, and it wasn't in contradiction to it.

It was defense and nuclear engineers that actually established this field, and historically what they define it as, especially in the context of critical systems and defense, is making sure that there's no accidental harm to humans, the environment, or even assets, like military assets, or at least, you know, minimizing harms in a way that the risks and the cost of using that technology don't outweigh the benefit.

So it also builds on this idea of, like, system failures, right? Like, no matter how small, they can compound and lead to catastrophic events that have to be mitigated. So what that means practically is having a safety calculation that is based on the performance of these systems against very clear, quantifiable risks that consider how they fail, how that can cascade, and who they [00:03:00] harm.
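As a rough illustration of the kind of calculation described here, the sketch below multiplies conditional failure probabilities along a hypothetical chain and compares the expected harm against a tolerance. The step names, probabilities, harm count, and tolerance are all made-up assumptions for illustration, not figures from the episode or from any real safety standard.

```python
from dataclasses import dataclass


@dataclass
class FailureStep:
    name: str
    probability: float  # chance this step occurs, given the previous step occurred


def chain_probability(steps):
    """Probability that every step in the failure chain occurs in sequence."""
    p = 1.0
    for step in steps:
        p *= step.probability
    return p


def expected_harm(steps, people_harmed):
    """Expected number of people harmed per demand on the system."""
    return chain_probability(steps) * people_harmed


def within_tolerance(steps, people_harmed, tolerance):
    """True if expected harm does not exceed the agreed risk tolerance."""
    return expected_harm(steps, people_harmed) <= tolerance


# Hypothetical cascade: each probability is an illustrative assumption.
CHAIN = [
    FailureStep("sensor fault", 1e-3),
    FailureStep("backup fails to engage", 1e-2),
    FailureStep("operator misses alarm", 1e-1),
]

if __name__ == "__main__":
    print("chain probability:", chain_probability(CHAIN))
    print("expected harm:", expected_harm(CHAIN, people_harmed=100))
    print("acceptable:", within_tolerance(CHAIN, people_harmed=100, tolerance=1e-3))
```

The point of the sketch is only that the claim is specific: every input can be challenged and measured, and the tolerance is a number someone outside the engineering team can set.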

So when people or AI companies especially make comparisons to the Cold War or the Manhattan Project, you know, they conveniently leave out, like, you know, how it was actually during this era that the scientists developing nuclear energy and nuclear weapons created actually the very first risk analysis frameworks, right?

We think of risk analysis frameworks as something that has always existed. It's deployed in almost every field right now, but this is actually where it was born because it was meant to help them evaluate the risk and mitigations of deploying new and powerful technologies that also considered the public's interest.

And this is not out of the kindness of their hearts, right? This is not because, you know, the military just loves safety. I think it's actually strategic when you have these very rigorous frameworks that led to some of the most advanced military systems that gave advantage to the US in the past few decades.

And I think there's this idea that regulation hinders innovation, but in reality, it's these safety frameworks that led to development of [00:04:00] incredibly reliable and dependable systems which you need for a military advantage.

So interesting. You use the term critical systems a lot. Do you mind defining what a critical system is?

'Cause I think that seems like a really important concept for people to understand when we talk more about safety and, and safety engineering.

Yeah, absolutely. A safety critical system is any system whose failures could lead to death, environmental harm, or damage to assets, right? And so this is not your typical cellphone.

This is not your computer. This is the energy grid, a nuclear power plant, an airplane, right? And obviously military systems as well, if they fail, if, you know, you have an F-35 that crashes or if you have a tank that fails. It applies to those types of systems where you have a sort of life or death scenario at hand as a consequence if these systems fail.

Okay. Super helpful. Okay, so I think most people would be surprised to learn that rigor around risk and safety was actually a core [00:05:00] component of what drove innovation in military development of new technologies over the last, let's say, 70 years. Do you wanna talk a little bit about how that works? Like, how does it work when a military is keen to get specific about safety?

How does that actually drive innovation and emerging technology development?

Yeah, I think, you know, there are multiple levels to this question. The first one is let's talk about, like, strategic and tactical military goals, right? If you have military systems that enable you to do that, you need them to be reliable.

If they're constantly failing or they're unreliable or they're inaccurate, how do you know that you're actually meeting your goals, right? So that's kind of like a core concept of it. But they actually, during that time period, during sort of the nuclear arms race, there was a bigger question that's, like, sort of at the very heart of risk and safety frameworks, which is how safe is safe enough?

I think engineers at that time knew that that's not a question that can be answered by them as individuals who are in fact creating these [00:06:00] technologies because they can profoundly change our societies, especially if you're looking at something like a nuclear weapon, right? So I wanna give an example of Chauncey Starr, who's kind of one of the developers of risk assessment in the field.

He worked on the first atom bomb, and from what he learned at that time, he created the first ever risk analysis framework where he actually emphasized the need for societal evaluation of risk and interpreting public attitudes and values. So what that really means is that the safety of a system should be based on risk tolerances that are derived from what our civil society is willing to accept.

And post-Manhattan Project and during the Cold War, standards and regulatory measures were put in place to not only consider what the risk tolerance is based on democratic consensus, but that defense and other safety critical systems also meet that minimum standard of safety. In the US, after the 1945 Trinity nuclear test, there was a huge public backlash against nuclear fallout and the danger from that.

And that actually pushed the [00:07:00] Atomic Energy Commission, which was kind of like a predecessor to the NRC today, the Nuclear Regulatory Commission, our nuclear regulator in the US, to prioritize restoring the credibility of nuclear power and the public's risk perception of it.

The public was very clearly concerned about what the testing or even having something like a nuclear power plant meant for them. And so there was this really strong opposition that meant political movements, and that in turn would mean picking specific, you know, politicians that enable their message and support them, right?

And I think it's very obvious then, okay, well, if they actually believe that nuclear weapons are important for the US' military advantage, and that something like nuclear power is also important for the US' industrial development, we actually believe in that, but the public doesn't see that. So let's take them on a journey to make it trustworthy so they actually believe in that same message as well, right?

It wasn't like this idea was like, "Well, you don't know what you're talking about." It's like, yeah, those risks are fair, and the public is going to continue to oppose it. And so [00:08:00] let's make these systems rigorous and safe in a way that makes them feel comfortable about their deployment. That's really what it's about, and I think that is how traditionally most technologies have been developed.

Obviously, the atom bomb wasn't a good example of that, but it wasn't like they developed the atom bomb and they were like, "Well, the public doesn't really have a say in this any further," right? We develop it, we're gonna deploy it, we're gonna test whatever we'd like. The backlash actually led them to believe that there was a lot at stake, and they needed, you know, the public's acceptance or trustworthiness at least, to be able to show them the benefits of this technology.

Yeah. It makes me think about that, um, famous New Yorker piece where the general public was being told, "It's all fine. These are, like, normal weapons." And then all of a sudden it was like, "Oh, no, these are, like, catastrophic technologies that cause so much harm." And then the public opinion changed incredibly quickly, but it was because it was all new.

Like, they didn't actually know what to think yet, and the kind of dust was settling on a lot of these opinions. So as, as these fields [00:09:00] change, some people do know more about these technologies than the general public or even other people in different fields. And so I imagine there's these people that emerge that are kind of making sense.

I mean, I think sense maker is actually a term that gets used in some of these spaces of, like, how do we interpret these new possibilities, and who are the stewards of that understanding that has a, I imagine, a sort of foundational role in what communication the public engages with and kind of how public opinion is processed and built.

Um, do you wanna talk a little bit about those people that know maybe more than other people and, like, that, how that role functions in, in this dynamic?

Yeah, and it's not just-- This is not gonna be something that's just my opinion. Actually, when you look at the field of risk assessment itself, in its foundation, they do discuss, like, who gets to determine how safe something is, right?

Like, it is a political question, and I think that's recognized early on. So for example, again, going back to Chauncey Starr when he was developing the very first risk assessments and talking about how safe [00:10:00] safe enough is, he characterized the placement of technocratic experts in decision-making positions that sort of extend the role beyond technical assessment, technical engineering, as an indicator of an autocratic society.

I think that would be a surprise to many, given how much we sort of look up to technical expertise as a, a way to guide our sort of public policy, right? And this is because when you position technologists as decision-makers, you sort of eliminate the channels through which the public attitudes and the public values can be considered in how to sort of weigh the risks and benefits of new technologies.

So you essentially sort of launder that decision-making and risk determination to the very people who build these systems, which is a conflict of interest, right? Like, I think frankly, there are very good technologists, right, who do care about civil society and the public interest, but there's also people who are looking to profit from that.

And I think when we look at the characterization of this very [00:11:00] early on, when the sort of literature of risk assessments was being written, this is something that was considered, right? Like, how much should these technologists be involved in the decision-making process? And we must make sure that the public always has a say in that, regardless of what they say about their expertise or about what they believe public policy should be.

'Cause it also, in addition to creating a conflict of interest, it also feels like it collapses the domain expertise that's applied to these questions because you don't, you don't want just a technologist making choices about safety as discussed. Like, you want this much more dynamic political, social process around it that takes into account all kinds of things where a technologist is just totally in over their head.

Exactly. Like, I think there is this idea, there's this myth that because you know how, for example, an AI system works, right, then you must know everything about everything. Yeah. And, and that's completely false, right? Like- Yeah. And I see this every day because- I've met some of those guys. Yes. Yeah, exactly.

I mean, I spent four years assessing [00:12:00] nuclear plants, right? The software in them, including the implementation of AI in them and sort of safety mechanisms, and I have a specific domain expertise there. And to now see companies, AI companies selling LLMs as a way to bypass nuclear safety and have it be automated by LLMs.

I'm like, and I look at their demos, I go, "This is someone who actually is not familiar with the nuclear process," right? Is not familiar with what's at stake, what's required, the engineering discipline, the engineering expertise. And it's very clear to me that they have that complete disconnect, right? From the demos that they show, I'm like, "This prompt you just asked, that would- that question would never come up if you're writing a safety case for a nuclear plant," right?

And so that's just from my own experience. That experience, I think, we've seen across many fields: being told by these AI experts that AI's gonna solve cancer. We're not gonna have radiologists. We're gonna get rid of software engineers. We're gonna get rid of this. We're gonna get rid of that. When often, with these fields, they actually don't understand the [00:13:00] complexities of them and the expertise that's required to actually carry out those tasks.

It becomes something like very abstract to them that they think they can do away with.

And it's this lust for being the dominant domain. It's like this, it's this passion for the idea that a singular set of skills could basically rewire the entire world in a way that is so naive and so also like- It's reductive. Yeah, it's so reductive. It's very reductive, yeah. It, like, the very idea of it undermines the claim they're making.

It's like- Exactly ... uh, which I find really, I find it regularly ironic how much they overstep and, like, tell on themselves that, like, they think that this expertise alone can solve some of the most complex questions, um, in our worlds. I have a theory that it's because they don't like working with other people.

Um, uh- There's a- I, I, I think it's like, I actually think it goes to this idea of this generalization that, you know, LLMs have been described as, [00:14:00] like, one model that can rule them all. I actually, when I worked in nuclear, I worked on AI systems, right? Like, I worked on very purpose-built AI systems that could help, like, nuclear operators, for example, right?

This wasn't a type of AI that could displace people and do their jobs, because that was never the claim. It was like, you know, I'll give you a very concrete example where AI is super useful in nuclear. For example, if you have, like, a radiation leak, potentially you can deploy some sort of AI-based sensor or some sort of robotics to explore that leak, essentially, to minimize human harm, right?

That's very, very different from this idea of, like, one LLM for all the possible tasks in the world, right? That is, like, when you have, like, a very reductive way of viewing how these defense systems work, how a nuclear plant works. And I think, you know, we're seeing that more and more in, in the military.

We're moving away from these purpose-built models that have kind of domina- And we're actually still seeing that, and we're seeing it in, in Ukraine and Russia, right? Their drone development is actually very [00:15:00] purpose-built for the situation that- Mm-hmm ... you know, um, you know- And very effective. Yeah, yeah. And, and that's really different from let's just feed an LLM a bunch of data and see what it says, and that becomes our, sort of our military operation.

Let's dig in a little bit more on your experience working in nuclear, and then I wanna talk about how you feel about what's happening now. So can you give us an example of how a well-trained safety engineer might approach a safety problem, maybe drawing on experiences where you've been tasked with testing a system that has, you know, big implications like a nuclear power plant or something?

Yeah. Typically, you start out with what we call sort of a safety claim, right? And this is a claim about if the system is, like, safe enough, right? Or if it meets its functionality for a specific use case. And from that high-level claim, you audit these types of systems and investigate if they meet them in, in different capacities.

And so that actually means having access [00:16:00] to those systems. And a lot of times as a safety engineer, you have an independence that other engineers do not, right? Either, you know, you're working for a specific regulator to assess a specific system, or, you know, you are sort of being hired by a third party to do that as well because they're looking for, like, a specific certification.

And that independence is really quite important because, you know, you know nothing about the project and you sort of onboard, and you sort of get access to the code base. The key things here that I found to be very important are the independence of it, right? And we had nothing to do with those companies, right?

We came in, we had our own way of assessing them. And the second thing is having that access or being able to verify their claims, because from the get-go on the very first day, typically these companies tell you, "We meet this specific standard. We followed it to a T. We did everything. Here's our documentation.

You can see that our entire development process revolves around abiding by this specific standard that's used [00:17:00] for nuclear." And you go, "Great. Can I see it?" Right? "Can I see your test? Can I see your code?" Prove it. Exactly. It also feels like two other parts of that that are important is specificity. So, like, being required to make a specific claim that you then have to prove is so different, I think, than what we're seeing now, and that feels extremely important to be able to validate because if you don't have a specific claim, there's no-- there's nothing to test with LLMs.

It's this very, like, soupy, benchmark-y, did it pass an LSAT kind of thing. And then the, the second piece I would add too is that, like, the fact that these are non-deterministic systems now, um, that feels like a completely different set of questions, 'cause if you get it right in a test that you run as a safety engineer, who's to say that the very next time you run the same thing it's not gonna get it wrong?

Because the systems aren't like-- like, you can't lock in a way of being, which feels like a, like, how do you even... I don't know. If you were to have access to LLMs in the way that [00:18:00] you're describing, and have that level of independence, and have that level of specificity, is there a way that you think that safety engineers should approach LLMs?

Or do you just feel like it's, like, not possible because of the architecture of them?

I mean, I think you touched on a very good point about safety claims, which is that they're general, and there's no such thing as general safety. I think that's one of the sort of biggest misnomers we've seen on the reframing of safety that comes out of AI labs, is that there is this idea of safety.

Safety is relative. What safety means for a nuclear plant is very different from aviation. I'll give you an example, because we were talking about risk thresholds earlier. A catastrophic incident for a commercial flight is 365 dead passengers, right? Because that's typically the number of passengers on a commercial plane.

365 is not considered catastrophic for a nuclear plant. It's much more than that, right? Given sort of- Wow ... the power of nuclear radiation. Yeah. And so it's relative. Mm-hmm. It is completely relative to the field, to the technology. And so when we have benchmarkings, to me, that's meaningless, [00:19:00] because what is this benchmark for?

Is it for a targeting system? Is it for, uh, nuclear command and control? Is it for a fighter jet? Even when we're looking at it from the military perspective, safety means different things across different systems, right? So I think that's the first thing. And as you said, you need to be able to have specific claims to be able to substantiate them.

And the way that a lot of AI safety frameworks work is that they revolve around hypothetical risks that you actually can't scientifically verify, and they can't be substantiated because of how abstract they are. And I think we've seen this across two points. The first is this idea of safety which surrounds this whole concept that AI systems will be able to develop chemical, biological, radiological, and nuclear weapons, right?

Or they call it CBRN. And AI companies have been constructing this unsupported narrative for years that AI safety should surround this idea that models shouldn't get powerful enough to develop CBRN capabilities, and AI companies are [00:20:00] always on the cusp of that. Like, they're always like- Yeah ... warning us.

We're so close. They're edge warning us with bio, yeah. Yeah. Yeah. Exactly. And, you know, but we have yet to see any of those capabilities besides some blog post releases. The second concern is this sort of winning the AI arms race, right, which is this vague idea that the accelerated adoption of AI above all else is a marker of the US's technological advantage and defense prowess, especially against adversaries like China.

That becomes their safety claims, right? And to me, that completely skews risk tolerances and thresholds that we initially had for military and safety critical systems, and moves them away from concrete harms, you know, like I talked about human lives, assets, environment, and, and kinda moves it towards these abstract existential risks that can be used to really justify any harmful deployment of AI, right?

Like, if you have a, a claim that says the AI will be capable of deploying a nuclear weapon or creating a nuclear weapon, how do you [00:21:00] even scientifically break that down, right? And they never actually show you the process. They say, "Oh, we, we had some nuclear engineers come, and they asked it some scary questions."

It's like, well, this is, this is not indicative of the model performing in a way that demonstrates that. And at the end of the day, we have to probe these models to give us some sort of output, right? It's a lot of disconnect from us taking a system saying, "Okay, well, if it fails, this happens, and then this happens, and then this happens, and then X amount of people will be harmed," right?

Like in the case of a, a commercial flight, right? If it crashes, you know exactly the consequences. And, and I think that's what's so difficult about this conversation is the claims that, that they're making, they're not scientifically verifiable. No, even if you, you throw like safety methods at them, like safety cases, even if you audit those systems, those are simply just not very scientific concepts, right?

Or when they talk about a super intelligence, well, what does that mean? What are the implications? How does that happen? And, and often when you, I think you read the narratives, they often [00:22:00] sound like all hypothetical situations that we have yet to see, or they kind of build a very specific test rig where they try to kind of role play that, if that makes sense, right?

And I think that is very far from, from how any sort of auditing of systems pans out. And so I think the, to be able to audit an LLM, you made a very good point that they're non-deterministic, right? Like, this is a really big problem with them. You basically have to safeguard them in a way that they're not directly making life or death decisions.

They're so inaccurate, they're very far from any reli- meeting any reliability standards for safety critical systems. And the second thing is, I think we'd have to move away from this idea of general safety and very much focus on a test case by case scenario, and we're not seeing that. Every day we see a new benchmark, right?

Every day we see a new like, "Oh, this is a benchmark to rule them all. No, this is a benchmark to rule them all." There's no such thing, and that, I think that worries me 'cause I feel like we are moving [00:23:00] away from the verification and validation frameworks that have been built in the last several decades for the military, for safety critical systems that really only deploy things once we understand they're fit for purpose.

We don't have that anymore. We say, "This generic benchmark, if you pass it, it's all good." And it's like, that didn't prove anything about whether this military system has high enough accuracy to be able to be used for targeting, for example. And I think that to me it feels like we're losing the actual safety thresholds and frameworks that guided our systems in being accurate and reliable for something that's like much more brittle and ad hoc.

Yeah. It also just feels like such an unserious enterprise that's co-opting language of a very serious historical process that has served society quite well. So basically, attempting to evaluate the safety of LLMs has ended up in this zone of increasing abstraction, increasing [00:24:00] general benchmarks, general tests that move away from what you're describing as the value of safety evaluation, and also, like, getting more and more speculative in nature, which feels also important and connected.

I feel like it must be deeply frustrating to have a field that you're this expert in be engaged with in this way. Um, do you wanna describe sort of how you see the use of safety engineering language, frameworks, like kind of performative engagement, and, like, what is the state of play in this AI era in safety-critical systems, um, and how is safety engineering, how is the field evolving in this moment?

Yeah, so I think that AI labs engage in what I call safety revisionism, which is where they use the same safety terminology that has been historically used in regulating and assuring defense and safety-critical systems. But instead, they redefine those safety techniques with [00:25:00] unrelated or watered-down alternatives that actually seek to accelerate the deployment of inaccurate and unreliable AI in high-risk scenarios like defense or nuclear.

So instead of safety being about humans, the environment, financial harms even, companies now use that term to often mean things like alignment or existential risks. But if we take a step back and look at the term alignment, it focuses on human preferences, right? Like, we want the AI to be aligned with us.

Well, I think that should make us question, well, alignment with who? And the existential risks that are also emphasized, like CBRN, are again hypothetical, but are used sort of as a pretense of an AI arms race to ignore the very risks and safety thresholds that have been established previously by militaries.

So when we allow AI companies to do this, right, to kind of revise safety language, it actually puts them in a position to define what the risk thresholds are or what safe enough actually means. And the entire idea [00:26:00] of risk threshold is to give a measure of the level of risk exposure that our society is collectively willing to take, which then shapes how we determine safety of its technological systems and how we assess them.

But instead, because the AI companies have sort of co-opted these traditional safety terms, we've given them the permission to not only decide what counts as safe enough, but to also lower and undermine the existing safety thresholds that we would've otherwise used to regulate AI in defense. And I do think that if we take AI systems as they are, and we try to run them by, like, an existing military standard for, like, software systems, for example, I think a lot of them would fail because often we're looking at reliability rates of, like, 90%. In nuclear, you're looking at, like, 99.99%.

And then you just look at how these systems operate, you're looking at 30 to 50% accuracy rates here, right? We are so far from the rigor that we thought we previously had. I know, it's like flipping a coin, right? Like, that's what I- how I see it. Like, [00:27:00] how is this different from flipping a coin in terms of getting a decision out of the AI system?
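To put those rates side by side, here is a minimal sketch that converts an accuracy figure into expected failures and checks an observed pass rate against a required reliability threshold. The percentages are the ones mentioned in the conversation; the function names, the 10,000-decision batch size, and the simple pass-rate check are illustrative assumptions, not an actual certification procedure. A real assessment would also need confidence bounds, since a point estimate from a finite number of runs on a non-deterministic system proves little on its own.

```python
def expected_failures(accuracy, decisions):
    """Expected number of wrong outputs over a given number of decisions."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return (1.0 - accuracy) * decisions


def meets_threshold(successes, trials, required):
    """True if the observed success rate is at least the required rate."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return successes / trials >= required


if __name__ == "__main__":
    decisions = 10_000  # illustrative batch size, an assumption
    for label, accuracy in [
        ("99.99% (nuclear figure cited)", 0.9999),
        ("90% (reliability figure cited)", 0.90),
        ("50% (coin flip)", 0.50),
        ("30% (low end cited)", 0.30),
    ]:
        failures = round(expected_failures(accuracy, decisions))
        print(f"{label}: about {failures} failures per {decisions} decisions")
```

Under these assumptions, a system at the coin-flip end of the range fails thousands of times in a batch where the strictest threshold would allow roughly one.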

So in terms of sort of like what we mean by safety, I also wanna again go back to Chauncey Starr. I think it's very obvious he's clearly one of my favorite safety engineers. When he was developing the risk assessment, he actually pointed out that autocratic powers in military organizations tend to over-index on these types of sort of risks that AI companies often emphasize, uh, for the purpose of preserving sort of the welfare of these regimes, as he calls it, right?

Again, this goes back to the point that there's a clear conflict of interest if we let AI companies determine what safety means when this is ultimately a democratic exercise as well. Um, so- They're grading their own homework- Yeah, exactly ... and, like, deciding who graduates, and then building schools where they're, like, giving other people grades, and it's like it's this...

It feels entirely insular- Yeah ... and conflicted. Yeah, [00:28:00] and I actually think ironically, this sort of acceleration of AI adoption in the military at the cost of the lowered safety and security threshold we previously had may actually be what disadvantages the US military after all, because you're now throwing away decades' worth of safety engineering that led to our military systems being reliable and robust for the sake of having AI systems which are inaccurate, unreliable, and very brittle.

So it's not really the best way to achieve strategic and tactical military goals if you don't know if your AI is accurate, if you don't know if you've hit the target you're meant to hit, right, if you don't even know it's a valid target. So it's not just about the safety... I mean, like I said, when defense and nuclear engineers developed safety engineering as a field, it's not just out of the kindness of their own heart of just, like, public interest, right?

It's also the fact that if you have systems that fail often, you're not gonna be able to meet whatever goal that you had set out, right? It's as simple as that, right? I wanna give an example of [00:29:00] a, the Air Force. They had a, a targeting AI system they initially thought was gonna be 90%, you know, accurate, and it was only 25% accurate in practice.

Like, is that something you want to be deploying in the world, right? No. No, the answer's no. Exactly. Exactly. So this is not, like, this is not just about these systems failing and harming people. Yeah. This is also about the fact that you can't use them towards whatever purpose you need them to fulfill, and I think that's, that's very difficult.

You know, as a person who is a safety engineer, it's very hard to see people just ignoring that fact. It's like, are they fit for purpose? No. Are they reliable? No. Are they dependable? Why is there a significant push- To implement these systems where they're going to lead to incredible amounts of failures.

I think, you know, reframing the safety conversation is a very convenient way for AI companies to be able to have their systems implemented in the massive military industrial complex that exists in the US, right? Without having to meet those standards. It's a very convenient thing for them to [00:30:00] instead of, instead of thinking how much money do we have to spend to make our systems reliable, is that even possible with the way that deep neural networks operate?

Instead of answering that question, it's like, well, we just, how about we reframe everything? We reframe safety to be about something else that, like you said, we can grade our own homework. Because I think it's very clear these systems don't meet the same reliability thresholds we've previously had for other types of software systems or even other types of AI.

And I think it's a, a conflict of interest, right? We should be very wary of the claims that these companies are making that helps their bottom line, right? They have their own incentives, and those incentives are very different from what the public needs. And it's not just about AI companies themselves, I just wanna say that.

Because we previously had independent verification of these, of any types of software systems across all safety critical systems. That was the standard, and that's kind of being done away with, with now this new, you know, as we were talking about, these new ways of [00:31:00] doing benchmarking. And so I think it's our policymakers themselves that also need to be held accountable, right?

Why are we eliminating this sort of independent evaluation, right? We're seeing it in the military, we're seeing it in nuclear, we're seeing across many other fields, in healthcare, for example. It's up to our governing institutions to combat that and to look out for what's actually in the best interest of the public, and I think we have seen a collapse of that.

Again, I think that collapse has kinda come because of the reframing of safety, right? But it is something that I think we need to consider as well, because I do feel like if we had the same institutions we did 30 years ago, I think a lot of people would see, well, this AI system's inaccurate. Why would I put it in my system?

And that used to be the case. Like, there used to actually be a lot of discussion, "Well, I'm not gonna put this vision system," right? This is even before LLMs, because they have all these faults, so we need to be really careful about its deployment. Well, wait. Now it's kind of like, doesn't matter. This is what I don't, this is what I don't under- what, there is no first mover advantage with an experimental technology.

Like, w- what is the [00:32:00] advantage of going fast in this when the systems are so unreli- I just don't, it's like and this is where I feel like we're kind of, it connects to this bigger social process where we're kind of running on the fumes of knowledge produced in the past, um, and underfunding and undermining knowledge production and serious science right now in this way that I think, like you can't, we don't get nice things if we don't have serious science.

And basically, like this is performative science, but it's not serious science, and it just feels like, again, this is just like not gonna end well. Like I just don't- Yeah ... see where this goes. I actually will challenge a few things in the statement that you said- Yeah, great ... because I think some safety engineers, and that includes myself, we don't see AI as a new technology, right?

Okay. Yeah, fair. In fact, we don't necessarily see it as something akin to a nuclear bomb, which I think is a comparison we often see. Yeah. I actually view it as kind of a- Second it. Yeah. I actually view it as kind of more like an automation [00:33:00] function of already existing capabilities that we have. Mm. You sacrifice accuracy for speed, and I think that is how I sort of perceive it.

Like when we are looking at safety critical systems, do we need to sacrifice? Is there any sort of use case where we can sacrifice accuracy for speed? There are very specific examples I think where AI can help, but sort of to have it become kind of the algorithm that overrules our entire society and in all of our systems is not the best way to view it because it, like I said, there's a cost of accuracy here in replacing specific processes or labor or sys- other sys- like traditional software systems with something that's highly inaccurate just so we have some sort of speed.

And I do think this is another type of like myth that has been introduced because safety was never about speed. It's a well-known, rigorous, slow process, but it's often good for us, right? It's often good for us to be able to rely on the systems that we have and feel safe in their [00:34:00] presence, right? And so I think like, I actually would question the narrative that AI is anything like the nuclear bomb, right?

And that there's an advantage in rushing and implementing it, 'cause that hasn't been substantiated. We actually haven't seen the military advantages of having AI for targeting, for example, right? I don't think we can look at specific conflicts, whether you're talking about, you know, Gaza, whether you're talking about Iran, where we see that AI has been used for targeting and say, "Well, that went well."

Right? I think we can talk about the casualties. We can talk about, you know, the case of the 170 dead schoolgirls in Iran. We can talk about civilian infrastructure. This is why I question the arms race narrative. It's like, does it actually give us a military advantage? Well, if it's inaccurate, I don't actually think that it, it does, right?

I think that, I think you can do a lot with AI, too, in terms of brute force, but the arms race and new technology that we have to accept AI adoption above all else is not something that I think has been substantiated, if that makes sense. Definitely makes sense. It feels to [00:35:00] me, generative AI to me feels like a corporate DDoS on society in a way that I like really, like it's really disorienting.

How do you combat the narratives and the corrupt relationships and the horrible decision-making and the, um, like it's all blended in this, like, horrible vortex of unaccountable power. 'Cause if you make the case, well, we want it to be quality, I don't know, like what is quality when you have... Like I- if, if the Trump administration could illegally attack Iran and be more precise, is that better?

I don't know. Like Israel made the claim of certain accuracy thresholds of certain AI systems they were using, and that's predicated on the idea that they would actually have a feedback loop of ground data that they could confirm whether or not what they had intended to do actually happened, and like who they had intended to kill they had actually killed.

But they killed so many people that the idea that like that was some type of accurate mathematical process of thresholds of probability and [00:36:00] precision, and that they were this very mature military that's like rolling out these... It's just absurd on its face. And I don't know like how to engage in serious conversation about accuracy and quality control in military contexts while also grappling with the fact that these technologies are being used by just like horrific war criminals.

Um, and like how that, I don't know, how do you, how do you tease that apart? Like how do you take your science brain and say like, "This is accurate or not accurate," and then these systems are deployed in these ways. And obviously the two are connected, but like you're trying to tease out a more objective frame?

Like, how do you do that? Yeah. Well, like any other safety engineering, you ask, "Well, what is accuracy? What is it? What did you mean by that?" I'll give you a good example of Israel, one of their sort of verification algorithms on the ground. The Washington Post actually covered this, where it shows that once they sort of determined a military target, whether through AI or manually, they had these sort of very naive algorithms that [00:37:00] in order to identify how many civilians were in a building, they looked at how many people connected to a specific cellular network that were closest to that area.

Okay, well, you are in war. One, there's children, right? Children do not, especially in a place like Gaza, are not going to have smartphones, right? Or even phones connected to a cellular network. Two, at that point in time, you already had enormous amounts of damage, right? Where the populations who typically lived in that building now have completely changed.

Some people are seeking refuge there. Some people have lost everything, so no, they don't have a phone that's connected to a cellular network, right? This is a really naive way- Because they don't have power. Yes, exactly. They don't have... There's, like, so many... Yeah. Yeah. Exactly. There's so many things where it's like, if that is your way of determining of how many casualties there were in targeting a specific building, it's already flawed.

So that's the thing about when you're a safety engineer that you always ask, like, "What did you mean by that? Define it. Give me, give me the exact examples of what you meant by safety, what you meant by accuracy, what you mean by dependability. What was your verification algorithm?" [00:38:00] And it's actually very difficult to even get explainability out of these algorithms, right?

And so because of the inherent nature of the black box and the scale of something, especially something like large language models, you're then unable to sort of verify that claim even further, right? And so, you know, I give you an example of an investigation that a reporter was able to ask folks from the IDF, right, what, what the specific algorithm was.

And then in the end, it was something naive and not reflective of what's actually happening in warfare, and I think that's AI in a nutshell, right? Like, I mean, if, if we are talking about even accuracy metrics prior to a specific conflict, like let's bring up Gaza again. Before and, and after October 7th, the population trends are completely different, right?

If you have built an AI algorithm based on the way that the population is behaving prior to October 7th, that's gonna be very, very different from the way that the population is behaving post being bombarded. And so I do think this idea of [00:39:00] accuracy is questionable. Like, I do think that the metric that even, uh, the metric of accuracy itself that's used in AI is not reflective of accuracy in the real world, right?

It's based on this idea of, uh, test sets and, and training sets, right? But when you're looking at, you know, military application, you have the fog of war, right? Where you're coming across novel situations every second, and scenarios that the AI could have never seen before, and we know that AI is unable to generalize beyond what it's been trained on.

And so I think, like, we should be very wary of the metrics. This goes back again to the safety washing concept. We should be very wary of the metrics that AI companies use or developers of AI algorithms use, because they're often, there's a disconnect from that and, like, the reality of what's on the ground and what happens or how systems fail even, right?

And the more you investigate, the more the-- you always find an inconsistent story. Like another example when we're talking about casualties, right? The IDF defined the parameters of how many casualties are, are accepted. [00:40:00] So in some scenarios it was up to 300 or 350 casualties, which is enormous. And then, you know, it's very easy to say, "Well, we decided 350 was an acceptable number of casualties, so that makes our algorithm 80% accurate."

But in context, I don't think people would consider that very accurate, right? I don't think people would consider that accurate or precise if you're willing to sacrifice 350 individuals for a potential target that you don't even know is there. If these concepts were rightly returned to the domain of experts, um, and actually the dog was wagging the tail instead of the tail wagging the dog here.

What could that look like? Like, what, what would that look like, and what do you think it would take to kind of recenter expertise, knowledge, rigor in understanding when and how we should incorporate emerging technology into such important parts of society? So I think there's two aspects to this question.

There's existing fields like safety critical systems, and then there's also the new risks that [00:41:00] are posed by AI, for example, like mental health en- engagement with, like, chatbots, right? So I wanna talk about the former first, because I think there's constant, you know, every day we're seeing a new AI framework, and I'm like, "We don't need it."

Like, I, I, it's a very controversial take- We don't need more white papers, Heidy? It's a very controversial take because I'm like, we have determined the safety and risk thresholds of any type of system, whether it's hardware or software or AI, we determined that, like, decades ago for aviation, for military, for, for nuclear, and we don't actually need to come up with new thresholds for AI.

I don't, I don't actually-- I think this is part of, like, the safety revisionism conversation. It's like why should a software system have to be 99.99% reliable, but an AI system can be 30%. Like, that is a, a, a, that is actually very, very strange. And I think that a lot of our existing standards, a lot of our existing, you know, guidelines from [00:42:00] safety critical systems can remain as is.

We just need to actually implement them, and I think we have been moving away from that. I think we have, like I said, we are seeing the independent agency of regulators be pulled. We are seeing these groups who are meant to independently assess these systems be sidelined, and we're not just seeing that in the US, we're seeing it across, like, all governments, whether it's Europe and the UK, right?

So I think we need to reverse back to that, and luckily we have that to fall back on. I think my concern is sort of that latter group I was talking about, the new risks of AI, right? When we're talking about, like, mental health, when we're talking about their uses in new areas of our life that technology wasn't in before, I think there does need to be a reckoning with how safe safe enough is that the public needs to be at the center of, right?

Like, how many deaths is acceptable? And it sounds like a very cold, like I said, it sounds very cold, but it is how safety engineers think. What is a catastrophic risk? How many dead people is good enough for this technology to be deployed? And that cannot be-- I know, it's, it's hard. It sounds-- [00:43:00] But that is, that is a calculus, right?

Sure. Yeah. Like, that is how you determine how safe a system is. Yeah. And someone needs to decide how many dead people is too much dead people. Yeah. And it's, it's awful, but like- And it shouldn't be Sam Altman casually saying- Exactly ... that rolling out chatbots to teens is gonna lead to kids dying by suicide- Exactly

and like, "I guess we'll figure it out." It should be the public, the pub- Yeah ... if the public is-- Like I said, that's what a risk tolerance is, is, it's sort of a, a society's willingness to accept a, a risk for- How do you- ... a specific benefit. How do you do that? Like, how, how, like, i- is it, um, going back to the example of nuclear power and there's public opinions and attitudes that get sort of fed- Yeah

into, like is that, is that... Or are there- That's democratic consensus, right? Like, it's us electing officials who are able to represent specific points of view, specific communities who are being harmed, right? And coming to sort of an agreement in that. It is not an easy process, by the way, in the slightest.

I think to get to the idea of, you know, I talk about nuclear thresholds being well established, but that took decades to be able to determine, and it was a back and forth conversation, right? Between the people [00:44:00] developing it, between sort of thinking about national security, and also thinking about the people being harmed when you're developing these types of systems.

So I do think that with something like chatbots, we need the public to be at the center of that conversation, and that, for me, is something we're not seeing. We're seeing all this, like, benchmarks. Here's this, another safety benchmark that is based on the AI company's idea of safety. But we do need to get to this, really, like I said, it's a tough question, which is: how many people should be harmed by the technologies, uh, for us to, you know, what are we willing to accept in terms of, like, those harms for us to be able to reap the benefits of it?

And it's, and I think that's a hard question. I'm not gonna say that I have an answer to it, but we need to have that conversation, and we're not having it. Well, 'cause we're so distracted by all this vague promise of something good that's worth anything. Yeah, yeah. It goes back to the safety revisionism, right?

Yeah. Instead of us being like, "What [00:45:00] is safety," right? It's human harm, death, destruction to the environment, and destruction to even physical, like, assets or financial assets. Instead, the conversation is like, "I think my AI is going to build a nuclear weapon." And I think that's why the public has a hard time engaging, right?

'Cause they're like, "I'm so disconnected from that. Like, this is not where I feel like I'm being impacted. This seems like a military thing," right? Um, "It seems like a national security thing. It doesn't have to do with my life." But actually, we're already seeing the harms to everyday people. It's just that because the safety conversation has not been framed that way, it has made it very difficult for them to engage with how safe something really should be.

It's like Abeba Birhane's line, I think this was her, um, that actual harms are being distracted from by possible benefits. Exactly, yeah. Yeah. And I would say, I would question what the benefits are. Yeah, yeah. We actually haven't seen a quantification of it. Yeah. I know from purpose-built AI, right? Like, I'm not gonna talk about large language models- Yeah

or generative AI. I know where they have strength, right, in terms of science, in terms of [00:46:00] safety. But in terms of large language models and generative a- AI, we have yet to see the very specific studies that substantiate a lot of the claims about their benefits, and I think that is part of the conversation as well.

For the public to also be able to engage in that, that information is really necessary. And I think this is, you know, I talk about how transparency isn't always important for, like, national security and safety critical systems, but this is where transparency is important for the public to be able to engage in that conversation.

Thank you to Heidy for all the work she does, and thanks for listening to this episode. Next week, we're gonna turn to the biggest question I've had when learning from all of these guests over the past few weeks: What can we actually do to change our current heading? Next week, we're bringing back our first guest, Matt Mahmoudi, alongside Marwa Fatafta of Access Now to discuss upcoming work to change norms and hopefully trajectories of these technologies.

They have been cooking up something I'm hoping will help, and we will dig into that with them next week. We are also gonna be [00:47:00] hosting space in the Maybe Collective, which is the network space that we host various conversations on technology politics, and we're gonna be talking more about initiatives and ways to get involved.

So if you wanna be in the loop on those conversations, you can subscribe for free to the Maybe Collective, and that is linked in the show notes below. As usual, I wanna thank our production team who worked to put the series together: Sarah Myles, Georgia Iacovou, Kushal Dev, Marion Wellington, Van Newman, and Zoe Trout.

Thanks so much for listening, and we'll see you next week.
