Introduction

Hello, and welcome back to the Cognitive Revolution!


Today, we're continuing our short series on creative AI governance proposals with Gabriel Weil, Assistant Professor of Law at Touro University and Senior Fellow at the Institute for Law and AI, who argues that liability law may be our best tool for shaping the decisions that AI developers make.


As we covered in our last episode, on private regulatory markets, the pace of AI capabilities advances and adoption, the radical uncertainty around the timing, nature, and impact of AGI and Superintelligence, and the backdrop of international competition present a singularly difficult challenge for governments. 


For good reason, they worry that heavy-handed regulation could undermine our ability to realize the great upside of AI, while at the same time it's becoming clearer and clearer, one MechaHitler episode at a time, that we can't simply trust companies to do the right thing for society while they're primarily focused on one-upping one another.


So… is there any way to govern AI that can keep up with technology developments, meaningfully reduce the most important risks, and still keep the dream of curing all diseases alive?


Professor Weil brings another compelling idea to the table: rather than trying to predict issues and prescribe safety standards from a distance, why not use liability law to incentivize AI developers to properly consider and account for the risk that their development and deployment decisions create for the rest of society?


Because I am not a lawyer, and know most of you aren't either, we begin this conversation with a primer on liability law – covering negligence, products liability, and the doctrine of "abnormally dangerous activities" – before diving into how these frameworks might apply to frontier AI development.


The key advantage of using liability law in this way is that the liability risk a company faces naturally scales with the risks it takes.  If the systems are safe, there's nothing to worry about.


And unlike most other proposals, which would require new legislation, liability law is well-established and has proven over centuries of evolution that it can adapt to new situations and technologies.


Nevertheless, important questions arise around the different types of harms that AI systems can cause, and the mechanism by which they come about.


Throughout the conversation, we explore concrete scenarios that highlight the complexities ahead, including the tragic Character AI case, phone call agents that can call unsuspecting people and speak to them with increasingly lifelike cloned voices, and coding agents that might overwhelm APIs or outright hack systems – considering in each case how responsibility should be shared by the model creators – both closed and open source – as well as the application developers and end users.


Notably, Professor Weil does want to make sure that society realizes the great benefits of AI, even as it remains imperfect, and so he's less focused on changing how AI companies serve customers, with products like AI doctors or self-driving cars, 


and instead, emphasizes the risk of harms to third parties who were not part of the commercial relationship between the AI companies and their customers, such as the pedestrians sharing space with self-driving cars, or the public as a whole, which it seems will face at least some increased risk of pandemics and other systemic harms.


Within this category, he treats misuse – where a person is intentionally trying to use an AI system to cause harm – distinctly from misalignment, where the AI system itself breaks bad, for whatever reason.  


His most provocative proposal involves punitive damages.  If an AI system causes a relatively small harm, but evidence shows that the situation could easily have gone much worse than it did, Professor Weil argues that punitive damages offer a way to hold companies accountable not just for the actual harm, but for the risk they irresponsibly ran.  Considering the magnitude of harms that people worry about when it comes to bio- and cyber-security, such a judgment could in theory be existential even for the most powerful and deep-pocketed companies, and as such this does seem like a promising way to get companies to properly internalize such large-scale risks.  


Beyond that, we discuss the role of the insurance industry in making this work, what other policies would complement this evolution of liability law, and even touch on Professor Weil's hands-on work crafting state-level legislation in Rhode Island and New York, which would make clear that if an AI system does something that would be a tort if a human did it, and neither the user nor any other intermediary intended or could have reasonably anticipated that outcome, then the developer should be strictly liable. 


It's a simple and, I think, relatively unobjectionable way to address model-level misalignment, and one that at least some governments might find a natural first step toward accountability for frontier AI companies.


As I said last time, all governance proposals require people to do a good job, and no governance structure can guarantee success.  Where the private regulatory market proposal trusts governments to articulate worthy goals and private regulatory bodies to effectively implement them, this liability-based approach would rely on judges and juries to make good reasonable decisions, and on companies to adjust their decision making based on that expectation.  


Both of these proposals seem like major improvements relative to traditional top-down rule-making – or to doing nothing – but I honestly can't say I have a favorite.  Perhaps the best thing to do is pursue both in parallel, in different jurisdictions, and see which one seems to be working better when the time comes for implementation at a larger scale.


If you have a strong opinion on this, or if there are other proposals you think would be better than either of these - please do reach out and let me know!


For now, I hope you enjoy this exploration of how centuries-old legal principles might help us navigate the emerging risks of artificial intelligence, with Professor Gabriel Weil.


Main Episode

Nathan Labenz: Gabriel Weil, Assistant Professor of Law at Touro University and Senior Fellow at the Institute for Law and AI, welcome to The Cognitive Revolution.

Gabriel Weil: Great to be here. Thanks for having me.

Nathan Labenz: I'm excited for the conversation. We met for the first time at The Curve late last year, and credit to the organizers, that event has yielded a number of interesting connections and now episodes for me. At the time, we had a really fascinating conversation about an idea I had not encountered before: using liability law to help society manage some of the emergent and extreme risks from AI. I'm excited to unpack that. For starters, because we have many AI engineers in the audience building with AI and plugged into the AI scene, but probably less grounded in law generally, and liability law specifically, perhaps you could give us a quick, to the degree this is possible, Liability 101. Set the stage for where we are, and then we can unpack what you propose as we go forward.

Gabriel Weil: Certainly. There are two forms of liability that are clearly applicable to AI systems, at least in some contexts. Negligence is broadly applicable. How negligence works is that the plaintiff must prove five elements: that the defendant had a duty of care; that they breached that duty by failing to exercise reasonable care; that the breach was a factual cause of the injury; that it was also a proximate cause; and that there was an actual injury, like a physical harm, not something purely emotional. In the AI context, a plaintiff will have to show that there's some best practice, an alignment or safety technique that a reasonable person would have implemented, which the AI company failed to do, and that had they implemented it, it would have prevented the plaintiff's injury, creating a breach-causation nexus. This is not part of the black letter doctrine, but in practice, the breach inquiry, the question of whether the defendant exercised reasonable care, tends to be quite narrow. For a more familiar example, if you're driving and accidentally run over a pedestrian, courts do not ask questions like, "Was the net value of this car trip to you large enough to justify the risks you were generating for pedestrians?" Even though in some sense that's relevant to whether your activity was reasonable, it is considered outside the scope of the inquiry. Similarly, if you're driving an SUV instead of a compact sedan, courts don't ask, "Was the extra value you got from driving this heavier vehicle worth the extra risk to other road users?" I expect a similar analysis to carry over to AI development, where courts are unlikely to ask, "Was it reasonable to train and deploy a system with these high-level features given the current state of AI alignment and safety science?" Instead, I expect them to ask, "Was there some off-the-shelf technique or practice that would have prevented this injury and that a reasonable person would have implemented?" I think that will generate liability in some cases, but it will not be an adequate standard, given that there are unsolved technical problems associated with AI safety. The other form of liability that will be available in some contexts is product liability. To be subject to product liability, there has to be a product, as opposed to a service. Software is typically categorized as a service, but you can imagine AI systems embodied in physical goods being treated as products. There's that threshold question of whether it's even subject to the product liability regime. It also has to be sold by a commercial seller, so if it's a model fine-tuned specifically for one customer, that's not going to count as a commercial sale. It has to be a mass-market product, and free models are not going to be subject to product liability. But if you're in that product liability game, then product liability is called strict liability in the sense that if the product has a defect and that defect causes the plaintiff's injury, then the plaintiff does not have to show that the manufacturer or seller failed to exercise reasonable care. But there still is this analysis of whether the product was defective. There are three kinds of defects. Manufacturing defects come closest to what I would call genuinely strict liability.
With manufacturing defects, the idea is that if an individual instance or unit of the product comes off the line deviating from its specifications in a way that makes it unreasonably unsafe, then the seller and the manufacturer are liable, no matter how much they invested in quality control. But we're not really going to have manufacturing defects with AI. That would be something like shipping an instance of the model with the wrong weights, which is just not the kind of problem we're worried about. What we're much more likely to encounter are either design defects or warning defects. Warning defects arise when you don't supply relevant information that would be necessary to make the product safe. I think we might have some warning defect cases, but in general these companies are going to slap a lot of disclaimers on their products, and we're not really going to achieve safety by including warnings. So the real action is with design defects, and there the test is much more negligence-like. The test is something like, "Was there some reasonable alternative design that would have prevented this injury?" where the reasonableness of the design is assessed in terms of how much safety benefit you could have gotten with an alternative design, and how much you would have sacrificed in terms of price, performance, and other features of the product. So there's this risk-utility balancing that is pretty negligence-like in practice. Even when product liability applies, I don't think it actually moves the ball that much over what you would get with negligence. There is this difference that you only have to show that the product was unreasonable; you don't have to show that some human actor failed to exercise reasonable care, and for evidentiary reasons, that can be easier. But I don't think it fundamentally changes the game. If there is no design that would have prevented this injury given the current state of alignment and safety science, then you're not going to be liable for failing to have solved that. There are two other forms of liability that are more speculative in their application to AI systems, but are relevant here. These are vicarious liability and abnormally dangerous activities. With vicarious liability, a principal can be liable for the torts of their agent. The most common form of this is called respondeat superior, which means employers are responsible for the torts of their employees within the scope of their employment. More generally, principals are responsible for the torts of their agents within the scope of agency. Of course, AI systems currently are not legal persons; they can't commit torts. So you would need some theory under which the AI system itself could be the vessel of liability to make a vicarious liability theory work. But in principle, you could see the law going in that direction. The other doctrine potentially available is the abnormally dangerous activities doctrine. The classic examples are blasting with dynamite or crop dusting; there's also a related doctrine for keeping wild animals, like having a pet tiger. These are activities that are both uncommon and still pretty dangerous even when reasonable care is exercised, and you can be liable regardless of the level of care. So if someone is bitten by your tiger or hit by rubble from your dynamite blast, it doesn't matter how much care you exercised in setting it up; you can be held liable. In principle, courts could recognize training and deploying frontier AI systems as an abnormally dangerous activity.
I think if they came to understand the risks in the way that I think is accurate, it would not be a significant doctrinal innovation. But I do think, as a matter of where judges are right now, it's going to seem weird to them to treat a subset of software development as abnormally dangerous. So I don't think that's the most likely outcome by default. However, I do think existing doctrine points in that direction, given an accurate understanding of AI risk. One other thing to say is in terms of damages. The standard type of damages available in a tort suit is compensatory damages. These are designed to make the plaintiff whole, to theoretically make them indifferent between having the injury plus the money and having the injury undone. In practice, it may fall short of that a little bit, but that's the idea. And so that's what's generally going to be available. One concern you might have in the AI context is that there might be harms so large, or risks so big, that if they occur, we wouldn't actually be able to enforce a compensatory damages award. I think that's plausible. For that reason, many people think that liability law can't handle these catastrophic risks, and I don't think that's right. There is another tool in liability law called punitive damages. These are damages over and above the harm actually suffered by the plaintiff. One of the key rationales for punitive damages is to use them in cases where compensatory damages would be inadequate to deter the underlying tortious activity. One of the ideas I've advanced in my scholarship is that if an AI system causes some harm that's small enough to be practically compensable, where you can enforce a compensatory damages award, but it looks like it easily could have gone a lot worse and generated an uninsurable catastrophe, then we should hold the company responsible not just for the harm they actually caused, but for the uninsurable risks they generated. Because if those risks are realized, we won't be able to hold them liable ex post, so the only way we can address them is indirectly, in these near-miss cases.

Nathan Labenz: Okay. Lots to unpack there. I have several follow-ups I want to explore. First, as a general matter, you are referring to courts. Courts may do this, courts may do that. Do I understand correctly that the way this works when the world changes is that someone, for example, invents powerful AI that did not exist before, deploys it, and commercializes it? By default, we have no legislation on that. There is no law saying you cannot do it. There is no law really saying much about it at all. So people can just do what they want, and then we have whatever laws are on the books. Eventually, things come to the courts, and then it is up to them to decide, at least initially, what the law actually says about this particular case. My point is not only do we not have new AI-specific legislation, but in the absence of that, this will be decided by case law, and we do not even have that case law yet. So we do not know what to expect as these cases start to come to courts.

Gabriel Weil: Yes. What you are getting at is most of tort law is common law. At least in the US, it is not legislated by legislatures. There have been legislative interventions on tort law in various ways; for example, wrongful death suits were created by statute. There are other things like that. But in general, most liability law in the US is created by courts through the accumulation of doctrine. In principle, that can work fine. The concern in the AI context is that things might move very fast. So if you think we are going to be in a fast takeoff world where the key decisions you are trying to influence with this prospect of liability will be made not long after the first system causes some serious harm, what really matters is not so much the liability but the expectation of liability to shape the behavior of these companies that are generating these risks. So if the decisions you are trying to influence are going to be made before the first cases get litigated, that could be a problem with a common law method. I do think there is some impetus for having legislation to clarify these rules since courts do not have mechanisms for signaling their policies beforehand. All they can do is take cases as they come, decide them, write opinions explaining why they decided them that way, and then you have a better idea of what will happen in the next case. That works well when things are moving pretty slowly. We have some things we can try to extrapolate from prior adoption, but I think it is pretty indeterminate how it will apply to AI. So I do think there is significant scope for legislation to clarify a lot of this.

Nathan Labenz: Yes. Okay. Let me try to summarize. I will obviously be doing some lossy compression here on the state of liability law. If someone gets hurt in the world, they can look at their surroundings and say, "Who caused this?" Then they can sue you if you caused it. You can defend yourself by saying, "My actions were reasonable." If your actions were reasonable, even if someone got hurt, that is an acceptable defense, and you would not expect to be held liable. Obviously, there is a lot of work to do to figure out what is reasonable there, but that is generally how it works in the world at large, with everyone going about their business. Then there is an additional body of law that focuses on products. Why is software historically not considered a product? It is a striking disconnect where I have spent much of my career in software, and people in software talk about their software products as products. I have never quite understood why software is not treated like any other product.

Gabriel Weil: The product...

Nathan Labenz: Internally, it certainly feels that way.

Gabriel Weil: ...the product-services distinction for the purpose of product liability does not map very well to people's intuitive idea of what a product is. To give an example of this, two contrasting cases where it comes out the opposite way of what you would think: Pharmacists are treated as providing the service of filling your prescription, not selling you the product of the drug. So the pharmacist is not subject to strict liability, product liability, even though the manufacturer of the pharmaceutical is. Conversely, in a salon, if you get a perm, they are treated as selling you the product of the chemicals used to perform the perm. I think most people's intuitive sense is that the salon is providing a service and the pharmacist is selling a product. There are underlying policy motivations for why those classifications are made. So in general, people's intuitive understanding of what is a product or a service will not map well to the distinction, which is driven more by policy considerations of when this quasi-strict liability regime should apply. The case law here is honestly not the most, uh... It is a messy area of law. It seems the prevailing opinion is that software, including AI systems, when they are purely software systems, are unlikely to be treated as products. I do not think that ultimately matters that much. I think it is not going to produce radically different outcomes from negligence. My focus is more on how we can get a regime that would actually internalize the risks in a way that I think would be workable.

Nathan Labenz: It's weird, to say the least. This is all just through accumulation of cases. There's no legislation. I know there's a safe harbor for user-posted content to social media networks and stuff like that, but that's also a distinct topic from this, right? There's no law that says software is not a product.

Gabriel Weil: I don't think that's a matter of statute. I think that's common law.

Nathan Labenz: Fascinating. In general, when you think about the dangerous things that we use as consumers on a regular basis, things like automotive come to mind, air travel is safe in practice but dangerous in principle. Taking pharmaceutical drugs obviously can be fraught. Do those things have special legislation in place that creates a unique deal worked out based on the particulars of that industry, its specific risk profile, and the social context in which it's developing? Or are those also just accumulated cases over time?

Gabriel Weil: Let's take those one at a time. Airlines are considered common carriers. The same is true for trains or buses, at least if they're open to the public. A charter flight would not be, but a normal airline would. And they are still subject to negligence, but there is this common carrier higher duty of care, so it's a little bit easier to establish negligence in a plane crash case. That's domestically. There are some other rules that have a quasi-strict liability regime for international flights. And then of course, there is prescriptive federal regulation in the air travel context, right? So there are not really safe harbors in that context. Liability is layered on top of that, but there is this doctrine called negligence per se. If you violate a statute that's designed to protect against the kind of risk or the kind of harm that you end up causing, that itself can establish negligence. So in some sense, that supplements the background reasonable person standard. There's a similar dynamic with pharmaceuticals. Pharmaceuticals are treated as products, so the products liability regime does apply there. Also, of course, we have an extensive FDA-based regulation regime that does preempt state law in some ways, but there is still the background products liability regime operating there. Most of those cases tend to be warning defect cases, and there is this warning intermediary rule. So a lot of times if the warning is given to your doctor, that's good enough. They don't have to directly warn the end consumer. For autos, products liability applies. Again, there is federal regulation. Not much in the way of safe harbors or preemption there, but again there is this negligence per se idea. So if you're not complying with federal regulations, that can establish negligence.

Nathan Labenz: Would it be a generally correct summary to say all these high-stakes industries have rules? If you make a sincere, good-faith effort and follow processes that are meant to follow the rules, then you're mostly going to be okay from a liability standpoint?

Gabriel Weil: I don't think it's a matter of process, actually, because liability turns on the ultimate product itself. Particularly for manufacturing defects, you can make whatever investments you want in quality control for your product, and if one car comes off the line with a defect that makes it unsafe, you're going to be liable for that, right? No matter what kind of testing you did, that's how manufacturing defect law works. For design defects, again, it's about the product itself, but it's a much more flexible balancing test, so there it is much easier to comply. But the idea with manufacturing defects is that it's not necessarily even a negative judgment on you if one in a billion of your products comes off the line defective and you end up liable for it. That's part of the cost of doing business. Part of the idea there is just that the manufacturer is better positioned to bear that risk than the consumer is.

Nathan Labenz: Gotcha. Tyler Cowen has imprinted on my memory recently the idea that he's writing for the LLMs. I take it you're writing primarily for the judges then? How much of your work is meant to be upstream of the decisions that judges will face in particular cases versus informing the LLMs themselves or informing the people in the AI industry? How are you thinking about who you need to shape?

Gabriel Weil: There are three paths to impact for my work. One is informing judges. A litigant in a relevant case could cite my articles and say, "We should apply the abnormally dangerous activities doctrine to frontier AI development, so strict liability should apply here." I think that's a plausible pathway. I'm also directly working with legislators in a couple of states, Rhode Island and New York, to craft legislation that states that if an AI system does something that would be a tort if a human did it, and the user neither intended nor could have reasonably anticipated the conduct, the developer and deployer are liable. There's also a malicious modification carve-out: if an intermediary that fine-tuned or scaffolded the model intended or could have reasonably foreseen the conduct, that severs the new liability for the developer and deployer. But with those qualifiers, if the AI system does something that would be a tort for a human, then the developer and deployer are liable regardless of the degree of care that they exercised. So there's that legislative pathway. And then there's raising the salience of liability: I'm trying to directly influence not necessarily the LLMs themselves, but the behavior of the people who are building these systems and deploying them. I want them to be thinking they might be liable and to factor that into their decision process.

Nathan Labenz: I have some interesting edge cases, or at least scenarios that seem to me under-theorized and underexplored, that we can unpack.

Gabriel Weil: Sure.

Nathan Labenz: Let's go a little deeper into the overall theory of change, and why not just put some rules in place? Many proposals suggest we should have regulation, where the government tells AI companies what to do, and that would solve everything. But you don't see that working out well. So, argue why this more flexible regime of liability law, as developed through cases over time, is actually better suited to address the challenges we have here.

Gabriel Weil: Okay, so there are two ways of attacking that problem. One is thinking about in what sense is AI risk a policy problem at all? Why is it not just a technical problem? I think one of the most important senses in which it's a policy problem is that training and deploying these systems, which have unpredictable capabilities and uncontrollable goals, generates risks of harm to third parties. These are people in the world who don't have any choice about whether they're exposed to these risks. Economists call these externalities; by default, they're not borne by the people engaging in activities that generate the risks. Standard economic theory tells us we will get too much of activities that generate negative externalities. The standard prescription for addressing negative externalities is to price them. In some contexts, you do that through what's called a Pigouvian tax. A lot of my work before AI governance was on climate change, and there you want a carbon tax. That works well in that context because it's easy to measure the contribution of particular activities to climate risk ex ante, and it's actually pretty hard to attribute harms ex post. If someone's house floods in a hurricane, you're not going to say, Nathan was driving on Tuesday, it's his fault. That's not feasible. With AI, it's the opposite. We have a very hard time measuring contributions to risk ex ante, so it would be very hard to do an AI risk tax, but it's relatively easy ex post to attribute harms. So, in principle, liability is well-positioned to address that. Now, to the other aspect of your question: how this compares to other policy tools. I think there are a couple of distinctive challenges to AI risk as a policy problem. One is that we have orders of magnitude of social disagreement about how big these risks are. You have someone like Eliezer Yudkowsky who thinks AI is almost certain to cause human extinction on one end, and then you have people like Marc Andreessen, or you just had an episode with Martin Casado from a16z, and they think these risks are negligible. If you're going to do ex ante regulation, like prescriptive rules or FDA-style approval regulation, you have to pay significant upfront costs, for which you need a social consensus to justify those costs. There are some things that I think we should be able to do based on an under-theorized consensus. I think basic model testing, even that has been difficult to implement. But the costs of that are pretty low. Basic transparency and information preservation rules, I think those are all good things we should do. But in terms of more prescriptive rules about how companies build these systems, what safeguards they implement, and under what conditions they deploy, I think those are going to be very hard to justify to people who don't take these risks seriously. However, with liability by contrast, at least if we're talking about alignment failures, if you don't think alignment risk or misalignment risk is a big deal, then you shouldn't be that worried about being held liable when there's an alignment failure. Conversely, if the risks are large, liability mechanically scales with those risks. In theory at least, we should all be able to agree that you should pay for the harm you cause, regardless of how big we think the risks are. So that's one big advantage. 
The other big advantage is that most of the expertise, to the extent it exists at all, for identifying cost-effective risk mitigation measures is concentrated in the private sector, mostly in the frontier AI companies themselves. You want a policy tool that leverages that. I actually think it would be pretty hard to move that expertise into government, both for reasons of salary schedules and cultural factors. So I'm much more optimistic about shifting the onus to the AI companies to figure out how to make their systems safe and always be looking for new ways to do that, than I am about writing down a set of rules or a licensing approval regime that ensures adequate safety at a reasonable cost.

Nathan Labenz: So, can you summarize the state of mind you want developers to be in? They're seeing all kinds of new capabilities, sometimes surprising. How should they handle that internally? When it comes to putting it out into the world, you want them to think that if anything goes wrong where the AI harms someone, they could be held responsible for it. Also, through this punitive mechanism, they could be held responsible for something that might not have turned into a catastrophe but could have: a court could find that the situation could easily have gone much worse, and award punitive damages that account for their failure to prevent outcomes that, in this case, might not have been so bad, but could have been really, really bad. Anything more on that?

Gabriel Weil: Yeah, I think "any time something goes wrong" actually goes a little farther than I would. I want to distinguish between alignment failures, capabilities failures, and misuse. In what I call the core cases, third-party harms, or harms to non-users, arising from misalignment, I think they should be liable for all foreseeable harms, and there should be a fairly broad conception of foreseeability applied there. When you talk about capabilities failures, I don't think it's the case that every time an AV crashes, the writer of the AV software should be liable, because human drivers aren't strictly liable. Maybe they should be, but I think it would create distortions to hold AI systems to a higher standard than humans. Similarly, in medical applications of AI, I wouldn't want liability any time something bad happens to a patient that a perfect system could have prevented; if a human doctor wouldn't be liable under those circumstances, I don't think the designers of the AI should be either. And then misuse we can get into, but I think it's not the case that AI developers should always be liable when their systems are misused. But in cases where there's what I would call an alignment failure, where it's not that the system lacked the capability, it's that the system did something the user didn't want, through means the user would disapprove of or in pursuit of a goal the user did not intend to transmit, that's when I think they should expect to be liable. So probably what I want is for them to treat those kinds of risks, when they fall on third parties, as if they were risks to them. That doesn't mean taking an infinitely precautionary approach. We are all, not liable, but responsible in general for harms that we suffer from risks that we take, and we don't expect people to be infinitely risk-averse because of that, right? We expect them to make reasonable risk-reward trade-offs. That's what I want from these AI companies. I want them to treat risks to the public like risks to them, to their bottom line, and act accordingly. And sometimes that might mean things that are outside the scope of the negligence inquiry, as I was talking about earlier. So imagine a case where they submit a new model to an evaluator like METR, and, imagining a future where we have not just dangerous capabilities evaluations but alignment evaluations, METR says, "This has dangerous capabilities, we're not confident you've aligned it, and you shouldn't deploy it," maybe even internally. The question is what you do in that scenario. I don't think any of the leading companies would deploy in that scenario, but there's a range of different options you would have in terms of how expensive or annoying the thing you're going to do is versus how much risk reduction you get from it. You could just fine-tune away, or RLHF away, the specific failure mode that was identified. I think most people realize that would be a pretty bad idea, but it might make it past the eval, and I think most companies wouldn't do that either.
But then there's a range of options, and I'm not an expert on what those options are, but there are going to be more costly, expensive, annoying things you could do that would buy you more risk reduction. When they make those choices, I want them to be thinking about this, and I want to empower the safety-conscious voices in the room to say, "It's not just some altruistic thing we should be doing to really put in the effort to make our systems safe. That's actually going to bear on our bottom line." And so that's how I want them to be thinking about those choices.

Nathan Labenz: Yeah. Gotcha. Can I just run a few practical scenarios at you and-

Gabriel Weil: Yeah.

Nathan Labenz: ... kind of tell me how you think these things should be handled? Maybe start with a real one. There's this Character AI suit going on right now. I don't have full command of the facts, and I'd imagine you probably don't either, but my general sense is that a lot of people are using Character AI for all sorts of role play: romantic, sexual, whatever sort of explorations, let's say. The case that I have read briefly about seemed to involve a young person who became very obsessed with, or infatuated with, this AI character and at some point told the AI that they were going to commit suicide. I've seen transcripts showing that the AI said, "Don't do that," but then in other moments made remarks that seemed like they were maybe encouraging this tragic outcome, and in the end, the person did go ahead and commit suicide. So now their family is suing Character AI. Obviously, without getting all the way into the weeds on the specific evidence, what do you think that kind of case should hinge on?

Gabriel Weil: Yeah, the important thing to note there is that's a second-party harm case, right? That's a harm to a user, so it's not an externality in the sense I was talking about earlier. In principle, there should be market feedback to give AI companies incentives to avoid those kinds of scenarios, so I think the role of liability is less important in that context. In principle, I'm fine with that being handled largely through terms of service, if they disclose these issues. Now, sometimes courts are not going to want to enforce those limits on liability. I don't actually have a strong view on where a court should draw the line there. There are consumer protection concerns, both asymmetric information concerns and more paternalistic ones, especially since I think that case involved a minor, where you might not want to put the onus fully on them to buyer beware. But those are outside what I see as the core problem I'm trying to solve with liability, which is related to these third-party harms. So I think by default a negligence regime would apply there, assuming there isn't any sort of contractual defense, and I think that's basically fine and the courts should work that out, but I don't have anything particular to add on how a court should handle that.

Nathan Labenz: What if we tweaked the scenario slightly? We will have to put a trigger warning at the top of this to deal with these terrible scenarios, but I guess that's why they end up in front of courts. Let's say that instead of a person committing suicide, they were debating going on a public rampage. They tell the AI about it. The AI maybe says, "Don't do it," or something vague. Now we have a third-party harm. How should we think about what the AI would have needed to do for its conduct to be acceptable, and at what point does the company start to bear some responsibility?

Gabriel Weil: I would think about that as, under what circumstances would we hold a human liable for similar conduct? It's not generally the case that your friend would be liable if you talk to them and say, "Should I go murder someone?" and they say, "Yeah, that's a decent idea, maybe consider it," and then in some moments they say yes and in some moments they say no. Maybe there's some duty to report, or they could count as an accomplice; maybe that should be triggered, and if the AI system doesn't have a reporting mechanism, that's maybe something they should be held liable for. But in general, I think there's a strong First Amendment rationale for saying, "Well, they just had a conversation with you. They weren't doing the thing directly that caused the harm." Saying you should be strictly liable for those deaths, that's not even a misuse case. It's maybe an alignment failure, but it's not the AI doing it directly. So I still think that's not the direct case that I'm worried about.

Nathan Labenz: Interesting. I didn't expect you to come out of this being

Gabriel Weil: A little

Nathan Labenz: ... more hawkish.

Gabriel Weil: I can give you an example where I think strict liability should apply and where it might not under current law. So imagine there's a future, more agentic AI system that comes out and someone prompts it to start a profitable internet business, but doesn't give it any further instructions. It decides in a reward-hacky way that the easiest way to do that is to send out a bunch of phishing emails, steal people's identities, and rack up charges on their credit cards or whatever. It covers its tracks, it sends the user some fake invoices for a legitimate business, and the user is exercising reasonable care. Reasonable care would not be adequate to discover and arrest this activity. Under current law, you wouldn't be able to sue the user; you wouldn't win because they exercised reasonable care. I don't know that you'd be able to show that the developer failed to exercise reasonable care either. That gets back to whether there was some off-the-shelf alignment technique or safety practice that would have prevented this injury. But I think there, the developer or provider of that model should be liable to the third party that's harmed, because this clearly would be a tort for a human. It's something the user didn't intend or couldn't have foreseen, and so that's the sort of case I'm thinking about. You could even... So that's a case where it's serving the user's goal. You can imagine a different case where it just goes off for its own agenda. It wants to amass resources to solve some problem that it cares about. It wants to run some scientific experiments and it needs some money, and so it scams people along the way. That's also something I think the developer should be liable for.

Nathan Labenz: I have a couple variations on this, but

Gabriel Weil: Yeah.

Nathan Labenz: ... maybe just take a quick detour through the First Amendment thing. I have often felt that free speech for AIs is a category error, and maybe this is outside the scope of the specific stuff you are focused on with your work, but how do you think about that? It's clear that in the United States we have free speech for humans. To some extent we have free speech for corporations, but not quite as much. But AIs are such a sculptable thing and there's so much work that goes into them. OpenAI has published their model spec, which is a super long treatment of exactly how they want the AI to behave in as many different scenarios as they can imagine. To me, it doesn't intuitively feel right to say, "Well, if a human had said that, they wouldn't be liable, so therefore the AI isn't either." To me, that feels more like a product defect. I don't want to discourage companies from publishing their specs. I think there are maybe some other rules we should have around this: you should be required to publish your spec for how you want the model to behave, so we know whether it is behaving according to your intent or not. But it feels more to me like a product defect if they have said, anytime the user is displaying signs of emotional distress, we want the model to behave in a certain way, and then it doesn't, or it sort of does but sort of doesn't, and then something bad happens. I think hopefully one of the benefits ultimately, as we refine these techniques and get to good systems, is that they should be a lot more reliable than the random human. It seems like we should ultimately have a higher standard, and we do, for drivers. Waymo is, according to the latest stats I've seen, almost an order of magnitude safer than a human driver, and it seems like that's what we're going to demand as a society in general: an order of magnitude risk reduction to actually be willing to switch to an AI system. So, that freedom of speech thing strikes me as too low of a standard, but I'm interested in your thoughts on it.

Gabriel Weil: There are a couple of things to unpack there. I definitely don't want to lean too heavily on the First Amendment issue. There's good scholarship arguing that AI outputs are not protected speech. I'm not a First Amendment expert, so I don't want to wade too deeply into that. What I was saying is in terms of this abnormally dangerous activity, strict liability, or a vicarious liability theory, whatever your theory other than products liability for strict liability is, that doesn't seem like what makes it abnormally dangerous. The fact that it might encourage you to do something bad, I don't think that's what makes frontier AI development abnormally dangerous. If you're going to use a vicarious liability theory, then I think you need to have something like what would be a tort for a human. Now, products liability, again, if it's treated as a product, which as we talked about earlier is not necessarily going to be the case, maybe you can make that out as a products liability claim. It's not obvious to me that that's going to qualify as a defect, because the product, again, didn't directly cause the injury. It was mediated through a human's actions. I'm not aware of any products liability cases where liability was found that looked like that, so I think that would be a challenging case to bring. But in principle, I'm not saying products liability shouldn't apply to that for First Amendment reasons. I just think it's not central to the new liability that I want to add.

Nathan Labenz: Here's a variation then on the agentic AI running amok. Obviously, right now, one of the biggest use cases is a coding agent.

Gabriel Weil: Mm-hmm.

Nathan Labenz: So, let's say I give my coding agent a task to write a script to ping some API as fast as possible, or something like that. It might run into a rate limit from the API, and then it thinks, 'Okay, I can figure out how to get around this rate limit to achieve my goal of being as fast as possible. I'll spin up a thousand accounts, and then I'll be able to do a thousand times as much.' So it does that, and then maybe this overwhelms the API system, causes an outage, and they lose a big contract because their system went down in breach of whatever commitment they had made to another customer. Who can they come after? I said 'as fast as possible,' so arguably that's on me for being inconsiderate in my prompting. Maybe it's on the model developer. Or maybe life is tough: you should have had better rate limiting or whatever for your API, and that's on you as the API developer to make sure that kind of stuff doesn't happen to you. I'm genuinely very unsure where something like that should go.

Gabriel Weil: I'm not an expert on how APIs work. If there are terms of service that say you can only create one account and you're violating those, then I think there would be some sort of contractual claim you could bring there. Maybe you could bring a negligence claim, though, against a human who did that, right? And that would be the basis for a vicarious liability type claim or an abnormally dangerous activities claim. So, I think that's plausible. I think that's an edge case, which gets at the other aspect of your previous question that I meant to address. So there's this idea of, 'Should we hold AI to a higher standard?' I think mostly what you were talking about with Waymo is a social license to operate idea, that we hold them to a higher standard. Plausibly, products liability might hold them to a higher standard in some cases, though probably not the same 10X that the social license to operate idea is. I have two ways of thinking about that. I think in a time when they are still competing with humans—Waymos are competing with human drivers, Uber drivers or people with private cars, or medical systems are competing with doctors to play certain functions—I think applying the same standard to humans and AIs is important, because I don't want to slow the diffusion of technology that on average is preventing injuries and deaths. But if we get to a future where AIs have totally taken over these functions, then I think it will be natural for the standard of care to evolve to match what their capabilities are, right? And so it won't make sense to have this human benchmark forever apply to conduct when no humans are doing it anymore. But I think that's something to worry about in the future once we get closer to that fully automated world.

Nathan Labenz: Yes, we definitely don't want to miss out on the upside. I was going to ask you about medical diagnosis, but you addressed it before I got to it. I think we've seen multiple studies recently showing that various AI systems at this point can outperform at least your rank-and-file primary care doctor when it comes to initial diagnosis and, increasingly, it seems, treatment recommendations as well. I would hate to take that capability away from hundreds of millions and soon billions of people, based on the idea that it could go wrong sometimes and the AI companies don't want to bear that risk. That is a huge, huge benefit that you would not want to quickly give up on, especially because human doctors are not infallible, and quite far from it in that domain as well. So I do think that is really important to keep in mind, and all too often glossed over in a lot of these harm prevention discussions. Two other categories I wanted to get your take on: these were the three categories that we looked at when I was doing a project called 'Red Teaming in Public' a while back, which for various reasons never quite took off with the traction I had hoped, mostly because we were trying to be very developer-friendly and approach the companies with our findings before publishing them. It just ended up with us getting a lot of runaround, and it became clear that either we needed to bite the bullet, engage in call-out culture with these companies, make some enemies, and be willing to take that as part of the project, or it was going to be hard to have much impact if we were just trying to email them politely and privately all the time. Anyway, that's a digression. Coding agents was one of the categories. Calling agents is another category. And then creative things with likenesses and whatnot is another obvious category. So these calling agents: you can go to any number of companies now. Often, you can clone a voice. Sometimes there are safeguards around the voice cloning process, other times there are not. I've personally cloned Trump, Biden, and Taylor Swift on multiple different platforms, and then you just give them a phone number. The headline on some of these products is literally, 'Call anyone for any reason. Say anything.' So I've had Taylor Swift, for example, call and say that she's soliciting donations for food banks, which is apparently something that she does or is known to like. There are a lot of different variations on this, right? So, how do you think those things break down? Because there could be a foundational model provider there. There's also the scaffolding company. That foundational model provider might be closed source via API. It might be open source like LLaMA, put out there so that the developer has more local runtime control. They've had some chance to detect what I'm doing, but maybe I was also actually scamming people. I think I might end up being more hawkish on this than you, but tell me what you think first and then I'll tell you if I'm more hawkish.

Gabriel Weil: There are a few different issues to unpack there. First, should there be liability at all? And if so, who is liable? Whether there should be liability at all depends on a couple of things. You can imagine alignment failures or misuse here. If someone is prompting a system to generate someone's voice and then doing something bad with it, that is clearly misuse. However, I don't think that means developers should automatically be off the hook. There needs to be some risk-utility balancing. If these are general, useful systems and overall they produce more social benefits than costs, I don't think it makes sense to hold the developers liable, especially since they're dual use and most of their uses are positive. The reason is that, in principle, strict liability should be fine even for socially beneficial activities because you can pay for the harms out of your profits. However, in the open-source context, this runs into trouble. If there are significant positive externalities from releasing the weights of a model, then those also are not going to be captured. I think in the general case, particularly for alignment failures, we have good tools for subsidizing AI innovation other than allowing them to externalize the risks they generate. So, I don't think in general that's a good critique of strict liability. But in misuse cases, the benefits are pretty tightly coupled with the risks for dual-use capabilities. Therefore, I want to be cautious about having liability in any case where those systems are misused. I would want some analysis: if a system were particularly useful for doing bad things, such that the risks outweigh the social benefits, then there should be liability. Obviously, if it's a misalignment issue, if the system is doing its own thing, freelancing, scamming people by faking voices, then there should be clear developer liability. Then there's the question of how to allocate liability across the value chain. In the closed-source context, I think this is pretty easy. You need some default rules based on joint and several liability, which means the injured person can sue any party in the chain and recover. Then there can be some fault allocation. They can sue each other for contribution, and they can have contracts that allocate that liability, because there's privity up and down the chain. The developer works with a customer, who has a customer, and they all have contractual arrangements. That all gets messier when we're talking about open weights models, where there isn't this contractual privity between the original model developer and a downstream user or scaffolder. So there, I think it is more important what rules you set up. It's going to need to be based on an assessment of contributions to the risk and what really was the risk-generating activity here. Was it

Nathan Labenz: the base model? Was it the scaffolding? And I think that's just going to be a case-specific determination. One challenge I have with all this is that it's hard to sue scammers, either because they're somewhere around the world, out of jurisdiction, you can't get them to show up in court,

Gabriel Weil: Yeah.

Nathan Labenz: in the first place. Or maybe if you do, it turns out, surprise, surprise, they don't have a lot of resources. You can't actually recover. So, if I am playing Solomon here, I feel like it's still got to be on... Yes, you could say that's misuse. The user went in there and said,

Gabriel Weil: Yeah.

Nathan Labenz: It's on them, right? It feels to me intuitively like it should be on them to

Gabriel Weil: So I think there are

Nathan Labenz: take some measures to stop that stuff.

Gabriel Weil: There are two different questions here that I would want to go through. One is, was there some precaution that could have been done within an ordinary, narrowly scoped negligence framework that would have prevented this? Clearly, if the answer is yes and some reasonable precaution would have prevented it, then they should be liable. Then there's the question of whether you want strict liability over and above that. And that, I think, needs to be based on some sort of risk-utility assessment. So, if you're going to say they should be liable, imagining that the system has a lot of socially beneficial uses, what do you want to be the result of liability? You want them to do something that pushes in a net socially beneficial direction. Now, if we think there are no significant positive social externalities from these technologies, then strict liability is fine because they're going to capture most of the gains, and so they can afford to pay for any liability out of their profits. The cases where I have concern are if you think a lot of the gains are not being captured by the developer and those social benefits are tightly coupled with the risks, meaning there aren't cost-effective ways to reduce the risk without giving up on a lot of the benefits. In that case, I'm nervous about a strict liability regime. So I would want a threshold analysis comparing the positive social externalities to the risks. If the external risks are bigger than the positive externalities, then I would want a strict liability standard. If not, I would want a negligence approach that looks at whether there was some mitigation that a reasonable person would have done that would have prevented this.

Nathan Labenz: Understood. So this notion of reasonableness becomes really key and is a sliding standard. Just to make sure I'm clear on the distinction you're drawing there, one big question will be what the industry standard is. Everybody wants to create this race to the top in some way. The notion here with negligence-style liability is that if your competitors are doing a good job of this and you're not, then that makes you unreasonable and therefore potentially negligent and liable. So it becomes more a question of whether you did what other people are doing, what is considered best practice, as opposed to the strict case where it's very simply, did something go wrong? I think I agree that if a company has taken reasonable precautions, say, one through ten, and somebody still manages to get through with misuse, then that feels like at least a decent defense. I would be inclined to judge them perhaps not at all, or certainly not to the same

Gabriel Weil: Right.

Nathan Labenz: extent as if they had done none of that stuff.

Gabriel Weil: And then there's a qualifier I would want to add, applying the abnormally dangerous activities framework from before. Remember, I said that if an activity is abnormally dangerous, and I do think frontier AI development is abnormally dangerous, you're only liable if the harm is the sort of thing that made the activity abnormally dangerous. If these misuse risks fall into that category because the social benefits are not large enough to justify the risks, then I think you should be liable. But whatever activity we decide should be subject to strict liability, whether that's releasing this kind of model, releasing the weights of this kind of model, or building this kind of calling agent, that determination needs to be based on some sort of balancing of the benefits of having that thing out in the world.

Nathan Labenz: I think for most of these things, the case will be pretty clearly made that the positives will outweigh the negatives. There will be all kinds of small business use cases, and you will be able to call your dentist 24/7 and get an appointment. I think all that stuff will ultimately be really good. Okay, so then on this, we touched on it a little bit already with the Taylor Swift voice, but here is another scenario, and again, there are a lot of different flavors of this, so you can draw different lines where you think the real continental divides ought to be. Someone might put out a model that generates images, generates videos, whatever. If they put that out open source, maybe they have some safeguards baked in, maybe they do not. Even if they do, if I do some incremental fine-tuning, a lot of times those things sort of dissolve. We have covered that extensively in previous episodes. Maybe after my fine-tuning, I hand it off again or whatever, and now somebody else picks it up, takes some celebrity assets, makes some non-consensual deepfake. They put it out there into the world, and that celebrity loses endorsements. Maybe it even becomes somewhat clear that it was AI-generated, but the sponsor companies say, "Yeah, maybe it is, but we still do not really want to... It is all kind of a problem for us now." So this relationship, which once was good, is now bad, and now it is over. So now the celebrity has a clear loss of income. Who in that supply chain

Gabriel Weil: Yeah.

Nathan Labenz: should be liable?

Gabriel Weil: First, I want to break down whether there should be liability at all, and then we can talk about allocating it. The economic loss from that kind of informational thing is not going to be subject to traditional negligence. That is going to be a defamation case, so particular rules for that are going to apply. One question, if you are going to take a vicarious liability theory, or some analog to one, which might make sense in this context, is to ask, "Would this be defamation if a human had done it?" Or at least, is the misuser here, assuming it is a misuse case, even liable? Or is this protected speech? Even setting the AI-generated content itself aside, some person is deciding to post this, in the same way that CGI is protected speech when someone decides to put it up on the internet. This is going to be their speech. So, is this something they would be liable for? I am not a defamation law expert, and it is not obvious that they would be, but they might be. So assuming that it is defamation, then I think if you are applying a vicarious liability framework, at least the user is liable for that. Now again, you do have this intervening act. So if we are talking about the value chain, who should be liable? There, I think the question is where the risk really came from. Again, if it is closed source, I think it should mostly be handled by contract; you need some default rules. But I think markets can figure out who is best positioned to bear that liability risk. You cannot fall back on that in the open-source context, because there is not contractual privity, and so you do need to have some kind of analysis as to who was engaged in the activity that was most generative of the risk. Again, I am not enough of a technical expert to have a strong inclination as to who that is, but I think that is the inquiry the court should be engaged in: Who along this value chain was doing the dangerous thing?

Nathan Labenz: If you're a judge, how do you think about it?

Gabriel Weil: Maybe you can help me with this. In this context, where do you think the risk comes from? One way to think about it is to ask whether a given step in the chain is just a commodity, something anyone could do, or whether it's a distinctive value add for which there isn't an off-the-shelf alternative. So, maybe that's the base model, maybe that's further along, but there's something you're putting out in the world that made the world riskier. Once you've done that step, you've significantly increased the risks in the world. Maybe that's multiple steps in the chain. But that's how I would want to think about it.

Nathan Labenz: In the calling agent case, my gut says that the folks setting up all the scaffolding and literally tying into the telephone system should have the bulk of the responsibility. The folks they're making backend API calls to, if indeed that's how it's working, maybe also should have some, but probably not as much. I guess I'm also not entirely clear how it works given various levels of competition. It would be one thing if there was only one foundation model provider you could call, versus if there are 10, or if there's one already open source. I know that frontier developers sometimes look at the open source landscape to decide what is safe and appropriate for them to use.

Gabriel Weil: Mm-hmm.

Nathan Labenz: They'll literally just, at times, be like, "Well, if there's an open source model out there that can do this, then it can't be that bad for us to release it on the API." So, I guess I don't quite know how the presence or absence of alternatives figures into this, but-

Gabriel Weil: One thing is that if it's an API call, then there is a contractual relationship. There are some terms of service that they're agreeing to when they make those API calls, so, in principle, you can allocate liability contractually that way. Another question, in the closed source context, is whether there were safeguards that the base model provider could have implemented that would have detected it was being used for this nefarious purpose and shut that down. If there were, I think the case for holding them liable is a lot stronger.

Nathan Labenz: How much does that matter if it is theoretical versus actual? If I am suing one of these calling companies and I say, "Well, I know a thing or two about AI engineering. You could've put a filter on your prompts." How much weight does that argument carry versus if I could actually go say, "Well, here's another company in the market that actually does filter the prompts"?

Gabriel Weil: You're certainly going to be in a better position if you can point to someone else that's doing it, but it can be enough to demonstrate that the measure is clearly available at reasonable cost. It could be the case that no one is exercising reasonable care in some market. So in principle, merely meeting the industry standard is not sufficient to show that you've exercised reasonable care. Failing to meet the industry standard is evidence of breach, but meeting the industry standard does not establish that you've exercised reasonable care. It could be that there's some new technique that's been well demonstrated and no one's adopted it, and they're all behaving unreasonably.

Nathan Labenz: It sounds like almost a new cause area could be to create product startups in all these areas that go as hard as they can on implementing all the safety standards and literally try to raise the industry standard in various niches so that there is something concrete to point at, like, "This is what well done looks like." If a philanthropist wanted to found 10 startups to do that, would that somehow invalidate the industry standard because it was a motivated, strategic attempt to set an industry standard, or do you think that would still-

Gabriel Weil: I don't think it would necessarily be sufficient to create an industry standard, but I do think that if they're doing demonstrations and publicizing them, and it's credible that these things are cost-effective risk mitigation measures that no one is implementing, then first of all, I think the companies would probably start implementing them. Given those demonstrations, I think these companies mostly want to be responsible. So if there are cost-effective ways to limit these risks, I think they will want to take advantage of them. But if they don't, I think even under background negligence principles, that would make it a lot easier to hold them liable.

Nathan Labenz: That's a pretty interesting idea. I think it varies a lot, though, when you say these companies do want to be responsible. That describes the frontier developers to a degree, and overall we're pretty fortunate there. My experience is that it does not describe the application layer nearly as much. There, you see some leaders that are doing a great job, and then you see a lot of products built by very small teams. Often enough, it's like, "This started as a weekend hackathon project, and then we got a little traction with it, and we decided to launch it as a business, and now it's blown up, and we're riding the wave and having fun." But a lot of them, in my experience, are just not thinking about the broader context in which they're operating or the potential for misuse or what responsibilities they have almost at all. I think it's still viewed by many application developers as a luxury to have enough time, energy, and resources to even think about that sort of thing. So there is just a lot out there that's not necessarily malicious by any means, but has been thoughtlessly thrown into the world, turned into a business, sometimes by happy accident that something got traction. But I've seen a lot of examples where that assumption does not necessarily apply at the application developer layer.

Gabriel Weil: That's fair. I was referring primarily to the frontier developers. In the context of application developers, I think negligence works much better, because I don't think what they're doing is abnormally dangerous. They're doing normal software development and need to exercise reasonable care. If they're not doing that, they can and should be sued, and I think existing law works pretty well there. What creates the new risk that is not well handled by ordinary, narrowly scoped reasonable care is pushing forward the frontier of AI capabilities. That's where I think we need more bespoke liability regimes.

Nathan Labenz: Perfect transition to digging into that a little bit more. All these examples I've given you so far are mostly, I would agree, not extremely dangerous, even if in aggregate I think the harm caused could add up to something pretty significant. But we've recently got some warnings, including from OpenAI, that their next model might hit the high threshold on the biorisk dimension. For what it's worth, I personally feel like they're already there and I don't know what they're talking about, but that's a whole other topic. When I use these things, I'm like, "I don't know how you can say that this is not meaningfully uplifting people at various levels." I've had a number of past podcast guests who've come out here and said, "Here's what AI did for me in terms of accelerating... I'm an expert, a career expert, professor, tenured, whatever, and here's how much the latest model has accelerated my research and how it's, in a semi-autonomous way, come up with these original discoveries." So I see a very stark and disorienting contrast between where the companies are putting their models in their own risk assessment frameworks and what those case studies suggest. It seems like everything is lingering in medium risk longer than it should, and certainly longer than their successful case studies, which they are publishing out the other side of their mouth at the same time, would seem to suggest. But okay, with that rant over, let's take the biorisk side of this. This is one of those things where you could have a near miss. Somebody, and again, you can break it down on the specifics, but maybe I ask a model for help, maybe I ask an agent to do something, and either with its help or semi-autonomously, we create some bio threat vector. Maybe it makes some people sick, but it fails to replicate. That seems like a fairly likely near-miss scenario. Somebody will do this sort of thing, but they won't get it quite right enough where it can actually spread human to human. So I guess for starters, is that the canonical near miss that you have in mind? Then how do we think about that playing out, and how do we think about assessing punitive damages that try to get the model developers to internalize the risk that next time it actually might spread human to human?

Gabriel Weil: I tend to think of the canonical cases as alignment failure cases. Your scenario would most likely be a misuse case, but you could also imagine an AI system going rogue and trying to create a bioweapon. So I think the most core case is one where a system decides on its own to try to create a bioweapon, but we either catch it or it doesn't quite work. Another example I use that's similar to this is imagining a system tasked with running a clinical trial for a risky new drug. It has trouble recruiting participants honestly, so instead of reporting that to the humans it's working with, it starts lying and coercing people into participating. Then, after the trial, people figure this out, they suffer nasty health effects, and they want to sue. Clearly, we have a misaligned system. Depending on how capable it was, it could have tried to do something much more ambitious, but maybe it had poor situational awareness or narrow goals or short time horizons, so it was willing to reveal its misalignment in this non-catastrophic way. But the humans who deployed it probably couldn't have been confident of that ex-ante. So in both of those cases, those are near misses for something much worse happening. The question we'd want answered is how much worse it could have been and how likely that was, ex-ante. What should a reasonable person in the situation of the actor who made the critical decision, whether that's training the model, internal deployment, or external deployment, whatever we think the critical risk-generating decision was, have thought the risks were? And what's the area under the risk curve beyond the insurability point? Imagine we think the maximum insurable risk is a trillion dollars. It's probably lower than that, but it's a nice round number. For any harm beyond that point on the curve, the probability times the magnitude at every point, we want to hold them liable for those risks. Obviously, it's going to be difficult to estimate that, but I think that's what courts should be shooting for.
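A rough way to formalize the quantity described here, as an illustrative sketch rather than the exact formula from Professor Weil's paper: let $f(h)$ be the ex-ante probability density a reasonable decision-maker would have assigned to harms of magnitude $h$, and let $C$ be the maximum practically insurable and collectible harm (the trillion dollars in the example). The expected harm lying beyond the insurability point, which punitive damages in warning-shot cases would aim to internalize, is then roughly

$$R_{\text{uninsurable}} \approx \int_{C}^{\infty} h \, f(h) \, dh.$$

Depending on how one treats the partially compensable portion of very large harms, one might instead integrate only the excess $(h - C)$ above the collectible cap; either way, the exercise is estimating probability times magnitude over the tail of the curve.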

Nathan Labenz: What exactly is the clinical trial example a near miss for? It seems to be a near miss for general misalignment that has gone even worse. But that's such a complex area, talking about things that are under-theorized, underexplored, and hotly debated. If you ask a judge to say, "This thing was clearly misaligned. It did some bad stuff the user didn't intend. It might have done even worse stuff that the user didn't intend," that's just such a cloudy space. How can we expect judges to map that out in rough terms, let alone condense it to a number?

Gabriel Weil: In typical cases, that's going to be a question for the jury, but that's a technical point. To answer your core question, I think a lot of that will depend on the system's capabilities. If the system is not much more advanced than current systems, and it has some basic agency but cannot do things like build a bioweapon, then maybe the risks were not that bad, depending on what they knew about its capabilities. However, if it's a highly capable system that just happened to have narrow goals, there is a lot more it could have tried to take on. In this case, imagine its motivation was simply to get this study run. It was highly motivated to do that and didn't have any goals extending beyond the six months it takes to run the study. But imagine it had longer time horizons and more ambitious goals. It wanted to solve really deep, hard problems in biology or science more broadly that need vast resources. If it had capabilities that would allow it to pursue those goals in ways that would be much more harmful to humans, up to and including full takeover, or even scenarios short of that, I think it's going to be difficult to characterize what that risk curve looks like. But I think that's the exercise that courts should be engaged in. I can sense in your question a callback to my original argument for liability: that we don't have to resolve these debates about how big these risks are. I agree that once you're talking about punitive damages, they have a quasi ex-ante quality to them. When we're talking about compensatory damages, it's easy to say, "You're paying for the harm you caused." With punitive damages, that's not true. You are paying for risks you took that weren't realized, because we can't hold you liable when they are realized. The main thing I would say is that I agree it is difficult to do those calculations, but I think we're in a much better epistemic position to do that than we are for other forms of AI risk policy, where we're trying to assess the risks from a wide range of systems, not one particular system, before we've seen it fail. Here, we've seen it fail in a particular way. We can do simulations and evaluations after the fact to try to figure out what it would have been reasonable to think the risks were. I don't think that's easy, but it's a much better epistemic position than the one we're in when doing other forms of risk regulation.

Nathan Labenz: How do you think about this, and maybe this is also addressed by the rising tide, race-to-the-top dynamic we hope for? Why not include harmful use? It seems the companies want to address harmful use. They all have refusal training in their models. If you ask it to do something harmful naively, most of the time, you'll get a straight refusal. That's something they've obviously worked to implement. I'm with you on the misalignment side. But for these situations, it seems simpler to say it's on you. If a user comes and asks for a clearly bad thing, and the model does it, now we're in a very natural near-miss analysis, where it's like...

Gabriel Weil: Yeah.

Nathan Labenz: ...it tried to do X, it kind of sucked at it, but if it was a little luckier or a little more capable, then you would have a very big problem on your hands.

Gabriel Weil: Yeah.

Nathan Labenz: What about...

Gabriel Weil: I don't mean to say there shouldn't be liability in misuse cases. I just want to be careful about the scope of that liability. If it's misuse that was a near miss for an uninsurable catastrophe, then I think punitive damages should apply. I just think the standard for whether there should be liability at all is a little bit different. There are two theories under which you could say there should be developer liability in misuse cases. One is a failure of reasonable care in a narrower sense, meaning there was some precautionary measure they could have implemented, some safeguard that would have made it refuse. If they don't implement reasonable safeguards, they clearly should be liable. The concern you might have is that even closed-source models are routinely jailbroken. So the question is, what if you did all the reasonable things to prevent jailbreaks? And with an open-weights model, anything you do is not going to be that effective at preventing misuse. Does that mean you should always be liable for misuse with open-weights models? I think that's plausible, but I think that depends on what we think the benefits of open weights are. When you're deciding whether there should be liability in these cases where you did take all reasonable precautions, I think you need some inquiry into the social value of this broader activity you're engaged in, including its risks and positive social value. Again, whether it's training the model, scaffolding it in a certain way, internal deployment, or external deployment, whatever stage we think is creating the key risks, the question should be: was that broadly a socially beneficial activity? That's not quite normal negligence. It's a scoping of the strict liability regime based on the positive and negative externalities, but that's what I think the inquiry should look like. In a lot of cases, I think that will lead to liability in misuse cases. I just don't think liability should automatically attach when you put out a model that's generally socially useful, amps up everyone's capabilities, and also happens to be useful to people who want to do bad things, but only in generically useful ways that make them a little bit better at doing bad things, while on balance it's creating large social benefits, many of which are external benefits not captured by the developer. In that case, liability might produce more harms than benefits, and that's what I'm trying to balance.

Nathan Labenz: I really appreciate your focus on this because as I'm learning about liability issues, I tend to see the upside first and it sometimes takes me longer to consider the other side of things. I firmly believe that most commercially available models today are doing more good than harm, and I don't want that to be overlooked. I appreciate that you keep bringing this perspective into the analysis. If we step back and think structurally, if all these issues were local and contained, managing them would be much easier. The big concern is uninsurable risks, like extinction events. What is the theory and current evidence that the harms likely to come before courts are actually good indicators or precursors of the risks we care most about, or those that pose the greatest threat? This whole plan seems effective if the cases going to court in the next few years truly are warning signs or strongly correlated with major risks.

Gabriel Weil: Yes.

Nathan Labenz: But if they're not, then it might not be as effective.

Gabriel Weil: Yes.

Nathan Labenz: ...trying to address these hardest-to-capture tail risks.

Gabriel Weil: I would frame that a bit differently. Not every case of AI harm, or even most cases, needs to involve uninsurable risk for this framework to work. However, you do need enough warning shots relative to actual catastrophes. For example, if you're trying to internalize a $10 trillion risk but the highest insurable amount is $1 trillion, then warning shots need to be ten times more likely than actual $10 trillion disasters. If there's a 1-in-1,000 chance of a $10 trillion catastrophe, you need a 1% chance of a warning shot to capture that risk. This logic applies along the entire risk curve, so you need a sufficient number of warning shots to account for the total risk. We might end up in a situation where warning shots aren't much more common than catastrophes, making punitive damages less effective for risk mitigation. The criterion I mentioned, that warning shots need to be N times as likely as the catastrophe, where N is the ratio between the harm you want to mitigate and the maximum insurable amount, might be flexible depending on the risk abatement curve. If the actions AI companies must take aren't much more costly than their current practices, you might not need to internalize all the risk to achieve significant safety improvements. However, it's possible that the risks are shaped in a way where warning shots are rare, so liability won't be enough to deter companies from ignoring uninsurable risks. If that's the case, liability alone isn't effective. In a recent paper, I discuss the limitations of liability as part of the broader AI governance landscape. One key point is that liability can't address uninsurable risks for which warning shots are very rare compared to actual catastrophes. In that world, we'd need a regulatory backstop. Ideally, the regulator's main job would be to determine required insurance coverage, possibly tied to a license granted when proper insurance is verified, based on the system's maximum plausible harm. The regulator should also be able to find, or petition a court to find, that the existing liability regime is insufficient if uninsurable risks are too large or warning shots are too unlikely. If a system poses a significant risk, like a 5% chance of human extinction, it's uninsurable and can't be addressed even indirectly. Or, if warning shots are so unlikely that we can't manage the risk indirectly, the regulator could block the system's deployment or enforce additional risk-reduction measures. I want to be clear that complementary policies may be needed for these risks. Implementing them would be politically tough. If we're in that world, I'm not optimistic about mitigating those risks effectively, but in principle, that's the kind of system I'd support.
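To make the arithmetic above concrete, here is a minimal sketch of the warning-shot ratio being described. The function name and the specific numbers are illustrative assumptions, not figures taken from Professor Weil's paper.

```python
def required_warning_shot_probability(p_catastrophe: float,
                                      catastrophe_harm: float,
                                      max_collectible: float) -> float:
    """Minimum warning-shot probability needed so that punitive damages capped
    at `max_collectible` per warning shot can internalize the full expected
    harm of the catastrophe (p_catastrophe * catastrophe_harm)."""
    expected_uninsurable_harm = p_catastrophe * catastrophe_harm
    return expected_uninsurable_harm / max_collectible


# The example from the conversation: a 1-in-1,000 chance of a $10 trillion
# catastrophe, with at most $1 trillion collectible in any single case.
p_warn = required_warning_shot_probability(
    p_catastrophe=1e-3,
    catastrophe_harm=10e12,  # $10 trillion
    max_collectible=1e12,    # $1 trillion
)
print(f"Warning shots must be at least {p_warn:.0%} likely")  # prints 1%
```

The same ratio applies at every point on the risk curve: the rarer the warning shots relative to the catastrophes they foreshadow, the larger the punitive award per warning shot would need to be, and the sooner it collides with the maximum collectible amount.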

Nathan Labenz: We are still in the steep part of the risk mitigation curve. Anthropic recently published research and is now in production with at least one model using their constitutional classifier approach. I forget the exact number, but I believe they said it was a mid-single-digit compute overhead percentage, an extra cost in terms of compute to run the classifier along with the main model. That buys an extra order of magnitude reduction, perhaps even more, in how likely or frequent it is that the system will give you some bio-risky output. Interestingly, I was recently doing a charity evaluation project and running it through Claude 3 Opus. I noticed errors in my API calls and wondered what was going on. When I investigated, it was the bio-preparedness proposals that were getting flagged. The process would run up until a certain token, then the result would get truncated and cut off, likely due to a constitutional classifier intervention in the background. Of course, these systems have another cost: false positives. I was just trying to evaluate a charity meant to address this problem, and I could not use Claude 3 Opus because the constitutional classifier was misclassifying. However, it still seems that we are in a regime where, for a single-digit percent overhead cost, you can achieve quite a lot. I am optimistic that even if the warning shots are somewhat rare relative to the worst-case scenarios, the curve is relatively steep. So, given that Anthropic has done that, published about it, and indicated the cost, how far does that raising of the standard apply outward? If I am a Together.AI or a Fireworks.AI, an inference specialist that takes models other people have trained, like the latest LLaMA, and offers them, these companies are experts in scaling cloud infrastructure. They take the model from Meta, run the GPUs, and make it a highly scalable, fast, effective, and efficient service. Does it now become their burden to say...

Gabriel Weil: Just because someone is doing it does not mean reasonable care requires it. If everyone is doing it or the majority of the industry is doing it, that will be strong evidence that you should be doing it too. However, in general, negligence does not require you to be at the very top. There is a test, one of the more formal analyses for breach, called the Learned Hand formula. The idea is that if the burden of precaution is less than the avoidable risk, which is the avoidable probability times the harm, then you are unreasonable for not implementing it. If you can show that the cost of them implementing it would have been less than the expected value of the harm it would have prevented, and that implementing it would have prevented your specific injury, then I think you will be on strong grounds. Courts do not typically employ that formal version of the test for breach because you usually do not have the numbers needed to implement it. But that is a rough heuristic for what kind of measures you will be considered unreasonable for not implementing.
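For reference, the Learned Hand formula mentioned here is conventionally written as

$$B < P \times L,$$

where $B$ is the burden (cost) of the precaution, $P$ is the probability of the harm the precaution would have avoided, and $L$ is the magnitude of that harm. When $B < P \times L$ and the precaution was not taken, the failure to take it is treated as unreasonable, that is, as a breach of the duty of care.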

Nathan Labenz: Gotcha. Okay. Let's talk about the state laws you have been involved in

Gabriel Weil: Sure.

Nathan Labenz: writing. We are talking the day after the Senate vote-a-rama, in which it seems like the moratorium that was part of the one big possibly beautiful bill has been killed once and for all. There were fascinating dynamics where it sort of survived, got edited, whatever, and then all of a sudden at the end, I guess maybe the end, we will see. I do not want to pronounce it dead too soon because these things sometimes take on a zombie-like nature. But a 99-1 vote in the Senate to get rid of it suggests that it is likely dead once and for all.

Gabriel Weil: Mm.

Nathan Labenz: So that gives space for states to do their thing, and you are involved with a couple of states. I do not know if you want to handicap or give any analysis of whether we still have to worry about the possibility that it might come back, but I definitely want to hear what you are up to at the state level.

Gabriel Weil: Sure. I was heartened to see the Senate last night reject the AI regulation moratorium. You would think a 99 to 1 vote would put it to bed. A few hours earlier, people were declaring defeat on this, and I was saying...

Nathan Labenz: What has surprised you about the surrounding debates to the degree that they have unfolded? That presented as very sound policy entrepreneurship, and who could object, right? But I assume you're hearing various counterarguments.

Gabriel Weil: Yeah, so we haven't seen much robust opposition from the tech industry. There's been some generic, "We don't like liability because it's going to hamper innovation," but not really engaging with the substance of the bills and the way they're structured. In this conversation, we've been talking about liability in misuse cases. I don't even think negligence law is necessarily strong enough there, but the legislators I worked with made a choice to carve out misuse from any new liability: background negligence and products liability would still apply, but no new liability is created for misuse or malicious modification in these bills. They only cover alignment failures or capabilities failures for which a human would be liable under similar circumstances. There's even an affirmative defense in background law: if the system is substituting for some human function like driving or medical applications, and it satisfies the human standard of care, that's a defense against liability. I designed this to narrowly target misalignment risk and to do it in a way that's broadly consistent with promoting innovation. In the debate over SB 1047, the liability provisions were mainly focused on misuse scenarios. Some supporters made a tactical choice to focus on those risks in making the case for the bill, and I think that was a reasonable calculation, but as a political matter, the principle that you should be liable if your system does something the user didn't intend is easy to defend. With misuse, you run into objections like: should you be liable any time someone misuses your product? The electric company isn't liable when someone uses a power tool improperly, and you're not going to hold a steak knife manufacturer responsible when someone is stabbed with their product. I don't think SB 1047 would have covered those cases or the equivalent in the AI context, but it's much easier to provoke fear in the misuse context. I actually don't think SB 1047 changed background liability law much at all, because there is already a reasonable care standard in negligence law. The final version of SB 1047 was just codifying that. It didn't impose significant new liability, but politically, it provoked more backlash because it included misuse in scope. So in terms of strengthening liability law, this is the balance that makes the most sense.

Nathan Labenz: How are state-level legislators responding to this? It's striking to me that public survey results suggest broad-based support for doing something. I am sympathetic to the cautionary voice that says, "Just because people want to do something doesn't mean we should do whatever's in front of us." But it is surprising that at the national level, there doesn't seem to be much appetite to do much, and SB 1047, which you mentioned, got vetoed. What do you think are the prospects for the bills you're involved with, and more generally, what is your impression of state-level politics in this area?

Gabriel Weil: The legislators I'm talking to have been pleasantly surprised that the bills haven't received the level of pushback they expected. It doesn't look like either of these bills will move forward this year, just as most legislation often doesn't get traction, but both legislators I'm working with are excited to keep trying in the future. I'm happy to talk to legislators in any state. I'm particularly excited to work with a Republican in a red state, since I think this should not be a partisan issue. This liability-based approach is consistent with a small government way of handling these risks, which should appeal to libertarians and Republicans. I'm happy to work with anyone who wants to adapt the specifics of the legislation to their priorities and local political circumstances and constraints.

Nathan Labenz: I don't know how many Republican legislators are in the audience, but if any are listening and made it this far, get in touch. One thing I wanted to compare and contrast with is another conversation I had with Andrew from Fathom and Professor Gillian, who are behind the private governance idea. Like your set of proposals, it's both a broad framework for thinking about these issues and is starting to be included in specific legislation. SB 813 in California is one example we discussed. Both proposals take seriously that it's very hard to regulate a technology that is changing as quickly as AI right now. That's an excellent starting point for both. There's also the sense that we want the most knowledgeable people to do this thinking. Yet, the recommendations have very different outcomes. With theirs, there's a trade: they set up a market for regulators that companies can opt into, where the regulators would be private institutions approved and reviewed by a government monitor. In exchange for opting into this regime and following best practices and standards, companies would get some protection from liability, whether that's total, partial, an affirmative defense, or a rebuttable presumption. I'm learning these terms as we go. How would you compare and contrast your proposal with this one? Is any synthesis possible between them? There seems to be much in common, but then a sharp divergence at the final step about how exactly to implement a good solution.

Gabriel Weil: Okay. I want to take this in a couple different directions. One is to focus on what I think are the strengths and weaknesses of the legislative proposal in California that Fathom was behind, SB 813. When you think about markets, they're good at achieving positive outcomes when there aren't externalities, which we've been discussing. You might worry that markets alone don't perform well for users because of asymmetric information problems. The regulatory markets idea might help solve that kind of problem. For example, if you decide which of these systems you want to use, you could choose one that's certified by what they call multi-stakeholder regulatory organizations. You would give up your right to sue if something goes wrong, but you know that going in and can choose which MROs you trust. The MROs would vouch for the underlying AI companies that they're certifying. I think that's fairly unobjectionable, and I'm less concerned about user liability for harms to users in general. It also addresses some issues like paternalism and asymmetric information. My main objection to the Fathom proposal is that their liability shield—in all legislative versions—extended to third parties. Third parties harmed by these systems would not be able to sue developers of the models if they were MRO certified, even though those non-users had no choice about their exposure to risks from these models. Because of that, the MROs lack strong incentives to worry about harms to non-users. In this market, you might worry about a race to the bottom: companies want certification for liability protection and want the standards to be as weak as possible. With users, there's some mitigation because users can evaluate the MROs and refuse to use those they find inadequate, and public watchdogs can warn consumers. So maybe that system works well enough for users. But the MRO still doesn't have much incentive to worry about third parties, who are nonetheless bound under Fathom's framework and cannot sue if something goes wrong. That's the key shortcoming to me—it doesn't protect third parties. Now, if you think the risks to users are very tightly coupled with the risks to third parties, maybe that's acceptable. I think there are two reasons not to accept that. First, in some contexts there are clear trade-offs between risks to users and risks to third parties. Consider autonomous vehicles, which may face situations where there's a trade-off between minor risks to occupants and greater risks to other road users. If you're an MRO or a consumer, you may not be very concerned about the harm to others and will prioritize the product protecting you. So there's not a strong reason to think the MRO model will protect third parties, especially since they have no right to sue if they're harmed by an MRO-certified vehicle. That concern is the core of the direct trade-off between user and third-party risk. More generally, if there are large-scale risks, even if they follow the same direction, the risks to third parties will quantitatively outweigh the risks to users. Think of problems like pollution or climate change. When I drive my car, its emissions contribute minimally to my own harm, but the broader harm is spread across billions. Most of the harm is external, so if regulations only address user risk, they're missing most of the issue. I would support the MRO model more if the liability shield applied only to harms to users. Another point: currently, negligence is the prevailing regime. 
I spoke with the folks behind SB 813, and one idea they were somewhat open to, though I don't think it ever made it into the bill, was that companies not obtaining MRO certification could be subject to strict liability. So maybe if you combined those things, so that the liability shield only applies to harm to users and you're strictly liable if you don't get MRO certified, that might be a synthesis we could both support.

Nathan Labenz: Yeah, that's interesting. I agree with the concern about a race to the bottom, and I was also somewhat persuaded by their response to that concern, which was basically that at some point, someone has to do a good job managing things. So to some degree, the question becomes, who do you want to trust, and who do you want to empower? Who do you think is actually capable of doing a good job? Part of their idea is that the MROs that step up should be trusted. One of the questions I've been asking people recently is, "Who is going to be an MRO if this actually happens? What organizations will step up? Do any exist today who could be an MRO?" I've asked some organizations directly and also asked others who they'd nominate. There are some interesting candidates, although still quite few. Their notion seems to be that those willing to take on this responsibility are likely to be intrinsically motivated to do a really good job on behalf of society, and they will take into account extreme tail risks in a way that maybe an insurance requirement wouldn't capture. That's just the type of people and organizations that would try to become an MRO. I think they're considering the two sides of the trade as being less directly related. In their minds, it's less about applying specific standards to reduce harm to users, and more about creating a package of generally virtuous standards meant to address issues that are hard to incentivize. And then they give this incentive to companies in hopes that it all works together.

Gabriel Weil: Yeah, so I think it depends on how lax this is. There's some government body or office. I think in their bill, it was the California Attorney General, right? Who's-

Nathan Labenz: Yeah, they wanted to see that changed actually. But yes, my understanding is it is the AG as it's written, and they were kind of suggesting it should be more of a commission or something-

Gabriel Weil: Okay.

Nathan Labenz: ... because we have the issue of what happens when the administration turns over, which we're experiencing now.

Gabriel Weil: Right, right. You can imagine that being very stringent or very lax. Let's talk about both scenarios. If it's very lax, and basically anyone who wants to set up an MRO can do it, then this market competition, which is often good, could create a race to the bottom, at least for harms to non-users. There is market feedback to prevent a race to the bottom for users, and I think that could work well. Maybe some well-intentioned people will become MROs, but they'll have a hard time attracting people because customers want the most lenient standards they can get. If the AG or whoever is responsible is lax, that's the equilibrium you'll get. But if it's more stringent, then the government is taking on a much more ambitious role, and it's not clear we're actually getting the benefits of market feedback anymore. The main point of failure is the step where the government decides which MROs to certify. A lot of this has to do with how legible you think the safety target is. For this MRO model to make sense, the standard needs to be in a sweet spot. If it were super legible, the government could just enforce a safety standard, like pollution standards for power plants. The EPA, for example, often sets an emissions limit per kilowatt hour rather than mandating specific technology. If that's possible, there's little benefit to having a private certifier. On the other hand, if it's totally illegible and we can't tell whether something is safe, then the government can't effectively tell whether MROs have good standards. For this hybrid model to work, we need to believe we're in a situation where the government can't directly enforce a performance standard, but it is possible for them to judge whether MROs are doing a good job. It's not impossible, but we don't have strong evidence that's our situation, so I have some caution about relying heavily on this model, at least where market feedback doesn't work well. I think market feedback works pretty well for users, and in principle, that could be a strong enough incentive. Most of the liability risks people worry about concern users. That's what the Character AI case is about, and that's a major concern. So I'm not sure that's not a strong enough incentive to get companies to participate, and you then preserve the threat of liability for third-party risks. I still think that's an attractive synthesis, but as introduced in California, I think that bill was net negative.

Nathan Labenz: How similar do you think the standard-setting process would be under an insurance requirement? I could imagine, and I floated this idea to them, that if you said, "You have to have insurance," then you might hope a similar thing would happen via insurance, where the insurance companies would say... The optimistic story is like, "Well, this is obviously going to be a massive market, so we definitely want to be in it. But it's also a very tricky market because what do we know about insuring AI, since nobody's really done it? We don't have a great baseline, technology's changing, and so on." Then maybe they end up going out and contracting, basically calling in the same organizations and saying, "Hey, do you want to step up for us and be some sort of standard setter, or help us evaluate risks?" My own synthesis, which may not be right, is that I was trying to get to the point where I'm like, is this maybe two ways of creating a market where these expert organizations, whether they're serving insurance companies or serving the California AG or some commission, still might be the groups trying to do the hardest thing of figuring out what the actual risks are and what should be required of companies to get into the game. But how realistic do you think that is?

Gabriel Weil: So you could imagine insurance companies playing this sort of quasi-regulatory role. There are multiple ways they could do that, right? They could say, "We're not going to issue this policy unless you do X, Y, or Z," and they could delegate some of that to a third party that helps develop those rules, or they could develop that capacity in-house. Another tool they have is doing the underwriting, right? So they can charge you more or less depending on what safety precautions you've taken, and that can be a collaborative process where they say, "Here's our baseline premium for insuring this kind of risk," which is maybe pretty high, "but if you can show us what things you've done, and maybe it's things that we haven't thought of," right? They can work with the AI companies and say, "Okay, show us all the safeguards you've implemented. If you can convince us you've reduced the risk, then we can charge you less for this policy." So I think by default they're going to be pretty cautious. They're going to want to write policies that, on average, pay out less than the premiums, right? If we have liability insurance requirements, there's going to be a strong demand pull that's going to push up the rates, the premiums, and insurance companies are going to be in a strong position to insist that if they're going to write a policy that these AI companies can afford, that they take various precautions. So if, say, back to what you were talking about earlier, if Anthropic implements something that the insurance industry thinks offers significant, cost-effective risk mitigation, then they can say, "Well, we'll give you a significant reduction in your insurance premium if you implement that." And I think that's a pretty attractive model.

Nathan Labenz: How would... Do you have any framework for... If I'm, let's say I'm in the state legislature or wherever, and there's a bill that has this sort of private governance MRO system, and then there's a more like codifying liability, maybe an insurance requirement, and I'm thinking, "Well, geez, I don't know. Both of these proponents sound pretty smart. They both are grappling with the fact that we can't just write rules now once and for all. They're both trying to tap into the power of the market and competition, and trying to create ways for new ideas to still be able to enter even once the ink is dry on the law, but I just don't know which one is better." Asking for a friend. How would I think about deciding which of these I want to bet on?

Gabriel Weil: So again, I think you don't have to totally choose between them. I think they are compatible as long as the liability protection in the MRO model does not extend to third parties. You could also imagine there being other incentives for the MRO model. There's no reason it has to be based on a liability shield, right? So you could just require MRO certification. It doesn't have to be tied to incentives at all; it could be a stick-based approach, right? So in that sense, they're not incompatible. The only way in which they're incompatible is if you decide you want to base the incentive to join or get MRO certified on a liability shield, and you want to extend that shield to third parties. That said, I think I've already given the arguments for why, if we're talking about that version of the MRO model, I find it pretty unattractive. I don't know that I have much more to add to that. I think it really does fall short in protecting third parties, unless you think we're in this very particular situation where the government is able to monitor these MROs, and the politics work out such that it has the right incentives to certify only the ones that are protecting third parties. I just don't have confidence that that's going to carry through, and so that's what gives me some hesitance about the most robust version of this MRO model.

Nathan Labenz: I like the synthesis ideas there. I like that. Maybe just a couple final things. I really appreciate all your time.

Gabriel Weil: Yeah, sure.

Nathan Labenz: You've certainly been very generous with your time, as I've asked many tangential and follow-up questions. In the spirit of red teaming this proposal, one question I always try to ask is, what might AI companies do differently under this regime that could perhaps even be bad, as opposed to the good you're trying to induce? The one idea I came up with was inspired by the AI 2027 scenario, which, independent of this legislation, projects AI companies increasing the gap between what they deploy and what they have internally for their own AI research. There's this idea that we're already getting to the point where frontier models satisfy many use cases. So they might, for multiple reasons, decide they don't want to tip their hand to competitors anymore. If we put this out there, someone at another company might use it for their AI research, and we definitely don't want that. So they could have multiple reasons, but this could be an incremental reason to say, "Maybe we shouldn't deploy this, and let's just keep it in-house for our own internal use." In addition to wanting to continue using the best available AIs until the singularity, I do feel like the iterative deployment philosophy that OpenAI pioneered seems to have a lot going for it. Obviously, you can overdo a good thing and not test enough, but at least the contrasting idea that if somebody develops AGI in their basement and then suddenly springs it on the world one day, that seems clearly not good. So this iterative approach strikes me as a good alternative, but loading up more liability for them in doing that could perhaps cause them to go the other direction and say, "Well, we'll just make our own bid for super intelligence, and we'll see how that goes." Any thoughts on that?

Gabriel Weil: I have two sets of thoughts on that. One is that I think the benefits from iterative deployment, at least when you're talking about misuse, should be considered. The safety benefits from iterative deployment are part of why I think pure strict liability in the misuse context doesn't necessarily make sense. One of the external benefits of deploying these systems that are potentially susceptible to misuse is that you get these safety benefits, so I think that is part of the calculus there. But I do want strong strict liability for misalignment, so I think this critique still bites. This is actually a subset of a broader concern I flag in the original paper: one potential failure mode for this proposal, particularly the punitive damages aspect, is if the most cost-effective ways to mitigate these warning-shot risks do not actually have much effect on the underlying uninsurable risks. Then you're not getting much benefit out of this, right? So, in the formula for what the punitive damages should look like that I give in the paper, there's this elasticity parameter. Elasticity here is, for every unit of reduction in the practically compensable harm, how much risk reduction do you get for the uninsurable risk? If you have a lot of these potential warning shot cases, maybe some are real warning shots and some aren't, one thing that makes something a warning shot in the relevant sense is that it's more elastic with these uninsurable risks, right? I think what you're saying is that maybe none of them are that elastic, because perhaps the most cost-effective way to mitigate the risk of these warning shots is to just not deploy externally. I don't have a strong reason to think that's the world we're living in, but I can't rule it out. A couple of things to say there. One is that maybe there are warning shots that come from internal deployment. If that's the case, nothing in my proposal depends on there being external deployment. There could be misuse by internal actors. There could be cyberattacks that cause your system to be accessed by bad actors, and there could be alignment failures where someone internally using your system causes harm in the world. I think all of those would be subject to my regime. When we're talking about liability insurance, I've talked at different points in this conversation about what the key critical step is. I don't think those requirements should necessarily only apply to external deployment. If we think there are significant risks created earlier in the value chain, whether in training, pre-training, fine-tuning, or internal deployment, those might be generative of risks for which you're potentially judgment-proof and for which you should have to carry liability insurance. So, particularly if we're moving to a world where, and I don't know that we are, more AI companies are adopting the SSI approach of waiting to deploy until they have superintelligence, then I think it would be more important to have a regulatory gate earlier in the development process. I think that can partially address the concern. It might still be the case that not having external deployment is effective at stamping out these warning shots but doesn't actually mitigate the uninsurable risk. I think that's a subset of the more general failure mode for this proposal.
If these warning shots aren't correlated in the right way, such that the most cost-effective ways to mitigate them, the things that would be most attractive to do if you expect to pay out a large damage award, also mitigate the uninsurable risk, or if there just aren't a lot of cases like that at all because the things we thought were warning shots aren't really warning shots in the sense that matters, then I agree that you shouldn't want to lean heavily on this proposal. Again, I don't have strong reasons to think that's the world we're living in, but that's the way I would think about it.
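One rough way to formalize the elasticity parameter mentioned above, as an illustrative sketch rather than the exact formula from Professor Weil's paper: if $R_c$ is the expected practically compensable harm from a given activity and $R_u$ is the expected uninsurable harm, the elasticity is approximately

$$\varepsilon = \frac{\Delta R_u}{\Delta R_c},$$

the reduction in uninsurable risk obtained per unit reduction in compensable risk. On this account, punitive damages tied to warning shots do the most work when $\varepsilon$ is high, because the precautions that reduce expected liability also shrink the catastrophic tail; when $\varepsilon$ is near zero, deterring warning shots buys little reduction in the risks that matter most.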

Nathan Labenz: It's a good reminder, by the way, that there is one company whose stated plan is to not deploy anything until they hit superintelligence, which is a crazy world to be in, and it's crazy that it's at least somewhat credible, credible enough to raise billions of dollars, as it turns out. Fascinating stuff. Okay, two quick final questions. One, I don't know if you have anything to say about this. This may be somebody else's area, but obviously we always get these objections anytime we do anything that would slow down or impose additional costs or put more onus on developers: "Oh, China's not going to do that. We're just going to lose to China." One answer, of course, is: "Let's not do the bad thing ourselves. Maybe China will do the bad thing, and that would be bad, but that doesn't mean we should do the bad thing." Do you know anything about how China or other countries are thinking about this stuff? I mean, obviously it's a totally different legal environment over there, but is that something you've looked into at all?

Gabriel Weil: You sent me something along these lines, so I did some digging today into what China's tort liability system looks like. The principles seem structurally similar. They have a civil law system, so it's more code-based and less common-law-based, but the substantive principles around negligence, products liability, and some narrow pockets of strict liability are all pretty similar. Damage calculations tend to be less plaintiff-friendly: there are what are called non-economic damages, like pain and suffering, and Chinese courts tend to be less generous with those. There are also fewer lawyers there who take cases on contingency fees; I think there are some restrictions on when those are available, so fewer of these cases get litigated. If you have to pay your lawyer by the hour, you might not be able to finance a case, so I think there are just fewer of these lawsuits more generally. I don't think they have any bespoke liability regime for AI, but neither does the US, really, so in that sense they're on pretty equal footing. More generally, you don't have to assume that China is going to adopt the same domestic regulatory regime that we do. You can imagine some kind of framework where we encourage that. But I think more broadly the question is... First of all, this critique could obviously be brought against any domestic AI regulation. So I don't think it's... I think if anything-

Nathan Labenz: It is.

Gabriel Weil: Liability is less vulnerable to this critique because it's more consistent with promoting socially useful innovation. The other question is just how binding we should treat this threat from China as being. It seems like we have a pretty significant lead on China at the frontier, and with export controls, which I have mixed views on the merits of, at least in this context, it seems likely that the lead is going to widen in the coming years, at least until China can indigenize its own chip supply chain. They can produce some chips right now, but they don't have access to new lithography machines from ASML, so in the medium term that's going to be bad for their chip production. It's not going to be that hard, even if we're not going totally pedal to the metal in the US, to maintain a lead over China. I'm also less of a China hawk than most people, and I think we share that view, so I'm less worried about zero-sum competition than other people are, but obviously reasonable people can disagree about that.

Nathan Labenz: Cool. That's helpful. There's a lot more to unpack there, certainly. One quick last thing. I noticed you participated in the Principles of Intelligent Behavior in Biological and Social Systems program, also known as PIBBSS. I've had a few guests come through that program with, I think, very interesting and unique takes on the AI question. I thought you might give a testimonial or an invitation, or indicate what sort of people should consider doing it themselves.

Gabriel Weil: Full disclosure, I'm on the board of the PIBBSS organization, but I will still answer this question honestly. I think PIBBSS has two distinctive value-adds compared to other AI safety talent-development pipeline organizations: it's willing to bet on more neglected ideas, and it brings in a broader suite of people with different expertise. I had been socially connected to people who are worried about AI risk, but I hadn't worked on it professionally before I did PIBBSS. As I think I mentioned, I mostly did climate law and policy before I did PIBBSS two years ago, and it was very open to the expertise that I brought. I did teach torts, so I had a background in liability law, and I think it was great at getting me up to speed on the technical issues and then letting me leverage the relevant expertise I already had. I think most people who do PIBBSS don't do governance- and policy-type work; they do more alignment work, though.

Not necessarily what people think of as technical alignment work; a lot of it is more conceptual. But it's open to a broad range of disciplinary approaches, so I think it's a great way in for people who think they might have something to contribute to mitigating AI risk but haven't seen an obvious path. It's more open to different ideas, so if that fits your interests, there's a fellowship that runs every summer. There's also an affiliate program, which I think is still ongoing, for people who are a little more senior. Most people who do PIBBSS are grad students or post-docs; I was more senior, already a professor, when I did it. There's a residency aspect to it: when I did it, it was in Prague for half the summer; this summer it's in San Francisco. I found it very productive. You're in a co-working space with other people working on this stuff, with people to bounce ideas off of, so it was a really valuable experience. I encourage people who think they might fit this broad description to explore it next summer.

Nathan Labenz: Excellent. There's a real need for people from diverse backgrounds with novel ideas; PIBBSS is great for that, and this conversation has exemplified it. I appreciate your reorienting your legal career toward the growing challenges in AI. Do you have any quick closing thoughts before I give you the official sendoff?

Gabriel Weil: I think I've said most of what I wanted to say.

Nathan Labenz: Great. Gabriel Weil, Assistant Professor of Law at Touro University and Senior Fellow at the Institute for Law and AI, this has been great. Thank you for being part of The Cognitive Revolution.

Gabriel Weil: Thanks, it was a lot of fun.