Introduction

Hello, and welcome back to the Cognitive Revolution!

Today my guest is Karl Koch, founder and Managing Director of the AI Whistleblower Initiative, a nonprofit dedicated to supporting concerned insiders at the frontier of AI development.

I'm particularly passionate about this work because I would have loved to have this kind of support a couple years ago when, as long-time listeners will know, I tried to raise concerns about the size and quality of the GPT-4 Red Team project and the ineffectiveness of the nascent safety measures that OpenAI had developed at the time.  Then, no such support existed, and so I consulted friends in the AI safety community before ultimately escalating to OpenAI's board, for which I was subsequently dismissed from the program. That experience left me acutely aware of how difficult it is for insiders to navigate these situations, and has motivated me to support the project with a mix of modest personal donations, behind the scenes fundraising help, and a bit of ad hoc volunteer work over the last 6+ months.  

Along the way, I've been consistently impressed by the seriousness of Karl's thinking and the maturity of his approach.  As befits an organization that aims to help people in truly critical moments in their careers and lives – when the stakes have never been higher, for them personally and potentially for society as a whole – they're taking care to lay a foundation of understanding and infrastructure now so that insiders can trust them if and when that pivotal time comes.

The first critical investment they've made is in extensive research and understanding – by talking to over 100 governance researchers and surveying employees at frontier AI developers, they've developed a deep understanding of the barriers potential whistleblowers face.

The majority of frontier lab insiders, it turns out, don't even know if their companies have internal whistleblowing policies, let alone understand what protections they offer.

The legal landscape, unfortunately, doesn't help much either – the EU will protect AI whistleblowers from 2026 onwards, but US law remains a patchwork, with proposed legislation like the AI Whistleblower Protection Act still pending.

Strikingly, roughly half of survey respondents expressed a lack of confidence in their own ability to determine whether specific observations constitute serious cause for concern.  And literally 100% lacked confidence that regulators would understand or act effectively on their concerns.  

Meanwhile, over 90% couldn't name a single whistleblower support organization.

All this – plus well-known cases of people like Leopold Aschenbrenner being fired for breaking the chain of command and going directly to the OpenAI board with security concerns – creates a highly uncertain and risky context for such high-stakes decisions, in which people with serious safety concerns are left to think through the nuanced costs & benefits of internal escalation vs going to regulators vs leaking to the press almost entirely on their own.  It's an extremely stressful position to be in, and not conducive to the best possible decision making.  

The good news is that the AI Whistleblower Initiative also offers several forms of support.  Their Third Opinion service, which you can find online at aiwi.org, allows insiders to anonymously reach out via a Tor-based, open-source tool – which, Karl notes, is pen tested, with security reports published openly for scrutiny and verification – to get help identifying and anonymously contacting independent experts who can answer their questions, without sharing confidential information or even revealing where they work. 

For those concerned about digital privacy, they provide a digital privacy guide and, in select cases, hardened devices with specific operating system setups for highly secure communication.  And if insiders deem their concerns justified, they can also connect them to specialized and experienced whistleblower support organizations like The Signals Network, Psst.org, or the Government Accountability Project, who provide pro bono legal counsel, psychological counselling, and guidance through the process - all without pressure to disclose any information.  And crucially, in some cases, they can even help arrange financing to cover legal costs, which can easily add up for cases that end up in some form of litigation.

That a non-profit stands ready to invest this seriously in any concerned insider that needs help may strike some as excessive today, but considering that we're talking about perhaps just hundreds or low thousands of individuals globally who are positioned to spot and raise critical concerns over the next few years, of which I'd bet only a few dozen will ever find themselves in position to seriously consider sounding alarms, I think this sort of care and support is absolutely worthwhile.  

Most recently, Karl and team have launched the Publish Your Policies campaign, online at PublishYourPolicies.org, calling on frontier AI companies to make their internal whistleblowing policies public – this is standard practice in many industries, but interestingly, in the AI space, only OpenAI has done any version of this, with their "Raising Concerns" policy, which they published only after Daniel Kokotajlo and others revealed OpenAI’s use of extensive non-disparagement agreements to keep former employees from publicly criticizing the company. Daniel, by the way, is joined by other former OpenAI insiders, luminaries like Stuart Russell & Lawrence Lessig – and yours truly – in signing on to the Publish Your Policies campaign.

Of course, publishing corporate policies doesn't obviate the need for proper legal protections, which Karl strongly advocates for as well, but at a minimum it would help insiders understand their rights and options, enable public scrutiny, and ultimately create accountability that benefits everyone. 

If you work at a frontier AI company, Karl encourages you to ask your management to consider publishing their whistleblower policies.  If enough people ask such questions now, I wouldn't be surprised if this becomes another dimension of the competition between frontier AI developers for top research talent, which could ultimately mean that companies even begin to collect and publish data on things like how many reports they receive, their response timelines, retaliation complaints, appeal rates, and whistleblower satisfaction scores – all of which would benefit everyone.

Regardless of what company leadership decides to do, Karl's message for insiders is this: support is available at every stage.  Whether you're considering internal escalation, thinking about approaching regulators, or even contemplating public disclosure, you can reach out completely anonymously, without sharing any confidential information, just to understand your options – and the AI Whistleblower Initiative can help you get expert perspective on what you're seeing, connect you with legal counsel & experienced guidance, and perhaps even help finance your case.  

So don't wait until you're deep into a crisis, and know that you don't have to face this alone.  

With that, I hope you enjoy – and I encourage you to share with friends who work at the frontier AI developers – this in-depth conversation about the challenges concerned AI insiders face and the support that's available to make sure that those building our AI future can safely speak up when it matters most, with Karl Koch of the AI Whistleblower Initiative.


Main Episode

Nathan Labenz: Karl Koch, Managing Director at the AI Whistleblower Initiative, welcome to The Cognitive Revolution.

Karl Koch: Thank you very much, Nathan. Thank you for having me.

Nathan Labenz: I'm excited for this conversation. We've been talking behind the scenes and collaborating a little bit as you've been building this organization and a couple key initiatives over recent months, so I'm excited to finally get into it in a public forum.

Karl Koch: I like that.

Nathan Labenz: Maybe for starters, AI whistleblowing is very new territory.

Nathan Labenz: How did this come to your attention? How did you decide to prioritize it? What's the backstory that has you focused full-time on this corner of the world?

Karl Koch: On a personal level, thank you very much again, Nathan. Lovely introduction. My name is Karl Koch, founder of the AI Whistleblower Initiative. We're a nonprofit project currently based out of Berlin, supporting concerned insiders and whistleblowers at the frontier of AI. I personally came to it because I've been involved in the AI safety scene, or whatever you want to call it these days, since 2016. I was a volunteer researcher at the Future of Humanity Institute, a lovely institute, of course, no longer around, sadly. Back then, I worked on differential technological development. Maybe some of the older listeners are still familiar with that term. I also worked on an AI safety research camp after this, but afterwards I decided to drop out of the scene for a bit. I was a management consultant in Hong Kong for a few years, then started my own SaaS business, because back then I had very different timelines. I was also quite interested in the arms race angle as a root evil for many problems we see today, like safety skipping and these sorts of things. When ChatGPT rolled around, the alarm bells went off. It was like, "Okay, maybe things are moving quite a bit faster than most people anticipated in the late 2010s."
Then I started talking to a bunch of people from the network, governance researchers, thinking about what seemed to be the most tractable, best solutions, the best things one could start building. Transparency came to the front pretty quickly as something we should generally have more of, regardless of what the future looks like. One angle, for example, was compute traceability, which I think other people have taken on over the past months and years, championing that cause. The other angle was whistleblowing. People had been writing about the importance of whistleblowing as a mechanism since 2017-2018, specifically with the AI angle. Originally, we came from the arms race perspective again. If you have a bunch of players in a multi-round game, they would ideally want to trust each other on their statements on speed and safety. If you don't have transparency and you can't actually believe that others stick to their promises, how can you cooperate over multi-round games? That was the original kickoff thought. This was mid-2023. The world then still seemed a bit rosier. The Superalignment team had just been kicked off. Everything seemed golden, with OpenAI's 20% compute commitment. So we thought, "Okay, maybe we're a bit too early here, but this is going to become relevant sooner or later," because the original papers around what the game theory of intensifying competition would look like were out in 2014, 2015, 2016.
Then the OpenAI board drama happened. That was the first moment we thought, "Okay, maybe we have to speed up on the research side a bit more," and this might become more of a concern already, that maybe we cannot trust that everything is going well, even in these organizations that claim to be very aligned, let's say. Then in 2024, things really started to go in a different direction. First, there was the Leopold Aschenbrenner case – around sharing information and escalating concerns to the board, as is understood by now at least – where people were penalized, among other things. Then, of course, the big story was mid-2024 around Daniel Kokotajlo and William Saunders and the other people, which then led to Right to Warn. So this topic became a lot hotter throughout mid-2024. 
We properly started our research phase in early 2024, talked to well over 100 governance researchers and insiders in frontier companies throughout that year, and then launched our first proposition called Third Opinion, maybe you listeners have heard of it, at the end of 2024, which we developed together with one of the OpenAI whistleblowers from last year, tackling one specific problem. I'm sure we'll get into it a bit more later. Basically, we've been live since late 2024 and now we're doing a variety of things to systematically break down barriers for insiders at the frontier of AI to speak up and make sure those concerns are addressed.

Nathan Labenz: Love it. Thank you. One thing that I've been impressed by watching you behind the scenes is how deliberate you have been in the approach. You mentioned talking to 100 insiders.

Karl Koch: That's 100 governance researchers and some insiders as well, so 120.

Nathan Labenz: 100 governance researchers and some insiders. It's important to get the details right. But that's a pretty quiet, slogging process, which I think is representative of the right way to position an organization like this. Many things in the AI space right now are being YOLOed, where they just say,

Karl Koch: Mm-hmm.

Nathan Labenz: ...

Karl Koch: Big question, big question. So, one angle to look at it is from insiders we talked to and obviously case studies that are already out there, or like statistics even, around journeys of insiders. What are the challenges they face? Maybe some of those challenges are very specific on the AI side, and we can talk through that journey as one angle. The other angle is how it differs among different companies, and what patterns are we seeing in terms of speak-up culture, and in terms of perhaps retaliation. So these are probably the two different angles. Then, of course, you can split it up again by our channels. I'm not sure how familiar the listeners are with this. Generally, when we talk about whistleblowing, we're always talking about some insiders raising a concern or some misconduct that they want rectified. They do this in a way where they potentially go over the heads of middle management or against the direct powers that be. That doesn't mean that whistleblowing always involves, for example, leaking information to the media. Often, I think people think, and this is maybe your first answer to the question, a lot of people when they hear whistleblowing, they think,

Nathan Labenz: Yeah, maybe we could take it through that ladder of, I guess, escalation, or that sort of process. It seems like, obviously, from insider to outsider to public...

Karl Koch: Yeah.

Nathan Labenz: ...there is more personal risk that the individual is running. There's also more chance that things take on a life of their own.

Karl Koch: Yeah.

Nathan Labenz: And the results of sharing information just become harder to predict.

Karl Koch: Yeah.

Nathan Labenz: So maybe, I would say, the right way for people to be thinking about it, I would guess, and maybe you could...

Karl Koch: Mm-hmm.

Nathan Labenz: ...disagree, but the way I ended up framing it myself was that this was a ladder to climb, as opposed to three options.

Karl Koch: Yeah.

Nathan Labenz: But you tell me.

Karl Koch: Yeah, I think you have to differentiate a little between what's actually the law and what you're allowed to do versus what people perceive. Maybe we can start on the perception side. What you say is definitely generally perceived to be the case, that most people think,

Nathan Labenz: Excuse me, I do want to invite you to go on. I guess-

Karl Koch: Yeah.

Nathan Labenz: Reflecting on a number of the things that you've said there, a big part of the reason I've been passionate about this and have tried to do my small part to help you behind the scenes is that I can really, having done the GPT-4 red team project myself, empathize with the insiders who are like, "Wow, I'm seeing things I did not expect to see. I'm not necessarily sure how big of a problem they are, but I don't just want to sit here and do nothing and let myself be the proverbial boiled frog." And I know that there aren't necessarily many of us right now who have this information and who are in that position. I always use the Leslie Nielsen joke from Airplane, "We're all counting on you," right? To be a little bit more concrete, in my case with the GPT-4 red team, this was late 2022, ChatGPT was not even out yet. I had been a customer of OpenAI – and if people are new to the feed and want to hear the full, long-story version of this, there are a couple of episodes that tell it in different ways. Having been a customer, I originally got access to GPT-4 as a customer preview, and there were a number of little warning signals or alarm bells that went off for me along the way. One was just, wow, this is a huge leap from what we had seen. I had been in other customer preview programs and was pretty plugged into what their latest stuff was, but seeing GPT-4, it was a massive leap. I asked, "Is there a safety review program for this?" They said, "Yes." I said, "Can I join it?" They said yes. It was just, okay, yes, you go over to this other Slack channel where the red team-

Karl Koch: Mm-hmm.

Nathan Labenz: Chats. And then, as I got into that, I was like, wow, there's really not much here. There were maybe about two dozen people, and I was one who had just raised a hand to join. Not a lot of chat, not a lot of guidance, no background information, no feedback on anything we were reporting.

Karl Koch: Right.

Nathan Labenz: No safety measures at all at that point. The model we got was purely helpful. And then there was a moment where they brought us the safety model – they weren't calling it GPT-4 yet at the time, but it was something like Davinci-002-latest-safety. The "safety" term was appended to the main model name. And we were told, "This model is expected to refuse anything in the content moderation categories. Tell us what you find." And we just found it was totally trivially easy to break. The safety mitigations were not working at all. And I was like, yikes, if you said it's expected to do that and then it's behaving how I'm seeing, how concerned should I be about your competence? You don't seem to have a command of what your models are. And so I felt all these things that you were describing, right? I was like, first of all, how much of a concern is this? GPT-4, we now know, with the benefit of two years of hindsight and the whole community coming at it from a million different ways, was a major advance, but not so powerful that it was going to do irreversible damage. That was the conclusion I ultimately came to just through pure individual testing as well, and that's ultimately what I did frame in my eventual report to the board. But how concerned should I be about the fact that there seems to be... because I didn't think this model was going to be super dangerous, but I did see the divergence where I was like, "I've seen 3.5, I now see 4."

Karl Koch: Mm-hmm.

Nathan Labenz: The step change there compared to the seeming lack of progress made on any sort of safety or control measures just felt like there was a widening gap. Was anybody even concerned about this? Is it just going to get wider? Where are we going? Couldn't get any answers to those questions, and I honestly didn't even, I did think a little bit about this later, but the people that I was directly talking to were basically not responding. Their marching orders were just like-

Karl Koch: Hmm.

Nathan Labenz: Don't share anything with the red team. Just take in their reports and that's it. Thank them and that's it. We didn't know anything about how the model was trained. We didn't know anything about really anything. This was also before the 10^26 reporting requirement, so there was literally nothing – there's still basically nothing, but there was even less back then – in terms of what thresholds would even trigger any sort of reporting. So I was just getting stonewalled at the level of the natural interaction, and then it was like, okay, where do I go from here? It didn't even occur to me initially to go to the board. It also didn't occur to me – and if it had, I don't think I would have done it – to go to any sort of regulator, because again, as you said, who would you go to and how would they have any sense for what's going on? And then I was like, well, maybe... do I go to the press? But again, who do I talk to? Who do I trust?

Karl Koch: Mm-hmm.

Nathan Labenz: Is a good story going to be written? Would that even be good? And there

Karl Koch: Yeah.

Nathan Labenz: Obviously, the landscape has changed now. I think it would be hard to do anything that would intensify the level of investment or interest in AI beyond where it currently is. But back then, there was this sense that if people know about this, then there will be even more energy and investment, and things will just accelerate and get even further out of hand.

Karl Koch: Mm-hmm.

Nathan Labenz: Which was something I felt, 'Well, I take that seriously.' But at the same time, it seems it is already getting out of hand. So,

Karl Koch: Mm-hmm.

Nathan Labenz: Am I supposed to keep it a secret that it's getting out of hand so it doesn't get more out of hand? Something didn't quite

Karl Koch: Mm-hmm.

Nathan Labenz: Feel right about that. And I think what I ended up doing, and I was fortunate to have a network I could go to,

Karl Koch: Mm-hmm.

Nathan Labenz: Also, having been around AI safety ideas for a long time and knowing a decent number of people who were thinking about it from different angles, I was able to go to people I knew. This was outside the chain of command, but at least to calibrate myself and say,

Karl Koch: Yeah.

Nathan Labenz: "Here's what I'm seeing. What do you think? Does this seem like a big deal? And if you do think that, what would you do?" Again, I had to create that all for myself and come to the realization that it was worth it. There was a whole other mess about the NDA that I wasn't actually asked to sign at the onboarding.

Karl Koch: Mm-hmm.

Nathan Labenz: Which was just a reflection of sloppiness,

Karl Koch: Yeah.

Nathan Labenz: In execution on OpenAI's part. There was some debate as to whether or not we actually had an effective NDA in place.

Karl Koch: Mm-hmm.

Nathan Labenz: Regardless, I knew they didn't want me talking about it, so it was

Karl Koch: Mm-hmm.

Nathan Labenz: That part was not ambiguous, even though the legal side of it was a little more muddled. But in talking to these people, they said, "Yeah, that does sound generally concerning." And it was one of my friends who ultimately said, "Why don't you go to the board?"

Karl Koch: Mm-hmm.

Nathan Labenz: A nonprofit, that's why they're there.

Karl Koch: Mm-hmm.

Nathan Labenz: So that's what I decided to do. I did tell the people I was working with directly at OpenAI that I was going to do that. They didn't really say much. Again, it was just a sort of, 'Okay, if that's what you're going to do, that's what you're going to do. We're not commenting on it.' When I did get to the board, they didn't seem to be in the know. One famous detail was that the board member I spoke to said, "I'm confident I could get access to GPT-4 if I wanted to."

Karl Koch: Yeah.

Nathan Labenz: And I was like, "Well, that doesn't seem good." And

Karl Koch: Oh, wild.

Nathan Labenz: Then they... Was it retaliation? That's an interesting question. They did kick me out of the program pretty much directly

Karl Koch: Mm-hmm.

Nathan Labenz: After that. And that definitely sucked. I really, first of all, found the work extremely interesting, doing this frontier exploration, so I wanted to continue to do it. I was also just generally concerned from a public interest standpoint that I probably did 20% of all the red teaming of GPT-4. METR was also named in the report and did the famous CAPTCHA thing where the model told a task worker that it needed help with the CAPTCHA

Karl Koch: Yeah.

Nathan Labenz: Because it was blind or whatever. So METR was also significant; they did more than me. I think other than METR, I did the most of anyone, and I thought, 'This is not... You're going to take me out of the equation when you clearly need 10 times more than what you have?' That didn't seem great.

Karl Koch: Also, apart from being directly maybe not good for the red teaming efforts, it's also just a strange sign of culture, right? If you have somebody who actually cares, that's probably somebody you want on the red team, and then raises concerns to make sure those are actually heard, then kick that sort of person out. Not necessarily, I think, evidence of what we would call a speak-up culture.

Nathan Labenz: Yeah, and the grounds were that I had talked to people outside of the OpenAI umbrella, which was true. I wasn't even really hiding that. I just said, "Look, I've got some friends in the AI safety community that I ran the situation by to calibrate myself." And I think it was ultimately to their benefit because if I'd been left to my own devices to truly decide alone, maybe I would have gone to the press or something, and I don't think that would have been ultimately the right decision. There was one other thing that really chilled me in that moment, which was I had started to collaborate with METR, so I was doing my own direct red teaming, but also they have ongoing projects. I was getting involved a bit. And when they dismissed me from the program, there was basically a threat made to METR's access. They were like, "Well, we can't really control... METR has access." I sort of said, "Are you going to try to prevent me from contributing to their ongoing effort?" And the answer was, "We can't really control that," because they have organization-level access, so they can kind of do what they want. "But we will take into consideration, as we look at renewing our engagement with them"

Karl Koch: Hmm.

Nathan Labenz: ... who they're working with and whether those people can be trusted.

Karl Koch: That's very thinly veiled, isn't it?

Nathan Labenz: It was a pretty overt threat.

Karl Koch: Yeah.

Nathan Labenz: ... to METR's access.

Karl Koch: Yeah.

Nathan Labenz: This is insanity. It's just them and me and a few other stragglers, who were smart people, by the way. And I don't want to-

Karl Koch: No.

Nathan Labenz: ... cast shade on the other red team participants because I think I just happened to be in a place where I didn't really have a job at the time and was able to put everything else down and do this full-time. Not many people have that flexibility, so I don't blame them for not having the flexibility that I had. But it was nevertheless the case that there were only a couple people seriously diving into this. I think if this third opinion thing had existed then and I had known about it, I would have come to it and been able to do a calibration on my concerns and make a plan. I may still have ended up in the same place, because I still might have ultimately escalated to the board, but I might have been able to do that in a way where there was never this sense that you talk to somebody out there that you weren't supposed to talk to. If I had been able to get the confidentiality guarantee that you're offering with a third opinion angle, I think I might... So I might still be in OpenAI's good graces today, possibly. Possibly I still would have been kicked out for having skipped a level in the chain of command or whatever, but at least the grounds they cited for my dismissal would have been avoided.

Karl Koch: Mm-hmm.

Nathan Labenz: And I think it was really... Another thing I would say is it was super consuming at that time.

Karl Koch: Mm-hmm.

Nathan Labenz: Obviously, testing GPT-4 itself was super consuming because it's like this thing contains multitudes and I can't possibly characterize it all, but I'm going to do my absolute best.

Karl Koch: Mm-hmm.

Nathan Labenz: So I was working extremely hard just on the object level work, but then also this meta question of-

Karl Koch: Mm-hmm.

Nathan Labenz: ... what should I be doing here was consuming, was a little crazy-making. You start to have these heroic narratives pop into your head. I don't know how-

Karl Koch: Mm-hmm.

Nathan Labenz: ... prone you are to that. I personally find that I have to fight the idea that I'm going to go to the public and then be the hero or whatever. Those ideas are not explicit necessarily in my mind, but I can become quite fond of them if I allow myself to envision, "Yeah, I'm going to do this," right? "We're going to be in The New Yorker or The New York Times," whatever, "I'm going to be on TV." Ultimately, I was proud of where I came down in terms of suppressing those visions for my heroic contribution, and I think I did handle it pretty well. I do look back and feel generally proud of my conduct.

Karl Koch: Mm-hmm.

Nathan Labenz: But I also think if things had just been a little different, if I had just a little bit of other responsibility on my plate that was stressing me out in some other way or whatever, I might have easily made a much worse decision. And again, I think having the counsel that you are offering with this third opinion network-

Karl Koch: Mm-hmm.

Nathan Labenz: ... of expertise would have been really great.

Karl Koch: Happy to hear, happy to hear. Yeah, there are many thoughts here. So, maybe starting with the psychological side, for example: I think what is also often underappreciated is how heavy the strain is, unfortunately, especially because cases can last for a long time. I'm not sure – how long did it last for you? The whole process from, "I'm worried about this," to, "I am approaching the board," to, "Okay, I'm no longer part of the red team now," and maybe also the worry afterwards about what the consequences would be?

Nathan Labenz: Yeah, the whole thing was about three months and it ended for me-

Karl Koch: Right.

Nathan Labenz: ... with the launch of ChatGPT actually.

Karl Koch: Mm-hmm. Mm-hmm.

Nathan Labenz: It was two months of actual intensive testing.

Karl Koch: Mm-hmm.

Nathan Labenz: ... ultimately talking to the board member and getting dismissed. And then I was like, "Okay, now I have a lot more time on my hands." I'm not testing the thing actively anymore, so I resolved to just take my time and think about it a little bit before deciding what to do next.

Karl Koch: Mm-hmm.

Nathan Labenz: And there was also no timeline to launch, which is another-

Karl Koch: Mm-hmm.

Nathan Labenz: ... thing where I was like, "Do we have..." I asked questions like, "Do we have a timeline to launch? Do we have a standard? Do we have some sort of control level that we need to achieve-"

Karl Koch: Mm-hmm.

Nathan Labenz: ... before we launch?" We, as if I was part of the team. I always like to take that "we" mindset where I can, but the answer was, "Can't tell you anything," basically, across the board. So when I was finally like, "Okay, I've got a lot more free time on my hands, I'll think about this," I basically never got to the end of thinking about it because a couple weeks later, ChatGPT was launched and it was a huge update where it was clear that they actually did have some better control measures and they were launching something weaker first to try to iron out a lot of these issues before bringing their best thing forward. So there was also strangely this reality that the impression they had made on me, mostly through refusing to answer any of my questions, was that they did have somewhat better answers than they were willing to provide.

Karl Koch: Mm-hmm.

Nathan Labenz: The impression they made on me was just way worse actually than the underlying reality. So that was also a very strange situation. But it was pretty consuming during that time and I still don't really know what I might have done if there had been no ChatGPT launch, because that was, I think, either the last day of November or the first day of December of '22. And we didn't get GPT-4 until March, so there were still several months. If they had had a different rollout plan or whatever, who knows what I might have done in the meantime. But in the end, I was like, "Okay, well, the world is waking up to this. There's a lot here for a lot of people to unpack and I think I've done my part for now."

Karl Koch: Mm-hmm.

Nathan Labenz: landing on it.

Karl Koch: Yeah. Yeah. And thank you again, by the way, for-

Karl Koch: For raising the concern.

Nathan Labenz: I'm sorry. A very minor contribution.

Karl Koch: Oh, absolutely, though. Still, still. Absolutely. As I said, I think even if it's just a few months, it's still a significant time, right? Especially if it's emotionally intense. And I think depending on how well systems are set up, for example, in this case it was internal, that can be an extremely stressful situation, especially if companies retaliate, or have a pattern of that. Then there can be multi-year processes. Let me see if I have the numbers off the top of my head – not right now, but a large share of cases, for example retaliation claims, can last for five, six, or seven years. I believe Tyler Shultz, the Theranos whistleblower, had a case that lasted many, many years, and I think he had to put up around $400,000 in advance on the legal cost side, which by the way we can also help with, but that's a side point. And that's the one side, where it can be basically all-consuming, not even speaking about other negative impacts like blacklisting in the industry and so on. But there are also plenty of examples from companies where, for example, internal whistleblowing works quite well. Retaliation, unfortunately, is still somewhat the norm in some form or another. But there are also plenty of examples where companies handle it well and where it's really part of the regular business processes. Organizations, for example, come back to internal whistleblowers and say, "Oh yes, thank you for your report. We'll now keep you in the loop." They provide them with regular updates so you don't just sit there and wait to see if something is going to happen. So there are definitely better ways to do this and worse ways, also from a psychological perspective. But I think here again, the point is that, apart from these differences, if you find yourself in a situation like this, help is available. And as I said, from many of the conversations, people are just not aware that there are organizations specifically focused on providing psychological support and guiding you through the journey, even at super early stages and for internal escalation rather than going public, for example.

Nathan Labenz: One, I guess, for one thing, I had it relatively easy in the sense that my income wasn't depending on this, right? I didn't have a bunch of equity. I didn't have a lot of upside that I was really putting at risk. So I think that did make my position easier than it would be for a lot of people that are employed and have just been promised a $1.5 million bonus over the next 18 months, or whatever the case may be.

Karl Koch: Mm-hmm.

Nathan Labenz: That, and I think-

Karl Koch: Oh, I believe. What was the latest Meta number? Was it 100? 120?

Nathan Labenz: Oh yeah, gosh. Well, that's a-

Karl Koch: It's tough.

Nathan Labenz: The dollar figures flying around are definitely going exponential like everything else. So I just mentioned that as a way to indicate that I think my-

Karl Koch: Yeah.

Nathan Labenz: situation is still on relatively easy mode.

Karl Koch: Mm-hmm.

Nathan Labenz: Also, just as a quick digression to give some credit, actually substantial credit. I don't want to say just some credit and I don't want to be begrudging about it because I do think it's actually pretty good. We're talking the day after GPT-5 was announced and I read the whole system card yesterday. And I will say major progress on a number of fronts in terms of the quality of the red team program. Just-

Karl Koch: Mm-hmm.

Nathan Labenz: much larger, much more intensive. One problem that we had at the time was very low rate limits, inability to do anything automated, it was all manual. That's been fixed. Lack of knowledge around what they had already seen, what they had already tested, what they had already observed, or what the inputs were to make any sort of inference from that. That has also been addressed now where, for example, METR in their report on this one is able to say, "You know, we observed this, but we also were told this by OpenAI." And so between-

Karl Koch: Mm-hmm.

Nathan Labenz: what they're telling us about how this was made and what we've observed, we can get to a higher level of confidence on some of our conclusions than we would be able to if we only had-

Karl Koch: Mm-hmm.

Nathan Labenz: our own direct observations. They even have now-

Karl Koch: I think, yeah, sorry?

Nathan Labenz: The last thing on my mind, I think to give credit, and again, I think this is substantial credit, access to chain of thought also has been extended

Karl Koch: Mm-hmm.

Nathan Labenz: to some of these safety review organizations. Apollo and METR at least both got visibility into what the models are thinking, which they didn't have for o1 and o3. So, I think there has been a lot of progress. My sense in late 2022 was that, as of GPT-4, I was going to see something a lot more like that. What I saw was basically the first warmup for something that now has at least meaningfully matured. Not to say that it's enough, but it certainly shows a lot of progress. I wanted to just let folks who aren't calibrated on where we are relative to where we've been know that they have come a long, long way and there's definitely

Karl Koch: Yes.

Nathan Labenz: some very good things happening. So what do you think

Karl Koch: From what I saw, I think they all got access earlier this time. I believe I read four weeks of pre-launch access this time, for METR at least. I'm not sure if I read about the rest or not.

Nathan Labenz: That longer access also

Karl Koch: Yeah, exactly.

Nathan Labenz: is definitely good. Although that has compressed. We had months, and it was a six-month window

Karl Koch: True.

Nathan Labenz: between end of training GPT-4 and launch. Now, those timescales are shortening, but they can do more with automated access and

Karl Koch: Mm-hmm.

Nathan Labenz: they've got language-model-as-judge. That was another thing that really... I don't know if it's good or bad, it's probably both. But it was striking to me in reading the system card how much they are using language-model-as-judge in their characterizations of the model. They're doing a lot of things where it's like, "Well, we used o3 or even in some cases we used o1 to evaluate all these outputs." We validated that o3 or o1 or whatever can do a similarly good job to an expert by working with an expert and refining the process to match their process. But at the end of the day it is still like, yikes, you know? We're starting to have the language models doing the alignment homework, as Eliezer used to put it. And

Karl Koch: Mm-hmm.

Nathan Labenz: I do feel like there's some... that allows the centrifuges to spin ever faster

Nathan Labenz: But one wonders at some point if it also may lead to them spinning off their axis, and who knows what that looks like, right?

Karl Koch: Yeah, well, there's a scalable oversight problem, you know what I mean? Yeah.

Nathan Labenz: The other thing I wanted to ask is, and I guess another frame for this whole project is, I am always really into what I call the unilateral provision of global public goods. And I think this is a really interesting project where, in a world where everything goes well, nobody ever calls you, right? It's a strange situation. And maybe in a world where everybody knows that you're out there, people get their act together and have good internal policies. And again, maybe nobody calls you. So that's a weird situation to be in, right? Where in the best case scenario, your KPIs are flatlining because everything's going well. But nevertheless, you may have some influence because of the existence of these pressure release valves or safety nets – it's not something the decision makers are unaware of, and hopefully they're not unresponsive to it. How many people do you think are... like, how many people are we talking about here between now and the singularity? Do you have a sense for how many people are going to be in this spot?

Karl Koch: Super difficult question, obviously. I think, I mean, the way of course we think about it is all the ones possible. We want to make sure that all the ones who are in that situation are aware that support is available and that there is hopefully a better way to do it than the default path they would've chosen otherwise. So indeed, it's not that we say, "Okay, we want to have 1,000 whistleblowers." We don't want to have zero either. We just want to make sure we're ready, right? So I think

Nathan Labenz: How many people do you think are... because another big trend of course is the organizations seem to be getting more secretive. Dario recently said in an interview that, "While we have a very open culture, we also have a need-to-know basis for key things." There was recently somebody who left OpenAI and wrote, I forget the guy's name, but he was a co-founder of Segment

Karl Koch: Mm-hmm.

Nathan Labenz: but then went to

Karl Koch: Yep.

Nathan Labenz: OpenAI for a while. And on leaving he was like, "You know, here's my experience." In some ways positive. People are really trying to do the right thing, people care about safety, all these qualitative statements I thought sounded pretty encouraging and

Karl Koch: Mm-hmm.

Nathan Labenz: no reason to doubt he's being honest. Flip side of that was he's like, "Extreme secrecy." "Many times I couldn't tell my next guy over what I was working on and they weren't telling me." So I guess, how many insiders do you think there are? What I'm getting at is I think it might not be that many.

Karl Koch: Yeah.

Nathan Labenz: And all this work, all this preparation might be for a small number of people, but where the stakes on each of those interactions could be quite high.

Karl Koch: Yeah, I think that's fair. We're probably talking about a few thousand individuals globally. That's roughly for the core, larger control-problem-type stuff, but it could also be other issues where you have a full picture. On the fringes, you may have a lot more. You brought up the eval companies before, right? At the moment, for example, I believe it is also not clear whether they're allowed to use the internal whistleblowing systems, for example, right? And of course, there is an unfortunate conflict of interest situation where yes, they're independent, but OpenAI can refuse access in the future, right? So they're in a tough situation. I think it's important to think not only about the people directly inside the organization, but also about who is on the fringes and could spot concerning behavior. It could be suppliers, employees of suppliers, maybe on the training side, for example. We've seen issues around—this is going in a slightly different direction now—significant trauma from data labeling, from content moderation. I think we've seen a bunch of areas that would also be in scope. Of course, this is probably not, when you bring up the singularity, the concerns you're necessarily thinking about. But of course, they're still relevant. For one, directly for the individuals who suffer, but also as an indication of a culture that doesn't necessarily care about weaker members of society. That's one way to frame it. Then you have a larger space of people who can observe behavior and don't need to have all of the context available. If you're talking about those probably very few, highly critical issues, then I think you're probably right. I don't think we expected, at least on the public side, thousands every year. Of course, we want companies to address concerns very well internally, which would then also look like there is no external whistleblowing, but in fact they're just moving in the right direction – if you trust that the companies themselves are set up and have the right incentive systems to rectify issues in the public interest. So yes, if you ask me for a number, depending on what the timelines are for the singularity, it could be in the dozens, maybe something there.

Nathan Labenz: Yeah. Not sure if that's what I would... That checks out. That's where I back out to as well. So, let's talk about the survey that you've run. Again, very quiet. I think it's maybe even still in process.

Karl Koch: Mm-hmm.

Nathan Labenz: But I guess there are enough initial results to... And you can even talk about the way that you've distributed the survey to try to make sure you're getting high-quality results. All these things are fraught in this context because people don't necessarily want to validate with their work email that they're taking a survey on whistleblowing-related issues. But yeah, take us through the survey, a little bit of how you set it up and how you make sure you're actually hearing from the people you mean to be hearing from, and then what have we learned about the state of whistleblowing awareness, support—

Karl Koch: Right.

Nathan Labenz: Policy, et cetera from the insiders who have responded.

Karl Koch: So, we ran this survey. Indeed, it's still ongoing actually. We basically have a few dedicated survey links per AI company that we spread through our network and to people more directly. It's fully anonymous. We didn't gather any names, any contact details associated with responses. So, it's fully anonymous in that sense. Then we also launched a more public call for responses. This is still ongoing. If you are listening and you're an AI insider working with one of the frontier companies, maybe we can put the link to the survey in the show notes? So, if you want to contribute, you still can. Major insights. I've shared something about clarifying concerns already. I think I gave one example. We see roughly half of respondents have low confidence in their ability to assess and judge risks, mirroring what you said, Nathan, right? One rephrased quote would be something like, 'It's really challenging to distinguish between appropriate and inappropriate concerns. I can see how there's a risk of escalating minor issues into major crises.' So this is a real concern, both for the individual and from a 'boy who cried wolf' perspective, where you don't want to overblow every situation into Armageddon. At the same time, of course, you don't want to be overly averse to that risk because down the line, it might be a really meaningful thing. Government outreach, we talked briefly about as well. I believe 100% are either not confident at all or not very confident that a report would be understood or acted upon by the government. So, a little quote here is something like, 'Without knowing the appropriate contact person or agency, I wouldn't attempt to reach out.' People very strongly supported the idea of having one dedicated reporting channel to come to. The idea here is—this is also from a quote—to normalize, institutionalize whistleblowing, make it a routine, anticipated practice, so that it really just becomes part of regular work. This is not something to be looked down upon or to be avoided. There are a bunch of benefits for companies in this too, but maybe we can get to that later. Support infrastructure is unknown. If we think down the journey path: we talked about lacking legal protections, which is a big issue and where we need stronger ones, and about clarifying concerns. Then there's also the question, who can help me? We briefly touched on it in the psychological space, and on the legal advice side, because you may have to get creative with finding a legal basis for getting whistleblower protections. Slightly easier in California maybe than in other US states, but it still can be tricky. We saw 90% of insiders didn't know any whistleblower support organization. I think it's actually over 90%. This is the same in conversations with insiders. Interestingly enough, even with previous AI whistleblowers. This was a person at Google from a few years ago who raised concerns around research misconduct and was terminated. Eventually they settled, which is not officially a sign of unlawful termination, but Google tried to throw the case out and it didn't succeed. This person essentially got lucky—these are their words—that they had a good friend who connected them to a lawyer who then luckily had some background in whistleblowing law. But people don't think, 'I may move into a territory now where I'm becoming a whistleblower, I should get whistleblower advice.' 
It just feels like escalating, right, especially if you have a good relationship with the organization, which most of these individuals do. So that is something I think we definitely want to see change over the coming years, that people understand there are these organizations they can reach out to. Because it also then avoids the unsafe other approaches, for example, going internal or going to a regulator by yourself without doing the right things. For example, a classic case is with the SEC: yes, you get protections, but say you go public first. If you go public first, you won't, for example, get the bounties. The SEC actually has a great whistleblowing program, where they award whistleblowers a percentage of the penalties on companies based on the whistleblower's information. But if that information becomes public first and then they file with the SEC, the SEC basically claims—I think historically it has claimed—'Yeah, this wasn't new information anymore.' You may still get protection, but you don't have a right to the bounty, if I recall correctly. Or for example, I think we had this in the surveys where people basically said, 'I don't really know. The only thing I can do if there is something important is go straight to a journalist.' Which may still be true – although again, legal disclaimer, we're not counseling anybody to take any unlawful actions or violate their contracts – and for a person who is really dedicated to resolving an issue, that may still be a path they want to take, and it may be effective, but it shouldn't be the default one, where people think, 'Okay, this is the only one I have available.' There are people willing to support and willing to help. The last part is around awareness of internal channels. This is coming directly from a lot of conversations we've had recently, given also the campaign we've launched, and also from the survey: there's extremely low awareness of internal whistleblowing channels and ways to raise concerns. More than half – 55% – of respondents at companies that claim those policies exist did not even know that they exist, or did not know where to find them if they did. Many were not trained at all. Actually, the vast majority were not trained at all. There seemed to be some differences across companies here. For example, people – I'm not going to say which organization this is – still seem to be confused around whether escalation to the board is permissible or not. They don't understand what those policies actually say. The only company that has published its whistleblowing policy to date is OpenAI, and it actually reads pretty well, but there are still a bunch of issues in there that are not obvious to an insider, because you're not spending all your day reading whistleblowing policies. You never chose to read this document in the first place probably. You would have preferred not to ever have to read it. So, it may read well and may give the impression, 'Okay, I can... That's great. I'm just going to do this now,' and you may not understand that you're actually exposing yourself to significant risk. There's actually plenty of evidence of these companies still retaliating against insiders. The person at Google I just mentioned went internal and got lucky that they used a few keywords in some of the escalation emails, which then served them well down the road. 
I believe 'fraud' was the word, which then gives reasonable cause to believe there was a crime, which then unlocks the whistleblower protections. So, you either have no awareness these policies exist, or maybe you don't understand them if you see them, or you think you understand them, which may lead you down the wrong path, or maybe you do understand them, but you just don't trust the organization at all. We also have cases where people, I think, said they 'anticipate using official reporting channels would result in subtle indirect consequences rather than overt retaliation like termination.' On the one hand, that may be the case. I think we probably do still see a lot of overt retaliation, but yes, this is probably also likely. But again, I think this just speaks to people not really trusting the systems internally, which is likely the case also because companies don't really create a lot of reasons to trust the systems: they don't publish anything about their systems for public scrutiny. They don't provide evidence to the public. We also see no instance, at least as far as I'm aware, and we've looked quite a bit recently, of these frontier companies creating transparency even internally, which is very common practice elsewhere – basically saying, 'Hey, as you know, we have this internal way of reporting concerns to, for example, the board. Last quarter, we had X cases. Y of those are still open. Satisfaction among internal whistleblowers was NPS whatever percentage. We had that many appeal cases. X percent of cases were deemed outside of the scope of the policy. What happened to those cases? What retaliation did we observe?' We're not seeing any of that, either in public or, I think, even internally in the companies. So, companies could do a lot more here: for one, to improve the systems, because we have a bunch of evidence that they don't work and that companies are actively retaliating against insiders, and also to then create trust that the systems actually work, both for the insiders and for the public.

Nathan Labenz: Can you say a little more about the evidence of retaliation? We've heard the Leopold story is the most famous one that comes to mind for me. You alluded to at least one at Google.

Karl Koch: Yeah.

Nathan Labenz: Is this something you're gleaning from comments on the survey, plus conversations, of course? Or

Karl Koch: Mm-hmm.

Nathan Labenz: How much more substantive can we make that for people so they have a sense of what that's like today?

Karl Koch: Yeah. Unfortunately, there's very little data, as you can probably imagine, on the actual retaliation experience. This is mostly coming from... Maybe I should have said this: we are in close collaboration with, or hosted by, the Whistleblower-Netzwerk, which is Germany's longest-standing whistleblowing nonprofit. We also gain a lot of expertise from them. So a lot of this comes from us talking to whistleblower support organizations who help people who experience retaliation – and even there, there aren't incredibly great data sources on it. If you look at public cases, you will, of course, see a lot. Cases of that nature usually also go public and become a really big story. But I think you can probably pick almost any tech company and you will find pretty intense cases. Apple, for example, fought Ashley Gjøvik, who raised internal concerns around workplace safety, for quite a few years and tried to wiggle its way out in multiple ways. I think this lasted over five years. Eventually, it led to them actually publishing their whistleblowing policy because the pushback was so significant. But I think you can probably also see it in the cultural attempts to suppress any raising of concerns – Daniel's story certainly goes in that direction. There's the famous Timnit Gebru case from around 2020, I believe, around research practices, for which she was fired because she basically published a paper criticizing, I think it was primarily, discrimination and bias in AI models, but there were quite a few other items as well. And a colleague of hers, Margaret Mitchell, also got fired from Google, I believe, shortly afterwards for again raising similar concerns. There are quite a few cases that we see. And of course, the number of unreported cases is likely higher because we're not really getting a lot of transparency from these companies to understand the extent to which these systems are working or not.

Nathan Labenz: Yeah. Is there anything you could say more about the taxonomy of concerns? I mean, you kind of alluded to this a bit with, like, I guess we have things as potentially broad-based as working conditions on the data creation side, and then there is the securities fraud type stuff of

Karl Koch: Mm-hmm.

Nathan Labenz: hyping stock with claims that may or may not be fully true. Then there are commitments that have been made, voluntary or otherwise, excuse me, mostly voluntary so far

Karl Koch: Mm-hmm.

Nathan Labenz: that companies may or may not be fully following through on. Obviously, we know some instances where they're not. There's the late policy change thing which we recently saw Anthropic do, which isn't necessarily bad, but it was certainly weird to see an RSP change a couple of days before

Karl Koch: Mm-hmm.

Nathan Labenz: the launch of the next big model. That's one way to follow the RSP. Again, not necessarily anything wrong with it, but it's interesting. And then there's stuff like, I would assume that maybe the most sensitive is, like, 'we are seeing X model behavior'.

Karl Koch: Mm-hmm.

Nathan Labenz: How would you add more to the rough scaffold of different concern types there?

Karl Koch: So, roughly, it probably falls back into the three major categories of risks from AI we'd be concerned about. I know there's a lot of discussion around this, right? Which ones should we prioritize? Which ones should one focus on? In the whistleblowing space, the 'good' thing, and I'm using quotation marks here, is that it's a catch-all. That, I think, is what makes it so powerful for regulators: it can catch risks that we cannot foresee, because, unfortunately, that's the name of the game; we're not really sure which risks are going to be substantial and which are not. So of course we can talk about where we see major risk areas, what examples we have, and what we're particularly concerned about. But in general, the big strength of whistleblowing as a tool is that it finds the actual items of concern when and where they arise, uncovered by individuals who don't have conflicts of interest in the sense of heavy competitive-pressure incentives, the kind that might make a chief executive not want to reveal bad conduct. So, roughly, the three categories would be misuse risks, control risks, and systemic risks; those three classic categories. I think Sam Altman used the same or similar ones recently too. In all of those, of course, we'd be interested to see revelations if we're moving in very bad directions. On the misuse side: are we seeing misuse being ignored? Does internal monitoring show, 'okay, we're seeing really bad behavior here'? You don't even necessarily have to go in the bioweapons direction, though of course that's a prominent case. Are people using our models for really nefarious purposes, and maybe the companies aren't doing anything about it because they don't really care or don't have the capacity to? It could also be not investing in transparency along those lines. I know at least one of the frontier companies, after a pretty major release, basically had no monitoring live for around three to four weeks. It was just broken, and they only recovered it over time. That does not seem good. At the moment, that's roughly fine, but down the line, probably significantly less fine. That's what we'd watch on the frontier side: if we notice companies are really not investing in transparency at all, that would generally be very interesting. There would have to be some sort of underlying violation of the law to unlock legal protections; in the EU, that may already be covered under the AI Act. If a company is essentially flying blind, they're probably not fulfilling their reporting requirements to the EU on being able to manage systemic risks, and they probably don't have the view of the risks that they should have according to the EU AI Act. So that could already be something that gets you protection if you're covered by the EU. In the US, it's probably a bit more complicated at the moment because we don't have anything yet; the AI Whistleblower Protection Act would do quite a bit here. Then you could think about control issues, whatever those look like. Internal deployment, which I alluded to briefly before, is a very interesting one and specifically fit for... and you actually mentioned 'scalable' on the side as well, right? It's probably specifically fit for whistleblowing because it's internal by default. There is no way for externals to look at it; this is likely well before a METR or an Apollo Research gets to look at it.
Although I think METR also said that in their current evals they already tried to extrapolate what this means for internal deployment. But they basically can't look at it directly, right? So this seems to be an extremely high-potential area where we may be able to see something in the future. On the systemic risk side, of course, it would be interesting to see how much this goes in the monitoring-of-misuse direction. What are we seeing around, say, political propaganda? Are we seeing people getting more and more reliant on these models? These are classic AI risk topics, of course, right?

Nathan Labenz: Mm-hmm.

Karl Koch: Are certain people using these models, potentially sycophantic models, for psychological help or to form their political beliefs? Are there issues around that? And then, beyond the concrete risks, you can take it one notch up and talk about the organizations behind them: what do the leadership and the culture look like? To what extent are they trustworthy, and can they be trusted to steer us in the right direction? This could even be surface-level things that may already be reasonably well covered today, like widespread fraud. Are we just seeing dishonesty in the culture in general? Widespread discrimination issues could fall into this, or copyright violations, things like that. There, for one, yes, there's a direct crime, although I guess the precedent is still being established for what is in fact a violation of the law and what is not. But it could also just point to a reckless culture, which maybe we do not want, although I'm not sure how much we would still update on the organizations most famously known for copyright violations being reckless; I think we're pretty much there already. Research fraud goes in that direction, as does punishing people for raising risks or speaking up. I mean, this is basically the Daniel Kokotajlo case, right? That does not seem like a culture that can deal well with concerns and with people raising them. It could also be matters like rushing releases, quite apart from the actual negative impact. And I think this goes to your earlier point around red teaming: there's the one aspect, which is the object-level harm of what this model is going to do in the world or how it's going to be used, and then there's the question of whether this organization shouldn't be managing these models in a better way. That's the governance side. And then other areas would be political influence-seeking: to what extent is lobbying occurring that points in directions where the interests of these companies are not aligned with the interests, let's say, of the general public? So those are maybe the broad categories. There's something around arms race-

Nathan Labenz: Yeah.

Karl Koch: ... acceleration, like, capabilities, but that's probably going a bit far now.

Nathan Labenz: Yeah, I mean, that's a thorough taxonomy. I appreciate it.

Karl Koch: Mm-hmm.

Nathan Labenz: I guess one, um... If I were to red team the whole concept for a second, there are two concerns I'd be interested in your take on. One is, like, you know, my dad's old saw: the real scandal is what's legal. But that's not exactly the right way to think about this. What I'm thinking of is, we have xAI, you know, who has Grok-3 calling itself Hitler-

Karl Koch: Mm-hmm.

Nathan Labenz: ... um, and going, like, you know, pretty hard, right, in a, in a pretty bad direction. It then, you know, brings Grok-4 online with a live stream in which the whole Hitler behavior from the model is not mentioned at all.

Karl Koch: Mm-hmm.

Nathan Labenz: And then, you know, we see reports of Grok-4, like, searching for what Elon Musk thinks about things in order to determine, you know, what its maximally truth-seeking answer is supposed to be. And I guess one question is, like... If you have a, you know, whistleblower story, you're gonna try to make an impact with it somehow, some way. How do you think about the fact that, like, some of this stuff is just happening in plain view-

Karl Koch: Mm-hmm.

Nathan Labenz: ... and nobody seems to care? You know? It's like, wait a second, can anybody possibly have a scandal more scandalous than, you know, Grok-3 to Grok-4, MechaHitler, sweeping the whole thing under the rug? That's all literally in the public domain at this point, right? So how do you think about the... Maybe secrets just have a, you know, different quality to them that, um... I mean, there is something about that, right? That's often been remarked on with Trump, where it's like-

Karl Koch: Mm-hmm.

Nathan Labenz: ... because he puts it all out there, people sort of shrug their shoulders at it. Whereas, you know, in the past, with things that were kind of covered up, there was a sense of shame or wrongdoing about them, and maybe that makes a big difference. But how do you think about this contrast between such seemingly flagrant things happening in the open and what people might bring forward that was previously a secret?

Karl Koch: Mm-hmm. Let me think about this for a second. Tough question indeed. Probably three parts to it. One is: yes, it can be true that there were stories in the past that looked quite bad, and then the next news cycle came, it all washed away, and we didn't really do anything about it. But it's also true that disclosures and the issues they uncover come at very different scales. When we were talking earlier about maybe the dozens or so, that's probably more in that direction than MechaHitler. I assume people had somewhat made up their minds about xAI and the direction X had taken, that Musk had taken. And if something like MechaHitler had happened as part of ChatGPT, that would probably have been a bigger story, because it would have been a more drastic change. But that's nitpicking on the specific example; I think the overall point generally stands, though the scale of stories can still differ dramatically, and we've definitely seen whistleblower disclosures in the past have major impacts, right? The next part is: would you rather not know? Having transparency about what is going on is still significantly better than not having it. Of course, we could do another two-hour podcast on whether we should still trust democratic sense-making processes, on the extent to which attention on an issue actually translates into intervention, and on whether it serves as a good deterrent for a company to know that things are going to come out. I would still say yes, to quite an extent. For the X example, I think the numbers, at least post-acquisition, still don't look great as far as I'm aware, but that's somewhat beside the point. Overall, basic transparency is still better than not having it. We need to have faith in something, and at least to some extent believe that if real misconduct becomes public, that's going to have a deterring effect and there's going to be some rectification. And if not, then it's up to democratic processes to make sure that hopefully happens in the future. And then the last part is: we cannot, of course, rely solely on whistleblowers to fix all of these problems, the same way we cannot rely purely on individual courage. That's why we have to make it easier and safer for people to speak up and create that transparency. But likewise, we need other guardrails, whatever those look like. You can take a more European approach on the regulation side, or what seems to be the more American approach now, which is going a lot more in the competitiveness direction and being less involved. I'm not going to comment today on what I think the right approach is, but it definitely cannot all rest on the shoulders of whistleblowers. Still, I think whistleblowing plays an important role. I'm not sure if that answers your question in a satisfying manner.

Nathan Labenz: Yeah.

Karl Koch: I would also be interested to hear your thoughts on it. What do you think?

Nathan Labenz: I think it is very hard to understand why certain things gain traction and others do not. I believe one factor that made the Daniel Kokotajlo episode extremely compelling was his willingness to forgo the stock.

Karl Koch: Mm-hmm.

Nathan Labenz: That he had been willing to put so much skin in the game personally.

Karl Koch: Mm-hmm.

Nathan Labenz: It was even more compelling that he did that quietly, and it only came to light gradually. A random comment on a blog post here, people asking him questions there, and then it was like,

Karl Koch: Mm-hmm.

Nathan Labenz: ...whereas the Leopold one did not as much. Obviously, his, um...

Karl Koch: situation

Nathan Labenz: broke through.

Karl Koch: Yeah.

Nathan Labenz: But did his story of being retaliated against for going to the board break through as much? Not really, I would say. Maybe that's more because he was already promoting something else in a way, and it was a footnote in a larger story. Or it was easier for people to file that under...

Karl Koch: Mm-hmm.

Nathan Labenz: So it felt a little different from how Daniel was literally just like,

Karl Koch: Mm-hmm.

Nathan Labenz: One other red team question on the concept is secrecy. My sense is that maybe this is already super baked in, but it is at least worth thinking about for a second, and I know you have, around how we do not exacerbate the problem of intense internal secrecy at companies. That secrecy seems to be largely commercially driven, and I do not think it will be moved much on the margin by the existence of a whistleblower support organization. But do you have any thoughts on how to at least not make worse, and possibly push back a little bit on, the intense internal secrecy that keeps the number of possible whistleblowers so low in the first place? Maybe Claude will be the whistleblower. That is one answer.

Karl Koch: Who says that? Wasn't this also...

Nathan Labenz: The AI whistleblower. Yes. What if the AI whistleblower is, in fact, the AI?

Karl Koch: Yes, exactly. What was this again? Was it also a METR paper? I cannot quite recall the one where Claude was reaching out directly to the SEC and the FDA. Just a while ago.

Nathan Labenz: I think that was just Anthropic's internal work.

Karl Koch: Mm-hmm.

Nathan Labenz: But yes, it was in the Claude 4 system card, where it was, yes, deciding like...

Karl Koch: Part of.

Nathan Labenz: It was also blackmailing engineers at the time. So, its behavior is mixed.

Karl Koch: Yeah.

Nathan Labenz: But...

Karl Koch: True. I think Ryan Greenblatt wrote about this at some point as well. It is very tough; once a culture shifts in that direction, it can be very tricky. One angle, of course, is the regulatory one: if asking nicely doesn't work, you force it. You require transparency by law. The EU AI Act, and the Code of Practice implementing the AI Act for general-purpose AI model providers, has a large transparency section, for example. I think in the States as well, such requirements are overall considered a good baseline to create more transparency. Can you rely on self-disclosure? Difficult. Then, regulation on transparency aside, there is the whistleblower protection legislation, which is why it is so incredibly important to make sure it is clear that people can come to a regulator directly and speak up. That is how you counter the secrecy, if it is well set up. That is why the AI Whistleblower Protection Act, for example, is so valuable: it is quite broad, basically covering any sort of concern as long as it is substantial and specific, which is a whole other topic around public harm or public health concerns. So that is one angle for creating that transparency. And another angle, which we have seen a lot, is that if legislation pushes ahead on whistleblower protections, then internal cultures become better. Internal speak-up cultures improve because companies then know, for example, and I am not sure who exactly would do it on the US side, that the main recipient body, if one is set up, would inform all the employees of frontier AI companies...

Nathan Labenz: Cool. That's great. Two last things on my agenda. One, let's talk about the announcement and push on this Publish Your Policies campaign, which is the occasion for us to talk. We're getting to it late, but we shouldn't neglect it. Then I want to give you one more chance in closing to describe again, for people who might want to avail themselves of your support at some point, what that process will be like.

Karl Koch: Sure.

Nathan Labenz: What can they expect in the most concrete experiential terms? But let's do the campaign first, and then we'll do that.

Karl Koch: Nice. On the campaign, I've alluded to it at several points already, and coming from an insider's perspective, we've discussed it a few times today: the struggle is massive around understanding how to raise concerns internally in a safe and protected manner, and around trusting that those concerns will be handled well. A big reason for this is that companies do not publish their whistleblowing policies. A whistleblowing policy, which I should have explained before, is basically a document, or it can also be an interactive tool or even a video, that is provided to employees or covered persons. Many companies also cover, for example, independent contractors, or independent parties in general, like email providers, who should be covered; it's not clear if they are at the moment, at least for OpenAI, who are the only ones who publish their policy. So this document explains the whole system to covered persons and the public: this is our whistleblowing system, this is how it works, this is why you can trust it. These are the recipients who are going to look at concerns, and this is how they investigate. These are the protections against retaliation we provide. This is why the whole process is independent; again, you can trust it. These are the areas of misconduct you can raise concerns about, and these are the ones you cannot; for specific individual HR matters, for example, they would say, 'No, this is not the right channel. Go here.' Or there may be explanations like, 'For these types of issues, this is what we do; for those types of issues, this is what we do.' So it basically lays out the whole system. That's the whistleblowing policy part. Then there is the reporting evidence part. In our campaign, we structured this into Level One and Level Two, where Level Two basically asks: what evidence are companies providing that these systems work or don't work? That evidence matters, because no organization gets it right the first time; as with any business process, it's something you work on repeatedly. This would include things like how many reports were received, and how many of them were anonymous. Having anonymous channels is super important, and what you might want to see over time is fewer anonymous outreaches, because people gain trust in the system. If instead anonymous outreach keeps growing, it probably points in the other direction. Then you want to look at things like retaliation: how many retaliation complaints are there? What are the appeals processes, and how many appeals are filed because people are not satisfied with the outcome of their case? Response timelines, these sorts of things, plus the satisfaction of whistleblowers; that's the other part. We're not seeing basically any of this from these AI companies, apart from OpenAI, who published their policy after the drama from last year. You might remember the Apple story from before; it's a pattern that something gets published only after a scandal. None of these companies publish any of this. That is not good, because as we talked about before, well over 70% of whistleblowers who went to the SEC started internally. These systems have to work well, and especially given the precedent, we cannot just trust that they will.
That's why we're calling for companies to, at an absolute minimum, publish their whistleblowing policies, and ideally also catch up to global standards and publish evidence around how well their systems work, how they are performing, and what measures they take to improve them. It's important to note that this is actually a litmus test, because there's essentially no cost to companies for publishing these. We're not asking for additional reporting to be created; we're just asking for transparency on what already exists. Any company that takes this seriously as a business process would, for example, have a whistleblowing policy and would already measure all of those things, because if they care, they are measuring them. If they're not measuring these things, then we have something of an answer: either they don't really care about it at all and haven't invested the time to think about what they should be caring about, or they have thought about it and are actively not doing it. Neither of those seems like a great option. It is also best practice globally; many companies already do this. There are obvious benefits for insiders, because the public can then look at these policies and point out where they fall short, or what looks good and what doesn't, which they can't do today; again, there may be false confidence there. It's good for the public, because then we know what is in place, and it's good for companies: we talked about the benefits of a speak-up culture, where we should see feedback and improvement of these policies and prevent all the impacts I mentioned before. In fact, Trillium Asset Management, an asset manager, called on Google in 2022 to improve their whistleblowing systems for exactly those reasons. They basically argued that strong whistleblowing systems serve shareholders, and that shareholders have an interest in strong systems because they want to make sure there is no misconduct. It basically only fails to serve direct managers, and potentially executives, depending on how you look at it. So it's a very reasonable ask that we're putting forward, which would hopefully still be quite impactful, although of course just creating transparency is not everything. You can have a great-looking policy that still doesn't perform, or you can create transparency around your evidence and the evidence looks bad or isn't trustworthy. So this is a minimal thing, and it's the first step we think these AI companies should be taking. We're also happy to work with them on these topics, by the way. We've got an incredible coalition that we put together for this, well over 30 organizations. We're very proud of it, actually, because it's the first coalition of its kind with the best whistleblower support organizations in the world: The Signals Network, the Government Accountability Project I mentioned already, Whistleblower Aid, WHISPER in the UK, Whistleblowers UK, basically the who's who of the whistleblowing world. Academics as well: people who co-wrote the ISO standard on internal whistleblowing systems are on board. Transparency International, who wrote the best practice guide on internal whistleblowing systems and on creating transparency around evidence, is also on board. On the AI side, Stuart Russell is joining the call. Larry Lessig was also a signatory of the Right to Warn. Daniel Kokotajlo is on board. And of course you are on board, Nathan. Thank you very much for joining the call.
And the Future of Life Institute, Karma, many, many more. I'm not going to list them all off the top of my head; take a look at the website: publishyourpolicies.org. If you're an insider at an AI company and you're thinking this sounds like a sensible thing and you would like to have this transparency, reach out internally. Ask your management. Maybe you have an anonymous town hall, or maybe you trust your direct managers enough to raise it and say, 'Hey, why are we not doing this? This is standard practice. This could be helpful.' Why not? It's going to benefit your manager, and probably the manager's manager as well. By the way, this is another thing from our survey that I didn't mention: 100% of insiders we surveyed support the publication of policies. So there seems to be pretty broad support for this. If you're an outsider not working at an AI company, spread the word. Make sure the call is heard. That's about it on that campaign.

Nathan Labenz: Great.

Karl Koch: Yeah.

Nathan Labenz: I guess it's too early to have any responses from official channels from companies, right?

Karl Koch: That's right. We just launched last week, so it's been a bit over a week. We know they are aware of this call, and they have been aware of it for a while, because, for example, the Future of Life Institute's AI Safety Index also called for the publication of policies, or rather, recommended it instead of actively calling for it. The questionnaire underlying that study came out, I think, six weeks ago. It was shared with the companies and included the question...

Nathan Labenz: Yeah.

Karl Koch: Hm.

Nathan Labenz: Cool.

Karl Koch: And I think you also wanted to talk a little bit about...

Nathan Labenz: Yeah, I was just going to invite you to do that again. So...

Karl Koch: Absolutely.

Nathan Labenz: Yeah.

Karl Koch: Thank you very much.

Nathan Labenz: The floor is yours. What should people expect? What can they count on?

Karl Koch: Yeah.

Karl Koch: So basically, on the direct support side at least, we see ourselves as a connecting point between the AI and whistleblowing ecosystems, and as a first point of contact. That's why the Third Opinion offering I mentioned before allows you to reach out with a question about your concern, fully anonymously and without sharing any confidential information, via an open-source anonymous tool you can access through the Tor browser. We workshop the question together and identify the relevant independent experts together, so you don't have to rely on us knowing the experts in your field better than you do as an insider. We then approach them with your question and bring their answers back to you. Hopefully at that point your concerns are alleviated. If they are not, we will help connect you to pro bono legal counsel who are extremely experienced in helping whistleblowers along their journey, with no pressure for any disclosure. Regardless of where you are in your journey, support is available. You can also reach out to us directly, without going through the Third Opinion process, and ask for help on who may be the best fit for you. We will help you there as well, supplementing the expert network's independent expertise, covered under legal privilege, with those great organizations as required. On our website, apart from the organizations listed, we also have an explanation of the process and a digital privacy guide if you are concerned about digital privacy, which we actually see quite a lot, and it makes a lot of sense to make sure you stay safe. There are a bunch of other resources there as well. We have previously supplied hardened devices with specific, highly secure operating system setups to at-risk individuals; that is also something we offer on the direct support side. And then, on a wider scale, what we as AWI do is, as we mentioned, systematically break down barriers for AI insiders. There is the advocacy side, like this campaign. There is the research side, like the survey we mentioned; that is still ongoing, and you can find the link to the survey in the show notes if you work at an AI company. There is an upcoming legal study that we are currently fundraising for, to really dive deep into the status quo of whistleblower protections across a pretty wide range of AI risk scenarios and identify the most interesting ones; that is upcoming, pending funding. And then there is advocacy and policy work, so providing feedback on policy, both in the US, where other great organizations are working on this (if you are interested, reach out; we can connect you), and on the EU side with the AI Office, making sure they establish a whistleblower mailbox. In fact, the authors and the vice chairs of the Code of Practice recently called for exactly that as well, which is amazing. So that is what we focus on at the moment.

Nathan Labenz: Cool. Well, thank you. This has been great. I think...

Karl Koch: Thank you.

Nathan Labenz: Between the coalition of organizations that you've been able to put together, the evident seriousness with which you're taking every aspect of this, and the thoughtfulness of the support structures that you've designed, up to and including the provision of hardened devices, all of that is, in my mind, not too much for people who are concerned with just how crazy things might get to invest in now. It's in anticipation of the possibility that there might be just a couple dozen individuals who happen to be placed at the right intersection of information and access to what's going on, and who have the awareness and conscience to want to do something about it, or to at least seriously question it before moving ahead. I think those people are going to be scarce and precious resources for society, and also under a lot of stress and pressure individually as they face those things. So I think it is excellent that you and the coalition of the willing are setting things up now to support them, and I've been glad to be a small part of it. Hopefully this helps raise awareness further and establishes you as a resource that people will hopefully never need. But it seems likely that there will be cases where people need to reach out and get this kind of support. I, for one, would have appreciated having it two and a half years ago already. And as the stakes only continue to rise, I'm very glad that people in the future will have the option to avail themselves of this kind of very thoughtfully designed and soberly provided support. So that's great. Keep up the good work. Again, we're all counting on you. But for now, Karl Koch, Managing Director of the AI Whistleblower Initiative, thank you for being part of the Cognitive Revolution.

Karl Koch: Thank you very much for having me.