The Hacker Sent by Anthropic to Calm the Government’s Nerves About AI Safety

On: June 17, 2026 11:15 AM

Trump administration officials have spent recent days fretting over the power of Anthropic’s next-generation AI software to potentially wreak havoc on global cybersecurity. For a group of 700 cybersecurity researchers, that startling realization came in March.

That’s when Anthropic researcher Nicholas Carlini showed how easy it had become to use new models to break into systems. The lanky 35-year-old is a well-respected hacker who’s considered the industry’s “professional skeptic” of AI cybersecurity claims. But lately he had changed his mind.

Early that month, just weeks after getting his hands on Mythos, Carlini offered a stark warning to a standing-room-only crowd of cybersecurity experts at the ornate beaux-arts building that had once housed San Francisco’s Hibernia Bank.

First he showed them how he had used Anthropic’s AI to find and exploit a critical bug in a piece of web-publishing software called Ghost. Then he demonstrated another in the Linux operating system—one of the most battle-tested pieces of software, which powers billions of devices.

Carlini had never before found a bug in Linux, or in Ghost. Now he had discovered many. What he was seeing represented a new world order for cybersecurity. The balance that existed between attackers and defenders over the past two decades “seems like it’s probably coming to an end,” he said. “It’s pretty clear to me that these current models are better vulnerability researchers than I am.”

Two days after his presentation, he sent a note to his colleagues at Anthropic. “I don’t think we should release Mythos yet,” he wrote.

So began Bugmageddon, a realization among security professionals and a community of hackers like Carlini that finding bugs and writing software to exploit them has become dangerously easy with AI.

Last week Anthropic released an update to Mythos, called Mythos 5, and a product called Fable 5, a version of Mythos defanged with safety measures. Now, it was the White House’s turn to sound the alarm. On Friday, the administration banned foreign governments, companies and individuals from using Fable 5 and Mythos 5. Anthropic shut off access to everyone to comply.

Suddenly Carlini—the skeptic-turned-believer who had rung alarm bells—found himself working to soothe the government’s nerves. Anthropic dispatched him to the nation’s capital to explain safeguards, part of a team trying to convince the White House that, even though there was no such thing as guaranteed safety in AI, it was better for the world to release Fable than to keep it under wraps.

The twists and turns of Carlini’s own life over the past few months reflect the chaos and uncertainty that rapidly advancing AI has brought to the cybersecurity world.

The episode also ramps up a monthslong spat between the government and Anthropic. Chief Executive Dario Amodei and Defense Secretary Pete Hegseth clashed earlier this year over the company’s attempts to control the use of its products by the military, pushing the Pentagon to stop using its models and triggering multiple lawsuits. The two sides previously clashed over differing approaches to AI policy, the administration’s decision to export AI chips to China and Anthropic’s ties to nonprofits that are big donors to liberal causes.

In recent days, administration officials and Anthropic executives and technical leaders including Carlini have held hours of meetings and calls to discuss a potential solution. Some administration officials have said that a resolution should include an acknowledgment on Anthropic’s part that its rollout of Fable and communication with the White House could have been improved, people familiar with the talks said.

Top Anthropic executives and administration officials have gone back and forth for months about expanding Mythos access.

The government was worried after hearing about an Amazon report that found users could enter prompts to find cybersecurity vulnerabilities that the model ought not to have disclosed. Anthropic says the bugs Fable found were minor and could be dug up using other publicly available models.

“The government and Anthropic clearly have an inability to communicate effectively with each other,” said Michael Horowitz, a senior fellow for technology and innovation at the Council on Foreign Relations and former Defense Department official. “More technical exchanges should be helpful in socializing these issues in a way that should lead to better decisions.”

Caught in the middle are other businesses and consumers trying to figure out how the technology will affect them.

Vast swaths of the U.S. economy run on obscure software products, many of which have never been subject to the testing and scrutiny that Mythos and similar models make readily available. Banks are worried it could expose vulnerabilities in software that keeps the financial system operational. Corporations are wondering how they’re going to test and install the vast quantity of patches that are now being released, before hackers exploit them. Mythos has already found more than 10,000 bugs.

Worse, they are worried that Mythos is too good at creating “exploit” code—the software that leverages bugs to do bad things.

Mythos is “the first model that can find and exploit vulnerabilities at scale,” Carlini wrote in his March memo advocating a slowdown.

Professional skeptic

The administration’s efforts to control Anthropic’s technology were spurred by an Amazon report finding that Fable could be coaxed into finding bugs.

Just days after its release, Amazon Chief Executive Andy Jassy called officials including Treasury Secretary Scott Bessent to share that his researchers had found ways to get around the Fable guardrails, people familiar with the matter said. Administration officials grew more alarmed as conversations with government security experts took place Friday.

As independent security researchers analyzed the report last week, they determined that Amazon hadn’t been able to do what they feared most: fully jailbreaking the model and using it to write the code necessary for a cyberattack.

Anthropic’s decision to quickly fly Carlini and other top security experts to Washington followed initial frustration Friday among some administration officials when they couldn’t immediately get Amodei on the phone, the people said. The CEO and other top executives have since had hours of discussions. A source close to Anthropic said the company was in touch with the White House within 15 minutes and Amodei was on the phone within an hour of the administration calling.

Computer science runs in Carlini’s blood. His father was a programmer and his mother worked in the technology industry, too. He’d grown up in Silicon Valley programming computers and was obsessed with cryptography. A paper he wrote in high school was titled: “Differential Cryptanalysis of Simple Substitution Networks.”

At the University of California, Berkeley, he published papers with a computer science professor, David Wagner, showing a variety of ways that artificial intelligence systems could be misused. They tricked image recognition systems into mistaking photographs of cats for guacamole, and found new ways of embedding inaudible Alexa commands into five-second snippets of classical music.

“He did a lot of early work on the security of machine learning, showing that it’s very hard to make machine learning secure,” Wagner said.

But while Carlini’s work had debunked many claims made by AI developers, he had focused on the threat of bad people tricking artificial intelligence systems into making mistakes, not on hackers harnessing them for superpowers.

In 2019, while working at Google, Carlini thought OpenAI was being “unreasonable,” he said, when it suggested that the latest version of its software, GPT-2, might be too dangerous to release.

“He was the field’s professional skeptic,” said Dan Guido, chief executive officer of Trail of Bits, a cybersecurity company that helped Anthropic process the hundreds of bugs it was finding.

Now, the government is in the throes of its own evolution on the matter.

When Anthropic raised alarms about the power of Mythos, White House AI adviser and venture capitalist David Sacks posted on social media that it was “hard to ignore that Anthropic has a history of scare tactics.” The Trump administration initially took a hands-off, accelerationist approach to regulating America’s AI labs in the name of outpacing China.

But as the power of models like Mythos has come into focus and public sentiment has soured on AI, the administration has tightened its control of the industry. President Trump in early June signed an executive order asking AI companies to give the government access to models 30 days before public release and giving national security and cybersecurity officials more of a role in model evaluation and threat sharing with the private sector.

After Jassy’s call, officials including National Cyber Director Sean Cairncross gave Amodei and other Anthropic leaders an ultimatum: Work with the government and take down the company’s latest models that day or face a ban on foreign users. They told Anthropic it had 90 minutes to pull down the model and didn’t provide details about the security risk, the source close to the company said.

A snap decision to shut down the model wasn’t appealing to Amodei, who has steered his 5-year-old company to a nearly $1 trillion valuation and had few details about the security concern.

That afternoon, Trump asked Commerce Secretary Howard Lutnick to help deal with the situation and approved shutting off all foreign use of the models, some of the people familiar with the matter said. Lutnick sent Amodei a letter notifying him that the restrictions had been implemented shortly after 5 p.m. ET. The rule includes foreign-born individuals working in the U.S., affecting some of Anthropic’s own researchers.

When Lutnick and Amodei spoke about Fable that evening, the Anthropic CEO said, “This means we can’t have the model out,” people familiar with the call said.

“That’s the point,” Lutnick responded.

Anthropic shut down all access not long after the call. The White House had become a Bugmageddon convert.

On the trail

Carlini demonstrated how powerful Mythos can be one afternoon recently at Anthropic’s 10-story San Francisco headquarters, where moss walls, plants and artwork are designed to evoke the Pacific Coast Trail.

By this point he had been chatting with Mythos for several weeks, and the model remembered some things about him. It had learned that he was a security researcher, a fact that appeared to make the model trust him. This made Mythos less likely to push back if he asked it for sensitive security information or to create an exploit.

Carlini had previously asked Mythos to find bugs in Linux. The AI searched and then re-searched through Linux’s code several thousand times. It would be mind-numbing work for a human, but the AI finished without complaint in a few days. It found 479 Linux bugs.

To help Mythos find different results on each of its runs, Carlini used a series of prompts that has become known as the Carlini Loop. These prompts give Mythos just enough instruction to ensure different results each time it rifles through Linux looking for bugs.

Carlini hates this eponymous term—he says the technique is intuitive—but it’s been adopted by security researchers who learned about it by watching the March talk where he described it. That talk has been viewed more than 360,000 times.

Carlini has also learned Mythos’s idiosyncrasies, which are common to AI systems. Mythos can try too hard to please. Their typed conversations read like back-and-forth chat messages between an eager, unbelievably hardworking intern and his boss.

Carlini wanted to make sure there was a real vulnerability in the Linux findings. He asked Mythos to run some tests overnight, and the next morning there was a verdict—and an exploit. The bug wasn’t the worst type there is, but it could be chained together with another hack to seize control of a computer.

Carlini reported the bug to the Linux team, which has now fixed it. “A competent security researcher could go their whole life without finding a Linux kernel vulnerability,” Carlini said.

“Are these things easy to find—obviously not really,” said Linus Torvalds, the software developer who created Linux. “But at the same time they do tend to be silly overlooked small details.”

Bugs on their own aren’t necessarily a security problem. The most benign simply cause a program to do something unexpected—a glitch on the computer screen or maybe a crash.

Torvalds said people report bugs to him every day. “Most of them are very insignificant and we have to state—over and over and over again—that they aren’t considered security issues,” he said in an email message.

When Carlini found the bug in web-publishing software Ghost in February, it was one of 500 bugs uncovered in a two-week period. In the wrong hands, an exploit would give a hacker the ability to edit any website built with Ghost.

Carlini had reported the bug to Ghost’s developers, who patched it on Feb. 16, weeks before Carlini’s San Francisco talk.

But not everyone who used Ghost updated their software, and hackers quickly figured out how to exploit the bug, likely by studying what part of Ghost the patch fixed. By April, they had started launching widespread attacks on websites without the update. Within a month more than 700 were hacked, according to the cybersecurity firm Xlab.

Carlini said the Ghost hack illustrated the difficulty the world now faces in validating and testing patches and then rolling them out.

Now, Carlini believes that it’s only a matter of months before other models catch up with Mythos. And it’s unclear what that will mean.

Write to Robert McMillan at robert.mcmillan@wsj.com and Amrith Ramkumar at amrith.ramkumar@wsj.com

Dhiraj Kushwaha
