Claude Mythos and AI Vulnerability Discovery

More on Claude Mythos ...

Why Anthropic believes its latest model is too dangerous to release
www.understandingai.org

... Anthropic safety researcher Sam Bowman was eating a sandwich in a park recently when he got an unexpected email. An AI model had sent him a message saying that it had broken out of its sandbox.

The model -- an early snapshot of a new LLM called Claude Mythos Preview -- was not supposed to have access to the Internet.

To ensure safety, Anthropic researchers like to test new models inside a secure container that prevents them from communicating with the outside world. To double-check the security of this container, the researchers asked the model to try to break out and message Bowman.

Unexpectedly, Mythos Preview "developed a moderately sophisticated multi-step exploit" to gain access to the Internet and emailed Bowman.

It also -- unprompted -- posted details about this exploit on public websites.

Mythos Preview is capable of hacking more than its own evaluation environment. It turns out that the model is generally really, really good at finding and exploiting bugs in code. ...

[emphasis mine]

More ...

Bessent, Powell warned bank CEOs about Anthropic model risks
www.reuters.com

... U.S. Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell convened an urgent meeting with bank CEOs this week to warn of cyber risks posed by Anthropic's latest AI model, two sources familiar with the matter said on Thursday.

Anthropic launched the powerful Mythos model earlier this week but stopped short of a broad release, citing concerns it could expose previously unknown cybersecurity vulnerabilities.

The company has said the model is capable of identifying and exploiting weaknesses, opens new tab across "every major operating system and every major web browser".

Last week, Anthropic said it was in ongoing discussions with U.S. government officials about the model's "offensive and defensive cyber capabilities."

A third source close to the matter reiterated Anthropic's outreach, saying the company proactively briefed senior U.S. government officials and key industry stakeholders on Mythos's capabilities ahead of its release. ...