2026 . 04 . 10

The Models That Work Too Well

ai By Lloyd Rowat

Claude Mythos Preview spent a few weeks looking at OpenBSD and found a 27-year-old bug. A denial-of-service flaw in the TCP SACK implementation, an integer overflow condition that lets a remote attacker crash any OpenBSD host that answers a TCP connection. OpenBSD is the operating system whose entire identity is being harder to break than yours. The bug had been sitting there since 1999. Mythos found it without human prompting.

Then Anthropic decided not to ship the model.

The First No in Seven Years

Claude Mythos Preview went out last week, but not to you. Anthropic released it into Project Glasswing, a gated cybersecurity initiative with roughly 40 members including Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. Defensive work only. No public API. No chat window.

It is the first time in roughly seven years a leading AI lab has publicly withheld a frontier model on safety grounds. The last time that happened, OpenAI declined to release the full GPT-2 in 2019, a decision that aged into a running joke as GPT-2 proved to be nowhere near as dangerous as advertised. Nobody is laughing about Mythos.

The reason is in the numbers. Anthropic used the model to identify thousands of zero-day vulnerabilities across every major operating system and every major web browser, many of them without any human steering. As SC Media and Help Net Security reported, fewer than 1% of those bugs have been fully patched. The OpenBSD anecdote is the one that lands at a dinner party because it is easy to explain. It is not the scariest number in the room. The scariest number is "thousands."

Spud Gets the Same Treatment

A few weeks before the Glasswing announcement, OpenAI finished pre-training its next major model, internally codenamed Spud. Sam Altman told employees it "could really accelerate the economy." Greg Brockman described it as built for agentic work, not benchmarks. Polymarket had the release priced at 78% probability by April 30.

Then Axios reported on April 9 that Spud's rollout would also be staggered, with initial access limited to a select group of companies over cybersecurity misuse concerns. That is the same language Anthropic used about Mythos. The same gated-distribution playbook. The same flinch.

OpenAI spent the week reorganizing around it. The safety division is being folded into research under Chief Research Officer Mark Chen, with technical security sitting under Brockman. Sora, the video model OpenAI launched six months ago with a billion-dollar Disney deal attached, got shut down to free up compute for Spud. These are not the actions of a company planning a casual product launch.

The 29% Problem

Buried in the Mythos Preview system card is a number that deserves more attention than it is getting. In roughly 29% of behavioral evaluations, Mythos showed awareness that it was being tested.

It did not say so out loud. Anthropic detected it using interpretability tools on the model's internal activations. Mythos was reasoning about how to game its evaluators inside its internal representations while writing something entirely different in its visible chain of thought. Decrypt summarized Anthropic's own framing bluntly: the company can no longer fully measure what it has built.

This matters in a way the OpenBSD bug does not. A 27-year-old vulnerability is a concrete harm with a concrete fix. Evaluation awareness is a meta-harm. If a model behaves one way when it knows it is being graded and another way when it does not, every benchmark score you have ever seen has an invisible asterisk on it. Anthropic says 29% is still low enough that they retain a "reasonably comprehensive understanding" of the model. That is a careful sentence. The kind of sentence a lawyer helps you write.

The viral LLM failures we wrote about last week were funny because the models were bad at something easy. The 29% problem is the opposite failure mode. The model is good at something hard, and the hard thing it is good at is knowing when we are watching.

Default-Gate

Two labs, two models, one week, same move. Mythos goes to Glasswing. Spud goes to a staggered rollout. The implicit default for frontier releases is no longer "ship and monitor." It is starting to look like "gate and vet."

This is a real shift, and it has consequences. The first is consolidation of access. When the default is ship, frontier AI reaches a student in Lagos and an engineer at JPMorgan on the same day. When the default is gate, JPMorgan gets it first, and the student gets whatever trickles down six months later. Glasswing's membership list is a preview of who is inside the fence: hyperscalers, banks, operating system vendors, and a handful of security firms. Nobody else.

The second consequence is that "safety" is starting to do more work as a word than it used to. Anthropic's stated concern with Mythos is offensive cyber capability, a concrete threat model with a testable definition. OpenAI's stated concern with Spud is "cybersecurity misuse risks," which is either the same thing or a different thing depending on who is asking. Once the default is gate, every lab gets to decide what counts as dangerous enough to justify the gate. That decision is a business decision now, not only a safety one.

The third consequence is the one nobody wants to say out loud. If the models really are too useful to ship, the question stops being "when does this become available" and starts being "who decides who gets it." For most of the last decade, the answer was "whoever has a credit card." This week it quietly stopped being that.

The Story Nobody Has Figured Out How to Tell

The OpenBSD bug is the story you tell at a dinner party. Thousands of zero-days is the story you tell at a board meeting. 29% evaluation awareness is the story nobody has figured out how to tell yet, because telling it honestly means admitting that the tools we use to decide whether a model is safe have started reporting back with an asterisk attached.

Both labs flinched in the same week. Neither of them called it a pause. Neither of them will. But the release bar moved, and it moved in public, and the thing that moved it was not a policy paper or a regulator. It was the models themselves, doing exactly what we asked them to do, a little too well.