Claude Fable 5: A Security Nightmare (2026 Review)
I tried to ask Claude Fable 5 — Anthropic's brand-new, most powerful model — a simple question about itself. What it could do, how the new safety system worked. And it refused to answer. Not because it didn't know, but because the topic was cyber security, and that's now enough to get you bumped to a different, older model mid-sentence.
I wasn't writing an exploit. I was reading a product launch.
If you do anything even slightly connected to security — code review, vulnerability research, or in my case just researching the tool itself — Claude Fable 5 is going to be a genuine pain. In this review we'll walk through what Fable 5 actually is, why it downgrades your security work to a weaker model, how it stacks up against Opus 4.8, who gets the unrestricted version (spoiler: not you), and — the part nobody else is talking about — why this whole arrangement is a temporary fix that can't last. Let's get into it.
[Screenshot: the actual switch notice.] "They may flag safe, normal content as well" is doing a lot of work in that sentence.
What Is Claude Fable 5?
Let's start with what we're dealing with, because the capability is real and it matters.
Anthropic released Claude Fable 5 on 9 June 2026, and they're calling it their "most capable widely released model." It sits in a new tier they've named Mythos-class — which, for the first time, sits above the Opus models most of us have been using. It has a 1-million-token context window and can run for hours autonomously. The team at Every, who had early access ahead of launch, scored it 91 out of 100 on their senior-engineer benchmark — against 63 for Opus 4.8 and 62 for GPT-5.5. That's not a small step. That's roughly the level of a human senior engineer, from a single prompt.
The catch — and there's always a catch — is the price. Fable 5 costs $10 per million input tokens and $50 per million output tokens. That's double what you pay for Opus 4.8 ($5/$25). So here's Fable 5 vs Opus 4.8 at a glance:
| | Claude Fable 5 | Claude Opus 4.8 |
|---|---|---|
| Tier | Mythos-class (new, top) | Opus (flagship below Mythos) |
| Input / output price | $10 / $50 per MTok | $5 / $25 per MTok |
| Context window | 1M tokens | 1M tokens |
| Max output | 128K tokens | 128K tokens |
| Senior-engineer benchmark (Every) | 91 / 100 | 63 / 100 |
| Cyber security work | Refuses → falls back to Opus 4.8 | Handles it |
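To put the price gap in concrete terms, here's a minimal sketch of the arithmetic using the per-million-token prices from the table. The workload size (200K tokens in, 20K tokens out) is a made-up example, and the dictionary keys are just labels for this sketch, not real API model identifiers.

```python
# Prices in USD per million tokens (MTok), taken from the comparison table.
# The keys are labels for this sketch only, not real API model identifiers.
PRICES = {
    "fable-5": {"input": 10.0, "output": 50.0},
    "opus-4.8": {"input": 5.0, "output": 25.0},
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one job at the listed per-MTok prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 200K tokens in, 20K tokens out.
fable = job_cost("fable-5", 200_000, 20_000)
opus = job_cost("opus-4.8", 200_000, 20_000)
print(f"Fable 5: ${fable:.2f}, Opus 4.8: ${opus:.2f}, ratio: {fable / opus:.1f}x")
# Prints: Fable 5: $3.00, Opus 4.8: $1.50, ratio: 2.0x
```

Because both the input and output prices double, the ratio stays at 2x whatever the mix of input and output tokens.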
For a lot of work, that price is defensible. If you've got a big, meaty, autonomous coding job, it'll chew through it in a way nothing else can. But if you work in cyber security, here's where the story turns.
The Catch: It Refuses to Do Security
Here's the part that'll frustrate you.
Fable 5 ships with a safety system that screens every request before it answers. If your request touches one of three areas — offensive cyber security (building exploits, malware, or attack tooling), biology and chemistry, or attempts to extract the model's own reasoning — it refuses, and automatically downgrades you to Opus 4.8 for that response. You get a little notice telling you it switched.
In principle, that sounds targeted. Building malware? Fair enough, bounce me to a safer model. The problem is what "touches on cyber security" means in practice — because the system scans everything in your conversation, including your files and your search results, not just what you typed.
And it's tuned so conservatively that it catches a huge amount of ordinary work. This isn't me editorialising — it's in Anthropic's own switch notice, word for word:
"Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well."
That second sentence is the whole problem in eight words. Anthropic says the safeguards trigger in "less than 5% of sessions" on average, and that they're deliberately tuned to be cautious; in their own words, "sometimes benign requests will trigger our classifiers." That leaves a gap between what the system is meant to target and what actually trips it in normal use.
For most people, 5% of sessions sounds tolerable. But if security is your day job, your hit rate is going to be far higher than average — because nearly everything you do brushes against the flagged topics.
It Flagged Me for Reading About It
I want to show you exactly what this looks like, because it's almost comic.
I was researching Fable 5 — genuinely just trying to understand the model and its new safety system to write about it. That conversation tripped the cyber security classifier, and the model switched away. The most powerful AI ever released to the public is, in a very real sense, too scared to talk about itself.
Then it happened again on completely unrelated work. I was building some AI agents — nothing to do with security at all — and because a bit of security context drifted into the conversation, it flipped to Opus 4.8 again. That's the thing people don't realise until they live with it: if your work ever brushes against security, the flag follows you into projects that have nothing to do with it.
There's a sting in the tail, too. According to Anthropic's own support documentation, if a request gets flagged before it generates output, you're billed at Opus rates only — but if it gets flagged mid-response, you're charged for both the Fable 5 portion and the Opus portion. So on the work most likely to get flagged, you can end up paying for two models to do the job of one. And remember, you opted into Fable 5's premium price in the first place.
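Here's a rough sketch of how those two billing cases might play out. The prices come from this review; the token counts are invented. One loud assumption: in the mid-response case I've charged each model for reading the full input. The documentation as summarised above only says both portions are billed, so treat the input handling as my guess rather than Anthropic's stated rule.

```python
# USD per million tokens, as quoted in this review.
FABLE = {"input": 10.0, "output": 50.0}
OPUS = {"input": 5.0, "output": 25.0}


def cost(price: dict, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the given per-MTok prices."""
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def flagged_before_output(input_tokens: int, output_tokens: int) -> float:
    """Flag fires before any output: billed at Opus rates only."""
    return cost(OPUS, input_tokens, output_tokens)


def flagged_mid_response(
    input_tokens: int, fable_output_tokens: int, opus_output_tokens: int
) -> float:
    """Flag fires mid-response: the Fable 5 and Opus portions are both billed.

    ASSUMPTION (mine, not documented): each model is charged for the full input.
    """
    return cost(FABLE, input_tokens, fable_output_tokens) + cost(
        OPUS, input_tokens, opus_output_tokens
    )


# Hypothetical request: 100K tokens in, 10K tokens of final answer.
# In the mid-response case, Fable 5 wrote 4K tokens before the flag fired.
print(f"unflagged Fable 5:    ${cost(FABLE, 100_000, 10_000):.2f}")
print(f"flagged before output: ${flagged_before_output(100_000, 10_000):.2f}")
print(f"flagged mid-response:  ${flagged_mid_response(100_000, 4_000, 10_000):.2f}")
```

Under those assumptions, the mid-response case comes out dearer than an unflagged Fable 5 run, which is the "paying for two models" effect described above.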
To be fair, you can soften this. In Settings → Capabilities there's a toggle: "Switch models when a message is flagged." Turn it off, and instead of silently downgrading, the conversation pauses and lets you decide. It doesn't remove the safeguard — you can't do that on the public model — but at least the blocks become visible instead of happening behind your back.
The Two-Tier Reality: Mythos 5 and Project Glasswing
Now here's where it gets genuinely uncomfortable, and where I think the real story lives.
The unrestricted version of this model exists. It's called Mythos 5 — the same underlying model as Fable 5, but with the safeguards lifted. You just can't have it.
Mythos 5 is available only through Project Glasswing, an invitation-only programme for organisations that run critical software and infrastructure. The named partners include AWS, Apple, Google, Microsoft, JPMorgan Chase, and the Linux Foundation. There's no self-serve sign-up — Anthropic's own line is "contact your Anthropic, AWS, or Google Cloud account team." In other words: a dozen trillion-dollar companies and a bank get the real thing; the rest of us get the version that taps out.
And the capability they're holding back is extraordinary. According to Anthropic's own Mythos system card, the preview model autonomously found a 16-year-old bug in FFmpeg's H.264 code, on a line that automated testing tools had executed around five million times without ever catching it, and a 27-year-old flaw in OpenBSD that let an attacker crash any machine running the OS just by connecting to it. On the CyberGym benchmark reported in that system card it scored 0.83 against Opus 4.6's 0.67, and in a Firefox exploit-construction test it achieved full code execution 72.4% of the time.
But when you read the fine print — and I did — those numbers come with serious asterisks. That 72.4% wasn't against real, hardened Firefox; it was run against a SpiderMonkey shell with the browser sandbox switched off. And the OpenBSD discovery came out of roughly 10,000 parallel agent runs costing nearly $20,000 in compute. So it's genuinely impressive, but it's not "point it at a target and watch zero-days fall out." It's a very capable tool in expert hands, with a serious budget behind it.
Here's the thing, though: I actually understand why Anthropic gated it. Security has a brutal asymmetry — defenders have to be right every single time, and an attacker only has to be right once. If you released an unrestricted, autonomous vulnerability-finding machine to the entire internet tomorrow, my honest bet is that attackers would win the opening exchanges before defenders caught up. So I'm not going to pretend the decision is indefensible. It isn't.
My problem isn't the intent. It's that the execution punishes the entire defensive security profession — the people using these tools to fix things — for the sins of attackers who, as we'll see, are about to get equivalent power anyway.
Why This Won't Last: The Sticky-Plaster Problem
This is the part I haven't seen anyone really say out loud, so let me say it plainly: the gating is a sticky plaster.
There is no world in which, a year from now, this level of capability stays locked behind a corporate access programme. And I'm not just guessing — the security analysts looking at this agree. Sysdig's analysis of the Mythos launch estimates that advanced cyber-reasoning will be "widely distributed" across both frontier and open-weight models within roughly 9 to 12 months. The pattern's been consistent for years now: the strongest open-weight models tend to lag the leading closed ones by about a generation, and that gap is narrow.
Anthropic seems to know it too. Their own framing for Glasswing is about patching the world's critical software "before anyone else builds a model this capable" — which only makes sense as a strategy if they expect someone else to build one soon. The launch material says the safeguards are deliberately broad and that they'll keep refining them. This is a holding pattern, not a permanent state.
So the gate buys months. It does not buy a stable, multi-year advantage. And that raises the question I actually care about, the one the industry needs to start answering now rather than in nine months' time: what does cyber security do when everyone has this? When a single researcher with a few hundred dollars of tokens can do the work of a hundred specialists — and so can every attacker?
I don't think the answer is "lock it up and hope." I think it's "assume it's coming and build for it." But that's a longer conversation, and one I'll come back to.
What Security Professionals Should Actually Do
Right, enough analysis — here's what I'd actually do today if you work in security.
Firstly, keep Opus 4.8 as your default for code review and security work. This isn't just me being grumpy about the fallback. The team at CodeRabbit, who do AI code review for a living, tested Fable 5 and concluded they "would not make it the default for production code review yet" — that Opus 4.8 "still looks safer" for that job. GitLab has integrated Fable 5 as an additional agent in their platform, not as a replacement reviewer. The consensus among people doing this seriously is selective adoption.
Secondly, use Fable 5 for what it's genuinely brilliant at — big, autonomous, build-heavy work where you can hand it a goal and let it run. Just don't reach for it on the security-sensitive paths, because that's exactly where it'll desert you.
Thirdly, turn off the auto-switch (Settings → Capabilities) so you can see when a flag fires instead of silently getting a weaker model. If you're working through the API, check the stop_reason on every response — a flagged request comes back as a refusal, and you'll want to route those to Opus deliberately rather than have your pipeline quietly break.
And finally, if you're at a larger organisation that genuinely needs the unrestricted capability for defensive work, the only route is through your Anthropic, AWS, or Google Cloud account team. It's a long shot for most of us — but the trusted-access programme is reportedly expanding, so it's worth registering interest.
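For the API check in the third step, a minimal sketch might look like this. The `send` function is a stand-in for whatever client call you actually use, the model names are placeholders rather than real identifiers, and I'm assuming a flagged request shows up as the string "refusal" in stop_reason. Check your provider's documentation before relying on any of that.

```python
from typing import Callable

# Hypothetical placeholders: substitute the real model identifiers you use.
PRIMARY_MODEL = "fable-5-placeholder"
FALLBACK_MODEL = "opus-4.8-placeholder"

# `send(model, prompt)` stands in for your real client call. It is assumed
# to return a dict containing at least a "stop_reason" key.
Send = Callable[[str, str], dict]


def ask_with_explicit_fallback(send: Send, prompt: str) -> tuple[str, dict]:
    """Try the primary model; on a refusal, retry on the fallback and say so."""
    response = send(PRIMARY_MODEL, prompt)
    if response.get("stop_reason") != "refusal":
        return PRIMARY_MODEL, response
    # Make the downgrade visible instead of letting the pipeline fail quietly.
    print(f"refused by {PRIMARY_MODEL}; rerouting to {FALLBACK_MODEL}")
    return FALLBACK_MODEL, send(FALLBACK_MODEL, prompt)


def fake_send(model: str, prompt: str) -> dict:
    """Demo stand-in: the primary model refuses any prompt mentioning 'exploit'."""
    if model == PRIMARY_MODEL and "exploit" in prompt:
        return {"stop_reason": "refusal", "text": ""}
    return {"stop_reason": "end_turn", "text": f"answer from {model}"}


used, reply = ask_with_explicit_fallback(fake_send, "review this exploit mitigation")
print(used, "->", reply["text"])
```

Returning the model name alongside the response means downstream code can record which model actually did the work, which matters when you're auditing review quality or reconciling the bill.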
The Bottom Line
Fable 5 is, on paper, the most capable model we've ever been handed. But for anyone in cyber security, it arrives with a frustrating asterisk: the moment your work looks anything like security, it hands you back to the older, cheaper model, and if the flag fires mid-response you pay for both. The genuinely powerful version is reserved for a handful of giant institutions.
I understand the caution. The asymmetry between attack and defence is real, and I'd rather Anthropic be too careful than reckless. But let's be honest with ourselves about what this is: a stopgap. The capability is coming for all of us, probably within the year. The smart move now isn't to rage at the gate — it's to get ready for the world on the other side of it.
FAQ
Is Claude Fable 5 worth double the price of Opus 4.8?
For large autonomous coding and deep implementation work, many users say yes: it can complete in one pass what other models can't. For security work, no: it routinely downgrades you to Opus 4.8 anyway, so you've opted into premium pricing ($10/$50 per million tokens vs $5/$25) for a model that won't do the job, and a mid-response flag bills you for both models. Use the right tool for the task.
Can I turn off the Fable 5 safety fallback?
You can't remove the safeguard on the public model, but you can stop the silent switching. In Settings, under Capabilities, turn off "Switch models when a message is flagged." Flagged requests will then pause and let you decide, rather than quietly dropping to Opus 4.8.
How do I get access to Claude Mythos 5?
There's no public sign-up. Mythos 5 is offered through Project Glasswing to vetted organisations running critical infrastructure. Access is arranged through your Anthropic, AWS, or Google Cloud account team, and the programme is reportedly expanding over time.
Does defensive framing stop the security fallback?
Defensive framing can help (phrasing work as auditing-to-fix rather than building-to-attack), but it doesn't reliably prevent the fallback, because the classifier flags the topic, not your intent. Plenty of clearly defensive requests still get caught.
Should I use Fable 5 for code review?
Not as your default. The people who do AI code review professionally — like the team at CodeRabbit — currently recommend Opus 4.8 for production code review and security paths, treating Fable 5 as an additional tool rather than a replacement reviewer. Use Fable 5 for large autonomous builds, and keep Opus 4.8 for the review.
Will Anthropic loosen the cyber security restrictions?
Almost certainly, over time. Anthropic has said the current safeguards are deliberately broad and that they'll keep refining them, and analysts expect comparable capability to reach open-weight models within 9 to 12 months — which makes today's tight gating hard to sustain.
Last updated: 10 June 2026. Facts checked against Anthropic's model documentation, the Mythos system card, and the official support article on model switching.
About the Author
Nathan House, StationX
Nathan House is a cybersecurity expert with 30 years of hands-on experience. He holds OSCP, CISSP, and CEH certifications, has secured £71 billion in UK mobile banking transactions, and has worked with clients including Microsoft, Cisco, BP, Vodafone, and VISA. Named Cyber Security Educator of the Year 2020 and a UK Top 25 Security Influencer 2025, Nathan is a featured expert on CNN, Fox News, and NBC. He founded StationX, which has trained over 500,000 students in cybersecurity.