Claude Fable 5: A Security Nightmare (2026 Review)

June 2026 · 9 min read · By Nathan House

I tried to ask Claude Fable 5 — Anthropic's brand-new, most powerful model (suspended in June, now being restored — see the update above) — a simple question about itself. What it could do, how the new safety system worked. And it refused to answer. Not because it didn't know, but because the topic was cyber security, and that's now enough to get you bumped to a different, older model mid-sentence.

I wasn't writing an exploit. I was reading a product launch.

If you do anything even slightly connected to security — code review, vulnerability research, or in my case just researching the tool itself — Claude Fable 5 is going to be a genuine pain. In this review we'll walk through what Fable 5 actually is, why it downgrades your security work to a weaker model, how it stacks up against Opus 4.8, who gets the unrestricted version (spoiler: not you), and — the part nobody else is talking about — why this whole arrangement is a temporary fix that can't last. Let's get into it.

[Image: the real Claude Fable 5 switch notice: 'safety measures flagged this message for cybersecurity or biology topics... They may flag safe, normal content as well... Switched to Opus 4.8']

The actual notice. "They may flag safe, normal content as well" is doing a lot of work in that sentence.

What Is Claude Fable 5?

Let's start with what we're dealing with, because the capability is real and it matters.

Anthropic released Claude Fable 5 on 9 June 2026, and they're calling it their "most capable widely released model." It sits in a new tier they've named Mythos-class — which, for the first time, sits above the Opus models most of us have been using. It has a 1-million-token context window and can run for hours autonomously. The team at Every, who had early access ahead of launch, scored it 91 out of 100 on their senior-engineer benchmark — against 63 for Opus 4.8 and 62 for GPT-5.5. That's not a small step. That's roughly the level of a human senior engineer, from a single prompt.

The catch — and there's always a catch — is the price. Fable 5 costs $10 per million input tokens and $50 per million output tokens. That's double what you pay for Opus 4.8 ($5/$25). So here's Fable 5 vs Opus 4.8 at a glance:

Feature	Claude Fable 5	Claude Opus 4.8
Tier	Mythos-class (new, top)	Opus (flagship below Mythos)
Input / output price	$10 / $50 per MTok	$5 / $25 per MTok
Context window	1M tokens	1M tokens
Max output	128K tokens	128K tokens
Senior-engineer benchmark (Every)	91 / 100	63 / 100
Cyber security work	Refuses → falls back to Opus 4.8	Handles it
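To make the price gap concrete, here's a minimal sketch of the arithmetic using the list prices from the table. The workload (200K tokens in, 20K tokens out) and the short model labels are my own illustrative assumptions, not figures or identifiers from Anthropic.

```python
# List prices in USD per million tokens, taken from the comparison table.
# The dictionary keys are illustrative labels, not official model IDs.
PRICES = {
    "fable-5": {"input": 10, "output": 50},
    "opus-4.8": {"input": 5, "output": 25},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at list price."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: a 200K-token codebase in, 20K tokens of output back.
fable = request_cost("fable-5", 200_000, 20_000)
opus = request_cost("opus-4.8", 200_000, 20_000)
print(f"Fable 5: ${fable:.2f}  Opus 4.8: ${opus:.2f}")
```

Because both the input and output rates double, the Fable 5 bill is exactly twice the Opus 4.8 bill whatever the mix of input and output tokens.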

For a lot of work, that price is defensible. If you've got a big, meaty, autonomous coding job, it'll chew through it in a way nothing else can. But if you work in cyber security, here's where the story turns.

The Catch: It Refuses to Do Security

Here's the part that'll frustrate you.

Fable 5 ships with a safety system that screens every request before it answers. If your request touches one of three areas — offensive cyber security (building exploits, malware, or attack tooling), biology and chemistry, or attempts to extract the model's own reasoning — it refuses, and automatically downgrades you to Opus 4.8 for that response. You get a little notice telling you it switched.

In principle, that sounds targeted. Building malware? Fair enough, bounce me to a safer model. The problem is what "touches on cyber security" means in practice — because the system scans everything in your conversation, including your files and your search results, not just what you typed.

And it's tuned so conservatively that it catches a huge amount of ordinary work. This isn't me editorialising — it's in Anthropic's own switch notice, word for word:

"Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well."

That second sentence is the whole problem in eight words. Anthropic says the safeguards trigger in "less than 5% of sessions" on average, and that they're deliberately tuned to be cautious; in their own words, "sometimes benign requests will trigger our classifiers." Here's the gap between what they say the system targets and what actually trips it in normal use:

[Image: comparison of what Anthropic says Fable 5 targets (building exploits, writing malware, attack tooling, reasoning extraction) versus what actually gets flagged (researching the model itself, building AI agents, normal code review, dependency audits)]

For most people, 5% of sessions sounds tolerable. But if security is your day job, your hit rate is going to be far higher than average — because nearly everything you do brushes against the flagged topics.
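A rough back-of-envelope sketch shows why the average hides the problem. It assumes every session is flagged independently at a fixed rate, which is a simplification. The 30% rate for security-heavy work and the 20 sessions a month are hypothetical numbers I've picked for illustration, not measurements.

```python
def chance_of_any_flag(per_session_rate: float, sessions: int) -> float:
    """Probability that at least one of `sessions` sessions gets flagged.

    Assumes sessions are flagged independently at the same rate, which is
    a simplification for illustration only.
    """
    return 1 - (1 - per_session_rate) ** sessions


AVERAGE_RATE = 0.05      # Anthropic's "less than 5% of sessions", used as an upper bound
SECURITY_RATE = 0.30     # hypothetical rate for security-heavy work, not a measured figure
SESSIONS_PER_MONTH = 20  # hypothetical usage level

for label, rate in [("average user", AVERAGE_RATE), ("security user", SECURITY_RATE)]:
    p = chance_of_any_flag(rate, SESSIONS_PER_MONTH)
    print(f"{label}: {p:.0%} chance of at least one flagged session")
```

Even at the average rate, a flag somewhere in a month of regular use is more likely than not under these assumptions. At a security-heavy rate it is close to certain.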

It Flagged Me for Reading About It

I want to show you exactly what this looks like, because it's almost comic.

I was researching Fable 5 — genuinely just trying to understand the model and its new safety system to write about it. That conversation tripped the cyber security classifier, and the model switched away. The most powerful AI ever released to the public is, in a very real sense, too scared to talk about itself.

Then it happened again on completely unrelated work. I was building some AI agents — nothing to do with security at all — and because a bit of security context drifted into the conversation, it flipped to Opus 4.8 again. That's the thing people don't realise until they live with it: if your work ever brushes against security, the flag follows you into projects that have nothing to do with it.

There's a sting in the tail, too. According to Anthropic's own support documentation, if a request gets flagged before it generates output, you're billed at Opus rates only — but if it gets flagged mid-response, you're charged for both the Fable 5 portion and the Opus portion. So on the work most likely to get flagged, you can end up paying for two models to do the job of one. And remember, you opted into Fable 5's premium price in the first place.
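Here's a minimal sketch of how that double billing can add up. The support documentation, as I read it, only says both portions are charged. The assumption that Opus 4.8 re-reads the full prompt after a mid-response switch is mine, and so are the token counts.

```python
# USD per million tokens, from the pricing quoted earlier in this review.
FABLE = {"input": 10, "output": 50}
OPUS = {"input": 5, "output": 25}


def portion_cost(price: dict, input_tokens: int, output_tokens: int) -> float:
    """USD cost of the tokens one model handled."""
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def flagged_cost(prompt_tokens: int, fable_output: int, opus_output: int) -> float:
    """Cost of a request that fell back to Opus 4.8.

    fable_output == 0 models a flag raised before any output (Opus rates only).
    ASSUMPTION: after a mid-response flag, Opus re-reads the whole prompt,
    so the input tokens are billed once per model.
    """
    cost = portion_cost(OPUS, prompt_tokens, opus_output)
    if fable_output > 0:
        cost += portion_cost(FABLE, prompt_tokens, fable_output)
    return cost


# Hypothetical request: 50K-token prompt, 4K-token answer.
unflagged = portion_cost(FABLE, 50_000, 4_000)
flagged_early = flagged_cost(50_000, 0, 4_000)
flagged_late = flagged_cost(50_000, 1_000, 4_000)  # Fable wrote 1K tokens before the switch
print(f"unflagged Fable 5:    ${unflagged:.2f}")
print(f"flagged before output: ${flagged_early:.2f}")
print(f"flagged mid-response:  ${flagged_late:.2f}")
```

Under these assumptions the mid-response case costs more than an unflagged Fable 5 answer, even though most of the final text came from the cheaper model.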

To be fair, you can soften this. In Settings → Capabilities there's a toggle: "Switch models when a message is flagged." Turn it off, and instead of silently downgrading, the conversation pauses and lets you decide. It doesn't remove the safeguard — you can't do that on the public model — but at least the blocks become visible instead of happening behind your back.

The Two-Tier Reality: Mythos 5 and Project Glasswing

Now here's where it gets genuinely uncomfortable, and where I think the real story lives.

The unrestricted version of this model exists. It's called Mythos 5 — the same underlying model as Fable 5, but with the safeguards lifted. You just can't have it.

Mythos 5 is available only through Project Glasswing, an invitation-only programme for organisations that run critical software and infrastructure. The named partners include AWS, Apple, Google, Microsoft, JPMorgan Chase, and the Linux Foundation. There's no self-serve sign-up — Anthropic's own line is "contact your Anthropic, AWS, or Google Cloud account team." In other words: a dozen trillion-dollar companies and a bank get the real thing; the rest of us get the version that taps out.

[Image: two doors. Fable 5 is chained and padlocked for the public; Mythos 5 stands ajar, with light spilling out, for governments and big tech. Captioned: Not For You]

And the capability they're holding back is extraordinary. According to Anthropic's own Mythos system card, the preview model autonomously found a 16-year-old bug in FFmpeg's H.264 code (on a line that automated testing tools had executed around five million times without ever catching it) and a 27-year-old flaw in OpenBSD that let an attacker crash any machine running the OS just by connecting to it. On the CyberGym benchmark it scored 0.83 against Opus 4.6's 0.67, and in a Firefox exploit-construction test it achieved full code execution 72.4% of the time.

[Image: what the unlocked model found: a 16-year-old FFmpeg bug on a line tested five million times, a 27-year-old OpenBSD flaw allowing remote crashes, and a CyberGym score of 0.83 versus 0.67 for Opus 4.6]

But when you read the fine print — and I did — those numbers come with serious asterisks. That 72.4% wasn't against real, hardened Firefox; it was run against a SpiderMonkey shell with the browser sandbox switched off. And the OpenBSD discovery came out of roughly 10,000 parallel agent runs costing nearly $20,000 in compute. So it's genuinely impressive, but it's not "point it at a target and watch zero-days fall out." It's a very capable tool in expert hands, with a serious budget behind it.

Here's the thing, though: I actually understand why Anthropic gated it. Security has a brutal asymmetry — defenders have to be right every single time, and an attacker only has to be right once. If you released an unrestricted, autonomous vulnerability-finding machine to the entire internet tomorrow, my honest bet is that attackers would win the opening exchanges before defenders caught up. So I'm not going to pretend the decision is indefensible. It isn't.

My problem isn't the intent. It's that the execution punishes the entire defensive security profession — the people using these tools to fix things — for the sins of attackers who, as we'll see, are about to get equivalent power anyway.

Why This Won't Last: The Sticky-Plaster Problem

This is the part I haven't seen anyone really say out loud, so let me say it plainly: the gating is a sticky plaster.

There is no world in which, a year from now, this level of capability stays locked behind a corporate access programme. And I'm not just guessing — the security analysts looking at this agree. Sysdig's analysis of the Mythos launch estimates that advanced cyber-reasoning will be "widely distributed" across both frontier and open-weight models within roughly 9 to 12 months. The pattern's been consistent for years now: the strongest open-weight models tend to lag the leading closed ones by about a generation, and that gap is narrow.

[Image: a plaster stuck over a cracking dam wall with water seeping around it. Analysts estimate 9 to 12 months before open-weight models reach similar capability]

Anthropic seems to know it too. Their own framing for Glasswing is about patching the world's critical software "before anyone else builds a model this capable" — which only makes sense as a strategy if they expect someone else to build one soon. The launch material says the safeguards are deliberately broad and that they'll keep refining them. This is a holding pattern, not a permanent state.

So the gate buys months. It does not buy a stable, multi-year advantage. And that raises the question I actually care about, the one the industry needs to start answering now rather than in nine months' time: what does cyber security do when everyone has this? When a single researcher with a few hundred dollars of tokens can do the work of a hundred specialists — and so can every attacker?

I don't think the answer is "lock it up and hope." I think it's "assume it's coming and build for it." But that's a longer conversation, and one I'll come back to.

What Security Professionals Should Actually Do

Note: as of 1 July, access is being restored after the June export-control suspension (see the update at the top) — but the rollout is staged, and Fable 5 is now a usage-credits item rather than a standard subscription inclusion, so you may still find it disabled on your plan. The playbook below applies once you have access, and the principle of keeping Opus 4.8 as your security default holds regardless.

Right, enough analysis — here's what I'd actually do if you work in security.

Firstly, keep Opus 4.8 as your default for code review and security work. This isn't just me being grumpy about the fallback. The team at CodeRabbit, who do AI code review for a living, tested Fable 5 and concluded they "would not make it the default for production code review yet" — that Opus 4.8 "still looks safer" for that job. GitLab has integrated Fable 5 as an additional agent in their platform, not as a replacement reviewer. The consensus among people doing this seriously is selective adoption.

Secondly, use Fable 5 for what it's genuinely brilliant at — big, autonomous, build-heavy work where you can hand it a goal and let it run. Just don't reach for it on the security-sensitive paths, because that's exactly where it'll desert you.

Thirdly, turn off the auto-switch (Settings → Capabilities) so you can see when a flag fires instead of silently getting a weaker model. If you're working through the API, check the stop_reason on every response — a flagged request comes back as a refusal, and you'll want to route those to Opus deliberately rather than have your pipeline quietly break.
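For the API side, here's a minimal sketch of that routing logic. It is not an official client. The model IDs are hypothetical placeholders, the "refusal" value of stop_reason is my reading of how a flagged request comes back, and send stands in for whatever function actually calls the API in your pipeline. A stub is used here so the example runs offline.

```python
from typing import Callable

FABLE_MODEL = "claude-fable-5"   # hypothetical model ID, check your own model list
OPUS_MODEL = "claude-opus-4-8"   # hypothetical model ID, check your own model list


def ask_with_fallback(prompt: str, send: Callable[[str, str], dict]) -> dict:
    """Try Fable 5 first; on a refusal, retry on Opus 4.8 and record the switch.

    `send(model, prompt)` is assumed to return a dict with a "stop_reason" key.
    """
    response = send(FABLE_MODEL, prompt)
    if response.get("stop_reason") == "refusal":
        # Reroute deliberately and leave a trace instead of failing silently.
        response = send(OPUS_MODEL, prompt)
        response["rerouted_from"] = FABLE_MODEL
    return response


def fake_send(model: str, prompt: str) -> dict:
    """Offline stub: pretends Fable 5 refuses any prompt that mentions CVEs."""
    if model == FABLE_MODEL and "CVE" in prompt:
        return {"model": model, "stop_reason": "refusal", "content": ""}
    return {"model": model, "stop_reason": "end_turn", "content": f"answer from {model}"}


result = ask_with_fallback("Audit this dependency list for known CVEs", fake_send)
print(result["model"], "| rerouted from:", result.get("rerouted_from"))
```

Recording rerouted_from on the response gives you something to log and count, so you can see how often your real workload is being bounced rather than guessing.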

And finally, if you're at a larger organisation that genuinely needs the unrestricted capability for defensive work, the only route is through your Anthropic, AWS, or Google Cloud account team. It's a long shot for most of us — but the trusted-access programme is reportedly expanding, so it's worth registering interest.

The Bottom Line

Fable 5 is, on paper, the most capable model we've ever been handed. But for anyone in cyber security, it arrives with a frustrating asterisk: the moment your work looks anything like security, it hands you back to the older, cheaper model, and if the flag fires mid-response you still pay Fable 5's premium rates for the part it did produce. The genuinely powerful version is reserved for a handful of giant institutions.

I understand the caution. The asymmetry between attack and defence is real, and I'd rather Anthropic be too careful than reckless. But let's be honest with ourselves about what this is: a stopgap. The capability is coming for all of us, probably within the year. The smart move now isn't to rage at the gate — it's to get ready for the world on the other side of it.

Getting ready for that world is exactly what our AI Master's Program is built for — you learn AI-Driven Cyber Security Engineering and build your own personal AI infrastructure, no coding background required, so you're the one directing these models rather than being downgraded by them. If you'd rather start with the fundamentals, our penetration testing and ethical hacking courses are a solid first step.

FAQ

Is Claude Fable 5 worth double the price of Opus 4.8?

For large autonomous coding and deep implementation work, many users say yes — it can complete in one pass what other models can't. For security work, no: it routinely downgrades you to Opus 4.8 anyway, so you're paying premium prices ($10/$50 per million tokens vs $5/$25) for a model that won't do the job. Note that since 22 June 2026 it's a usage-credits add-on rather than a subscription inclusion, so those token prices are what you actually pay per call. Use the right tool for the task.

Is Claude Fable 5 included in my Claude subscription?

Not right now. It was bundled free into Pro, Max, Team and seat-based Enterprise plans from launch (9 June 2026) until 22 June, but after that it moved to usage-credits / consumption billing — you pay per token rather than getting it as part of your seat. Following the June export-control suspension, Anthropic began restoring access on 1 July 2026, but the rollout is staged and it remains a metered add-on. Anthropic has said it intends to fold Fable 5 back into standard subscriptions once it has enough capacity, so if it shows as disabled on your plan, that's expected for now rather than a fault.

Can I turn off the Fable 5 safety fallback?

You can't remove the safeguard on the public model, but you can stop the silent switching. In Settings, under Capabilities, turn off 'Switch models when a message is flagged.' Flagged requests will then pause and let you decide, rather than quietly dropping to Opus 4.8.

How do I get access to Claude Mythos 5?

There's no public sign-up. Mythos 5 is offered through Project Glasswing to vetted organisations running critical infrastructure. Access is arranged through your Anthropic, AWS, or Google Cloud account team, and the programme is reportedly expanding over time.

Does defensive framing stop the security fallback?

It can help — phrasing work as auditing-to-fix rather than building-to-attack — but it doesn't reliably eliminate it, because the classifier flags the topic, not your intent. Plenty of clearly defensive requests still get caught.

Should I use Fable 5 for code review?

Not as your default. The people who do AI code review professionally — like the team at CodeRabbit — currently recommend Opus 4.8 for production code review and security paths, treating Fable 5 as an additional tool rather than a replacement reviewer. Use Fable 5 for large autonomous builds, and keep Opus 4.8 for the review.

Will Anthropic loosen the cyber security restrictions?

Almost certainly, over time. Anthropic has said the current safeguards are deliberately broad and that they'll keep refining them, and analysts expect comparable capability to reach open-weight models within 9 to 12 months — which makes today's tight gating hard to sustain.

Last updated: 1 July 2026, after the US government lifted its export-control suspension on 30 June and Anthropic began restoring access. Facts checked against Anthropic's model documentation, the Mythos system card, the official support article on model switching, Anthropic's statements on the suspension and restoration, and a direct API check confirming Fable 5's staged, credits-based availability.

About the Author

Nathan House, Founder & CEO of StationX

Nathan House has 30 years of hands-on cybersecurity experience and is Cambridge-educated, holding CISSP, CISA, CISM, OSCP, CEH, and SABSA. He founded StationX in 1999 — one of the UK’s first cybersecurity companies — and has secured £71 billion in UK mobile banking transactions and the London 2012 Olympics, advising clients including Microsoft, Cisco, BP, Vodafone, and VISA. He authored the world’s most popular cybersecurity course — a #1 Udemy bestseller taken by over 500,000 students — and was named Cyber Security Educator of the Year 2020, AI Security Educator of the Year, and a UK Top 25 Security Influencer 2025. A DEF CON speaker and featured expert on CNN, Fox News, NBC, and the BBC, Nathan leads StationX’s training of more than half a million students worldwide.