Cost of Penetration Testing: AI vs Humans

June 2026 · 9 min read · By Nathan House

If you've priced up a penetration test lately, you know the number stings: a few thousand dollars for a small app, tens of thousands for anything serious, six figures for a real enterprise engagement. So the obvious question in 2026 is the one everyone's quietly asking — can an AI do it cheaper? The cost of penetration testing is colliding with a technology that has, in documented cases, found real vulnerabilities for tens of dollars in compute.

Here's the trap, though. Most of that debate gets stuck on one question, whether the AI is as good as a human, and that's the wrong question, because neither finds every bug. Once you accept that, the comparison stops being about capability and becomes about economics: cost per bug actually found. So that's what we'll do here: put real numbers on both sides, be honest about what AI can't do, and then follow the logic somewhere uncomfortable, because the same cost collapse that's reshaping your pentest invoice is also arming attackers and tempting governments to lock the whole thing down. Let's get into it.

The Completeness Myth: Neither Humans Nor AI Find Everything

We need to kill one idea before any cost comparison makes sense: the belief that a thorough human review finds all the bugs.

It doesn't, and the proof is famous. Heartbleed, the catastrophic OpenSSL flaw that exposed a huge slice of the internet's encrypted traffic, sat in open-source code, readable by anyone, for more than two years before anyone noticed. Shellshock, a family of bugs in the Bash shell, went undetected for roughly twenty-five years. These weren't obscure binaries. They were some of the most-reviewed code on Earth, and expert human eyes walked past the flaws for years.

AI isn't complete either, but in a different way. As I covered in my AI cybersecurity threats analysis, AI vulnerability finders are non-deterministic: run the same model over the same code twice and you can get different results. One pass can miss what the next pass catches.
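Why repeated passes matter can be shown with a little probability. The Python below is a minimal illustration, not a model of any real tool: it assumes, loudly, that each pass finds a given bug independently with the same probability p, and the function names and the 10% hit rate are hypothetical. Real passes of one model over one codebase are correlated, so treat the output as an optimistic ceiling.

```python
# Hypothetical sketch: coverage from repeated, independent AI review passes.
# Assumption (loud): each pass finds a given bug independently with probability p.
# Real passes are correlated, so these figures are an optimistic ceiling.


def detection_probability(p: float, runs: int) -> float:
    """Chance that at least one of `runs` independent passes finds the bug."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return 1.0 - (1.0 - p) ** runs


def runs_needed(p: float, target: float) -> int:
    """Smallest number of independent passes whose coverage reaches `target`."""
    if not 0.0 < p <= 1.0 or not 0.0 < target < 1.0:
        raise ValueError("need 0 < p <= 1 and 0 < target < 1")
    runs = 0
    while detection_probability(p, runs) < target:
        runs += 1
    return runs


if __name__ == "__main__":
    p = 0.10  # hypothetical per-pass hit rate for one specific bug
    for k in (1, 5, 10, 20):
        print(f"{k:>2} passes -> {detection_probability(p, k):.0%} chance of a hit")
    print("passes needed for 90% coverage:", runs_needed(p, 0.90))
```

With a 10% per-pass hit rate, one pass finds the bug 10% of the time and twenty passes about 88% of the time; clearing 90% takes 22 passes. Because each extra pass is cheap, the sensible response to non-determinism is usually to run the model many times rather than trust a single pass.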

So here's the honest starting point. Neither humans nor AI find all the bugs. Human review is incomplete and slow. AI review is incomplete and probabilistic. The instant you accept that, the interesting question changes — from "can AI find vulnerabilities as well as a human?" to "what does it cost to find a real, validated bug?" That's a question we can actually answer with numbers.

What a Penetration Test Actually Costs

Human penetration testing is priced like the scarce expert labour it is. Pulling together 2026 pricing guides across the industry, the typical ranges look like this:

Web app pentest: $5k–30k (enterprise apps $15k–40k+)
Network pentest: $5k–40k (external and internal)
Senior tester, per day: $1.2k–3k (fully-loaded rate)
Enterprise engagement: $100k+ (full-stack or continuous)

Source: Industry pentest pricing guides, 2026 (typical ranges, not fixed prices).

Now the part the price list doesn't tell you: a clean report doesn't mean there were no bugs. A two-week engagement at $40,000 might surface two critical issues — or zero. On hardened targets — browsers, OS kernels, big codebases — an expert can spend many days, sometimes weeks, chasing a single serious vulnerability with no guarantee of finding one. Do the arithmetic on those day rates and the effective cost per serious bug (my estimate, not an industry stat) lands anywhere from a few thousand dollars to tens of thousands — and occasionally infinite, when nothing is found at all.
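The arithmetic behind that estimate fits in a few lines. This Python sketch assumes a single senior tester on a two-week (10 working day) engagement at the low and high ends of the day rates quoted above; the function name and scenarios are illustrative, not an industry formula.

```python
# Hypothetical sketch: effective cost per serious bug for a human pentest.
# Assumption: engagement cost = days * day_rate; bugs_found comes from the report.
import math


def cost_per_bug(days: float, day_rate: float, bugs_found: int) -> float:
    """Total engagement cost divided by serious bugs found (inf if none)."""
    if bugs_found < 0:
        raise ValueError("bugs_found cannot be negative")
    total = days * day_rate
    if bugs_found == 0:
        return math.inf  # the money is spent, but there is nothing to divide it by
    return total / bugs_found


if __name__ == "__main__":
    days = 10  # a two-week engagement, one senior tester
    for rate in (1_200, 3_000):  # low and high ends of the quoted day rates
        for bugs in (2, 1, 0):
            cpb = cost_per_bug(days, rate, bugs)
            label = "unbounded (nothing found)" if math.isinf(cpb) else f"${cpb:,.0f}"
            print(f"${rate:,}/day, {bugs} serious bug(s): {label}")
```

At $1,200 a day, two serious bugs work out to $6,000 each; at $3,000 a day, a single bug costs $30,000; and an engagement that finds nothing has no finite cost per bug at all.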

That's the human baseline: expensive, slow, world-class at judgment, and fundamentally bottlenecked by the number of hours in an expert's week.

What AI Costs

Now the other side of the ledger — and the numbers are genuinely startling. In August 2025, DARPA's AI Cyber Challenge put autonomous AI systems against 54 million lines of code. The results are the cleanest public data we have:

$152 per successful task (DARPA's own estimate)
18 real zero-days found (6 in C, 12 in Java)
45 minutes average per task (find and patch)
77% of the injected (synthetic) vulnerabilities found, 61% patched

Source: DARPA AI Cyber Challenge final, August 2025 (via CyberScoop).

It's not just competitions. Security researcher Sean Heelan used OpenAI's o3 model — "no scaffolding, no agentic frameworks" — to find a real, remote zero-day in the Linux kernel (CVE-2025-37899, a use-after-free in the SMB code). His hardest exploit-development challenge, defeating a stack of modern mitigations, cost roughly $50 for the winning run — about $150 once you count the parallel agents he ran to get there.

Put the same target in front of both, and the gap looks like this:

AI, same target:

$ ai-scan ./linux/ksmbd
analysing 12,000 LoC…
✓ zero-day found: use-after-free (SMB logoff)
time 0:43
cost $0.41

Human, same target:

$ expert reviewing ./linux/ksmbd
day 9 of 14… still mapping data flow
bug not yet found
cost so far ≈ $18,000

Hold those two columns side by side. Human: thousands to tens of thousands per serious bug, capped by expert hours. AI: tens to low hundreds of dollars per candidate, running in minutes, around the clock. The cost of looking has collapsed. (The figures above are illustrative of the documented cases discussed, not a single benchmarked run.)

The Honest Catch: Cheap Candidates Aren't Validated Bugs

Before you cancel your pentest contract, the crucial caveat — and it's the one that keeps this article honest.

"Cost per bug" is a dishonest number unless you count validity. AI generates candidate findings cheaply, but a large share are false positives, duplicates, or non-exploitable noise. When an autonomous tool floods a bug-bounty platform with reports, only a fraction are valid — and someone still has to triage, reproduce, and reject the rest. The cost didn't vanish; it moved downstream to verification.

This is why Heelan — who found a real kernel zero-day with AI — still says "the machines aren't about to replace you… they make you significantly more efficient and effective." AI compresses the reading time. It does not remove the human work of scoping the target, judging whether a finding is genuinely exploitable, understanding the business logic, and owning the disclosure. It shifts the expert's hours from searching to prompting, triaging, and verifying.
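One way to picture that shift is as a two-stage filter: collapse duplicate reports, then keep only what a reproduction step confirms. The Python below is a hypothetical sketch; the Candidate fields, the deduplication key, and the reproduce callback are assumptions, and real duplicate detection is usually much fuzzier than matching on file, function, and bug class.

```python
# Hypothetical sketch of a triage loop: dedupe AI candidates, then keep
# only the ones a reproduction step confirms. Field names, the dedup key and
# the reproduce() callback are all assumptions for illustration.
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Candidate:
    file: str
    function: str
    bug_class: str
    note: str  # the model's free-text explanation


def triage(candidates: Iterable[Candidate],
           reproduce: Callable[[Candidate], bool]) -> list[Candidate]:
    """Return unique candidates that `reproduce` confirms, in input order."""
    seen: set[tuple[str, str, str]] = set()
    confirmed: list[Candidate] = []
    for cand in candidates:
        key = (cand.file, cand.function, cand.bug_class)
        if key in seen:
            continue  # same bug reported twice: skip the expensive check
        seen.add(key)
        if reproduce(cand):  # the costly, human-owned step
            confirmed.append(cand)
    return confirmed


if __name__ == "__main__":
    reports = [
        Candidate("parser.c", "parse_header", "out-of-bounds read", "length unchecked"),
        Candidate("parser.c", "parse_header", "out-of-bounds read", "same bug, reworded"),
        Candidate("session.c", "logout", "use-after-free", "session freed then reused"),
        Candidate("util.c", "log_msg", "format string", "format is constant"),
    ]
    # Stand-in for a real reproduction harness or analyst sign-off.
    reproducible = {("parser.c", "parse_header"), ("session.c", "logout")}
    real = triage(reports, lambda c: (c.file, c.function) in reproducible)
    for bug in real:
        print(f"confirmed: {bug.bug_class} in {bug.file}:{bug.function}")
```

Run as a script, it prints the out-of-bounds read in parse_header and the use-after-free in logout. The design point is the ordering: deduplication is nearly free, so it runs first, and the expensive reproduce step (the part a human has to own) only sees unique candidates.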

So the precise claim isn't "AI is cheaper and better." It's: AI has collapsed the cost of finding candidate vulnerabilities to near zero, which moves the bottleneck — and the cost — to verification. If you want the practical how-to for running that verification loop, I've written it up separately: secure code review with AI.
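To put numbers on that shift, here is a minimal Python sketch of cost per candidate versus cost per validated bug. It assumes every candidate gets some human triage time whether or not it turns out to be real; the compute spend, candidate and valid counts, triage hours, and analyst rate are all hypothetical placeholders, not measured data.

```python
# Hypothetical sketch: why cost per candidate is not cost per validated bug.
# Every number passed in below is a placeholder, not measured data.
from dataclasses import dataclass


@dataclass
class AiCampaign:
    compute_cost: float         # total spend on model runs
    candidates: int             # findings the tool reported
    valid: int                  # findings that survived triage
    triage_hours_each: float    # human time to reproduce or reject one candidate
    analyst_hourly_rate: float  # cost of that human time

    def cost_per_candidate(self) -> float:
        return self.compute_cost / self.candidates

    def triage_cost(self) -> float:
        # Every candidate needs a look, whether it turns out real or not.
        return self.candidates * self.triage_hours_each * self.analyst_hourly_rate

    def cost_per_validated_bug(self) -> float:
        if self.valid == 0:
            return float("inf")
        return (self.compute_cost + self.triage_cost()) / self.valid


if __name__ == "__main__":
    run = AiCampaign(compute_cost=500, candidates=50, valid=5,
                     triage_hours_each=2, analyst_hourly_rate=150)
    print(f"per candidate:     ${run.cost_per_candidate():,.0f}")
    print(f"triage bill:       ${run.triage_cost():,.0f}")
    print(f"per validated bug: ${run.cost_per_validated_bug():,.0f}")
```

With these placeholders the tool costs $10 per candidate but $3,100 per validated bug, because 100 hours of triage at $150 an hour ($15,000) dwarfs the $500 compute bill. Halving the triage time moves the answer far more than halving the compute spend does.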

The Same Tool Attacks and Defends

Here's the part that should stop you in your tracks. Everything I just described, AI finding real zero-days for around $150 per successful task, works identically for the attacker.

Figure: a single AI vulnerability finder in the centre, with one arrow to an attacker (find and exploit cheaply) and another to a defender (find and patch cheaply). The cost collapse is symmetric.

AIxCC was set up to get vulnerabilities patched before attackers could exploit them. But the exact same capability, pointed with the opposite intent, is the attack. There is no version of this technology that finds bugs cheaply for defenders but not for criminals. The cost collapse is symmetric. A vulnerability finder doesn't know or care whether the human driving it wants to fix the bug or sell it.

That symmetry quietly demolishes a comfortable position. "We prefer skilled human craftsmanship over AI tooling" sounds principled — until you realise the attackers don't share your principles, and nothing stops them reaching for the same cheap AI you're declining to use.

Defenders Can't Opt Out

So the choice a defender actually faces isn't "human craft vs AI." It's this: adopt AI vulnerability finding, or face AI-equipped attackers with human-speed defence. And human speed loses that race by default.

Figure: the speed gap, 491 days on average to patch a vulnerability in healthcare versus 45 minutes on average for AIxCC systems to find and patch.

The two numbers don't measure quite the same thing, but the gap between them tells the story. DARPA noted that patching a vulnerability in healthcare takes an average of 491 days, against 60–90 days in most other industries. That's organisational reality: bureaucracy, testing, downtime windows. Now hold it next to the AIxCC systems that found and patched bugs in 45 minutes on average. Even allowing for the imperfect comparison, an attacker running the same class of tool isn't operating on a 491-day clock, and that mismatch is the whole problem.

DARPA's own challenge director put it bluntly: "Many cybersecurity products, unfortunately, are security theater. We need assertive proof-of-work approaches." The organisations most exposed here are exactly the ones least able to respond — hospitals, credit unions, municipal utilities running legacy code with no security team. For them, cheap AI defence isn't a luxury. It's the only way to stay in the race at all. This is the same structural point I made about AI cyberattacks: defenders have a real advantage — they own the code and the access — but only if they move at machine speed.

The Fork: Who Gets to Hold the Tool

A note on what follows: this last section is my forecast and my opinion — I'll show the reasoning so you can weigh it yourself.

If the same cheap, powerful tool arms both sides, the obvious instinct is: control it. Don't let just anyone have a model that finds zero-days for the cost of a cheap lunch. And governments are already acting on that instinct — in June 2026, the US administration moved to restrict public access to the most capable models, exactly as I described in the AI cybersecurity threats piece. In practice, that turns a tool you were relying on into this:

Frontier model, public access:

$ run frontier-scan --public
authenticating…
✗ ACCESS DENIED
restricted: Executive Order, June 2026
the most capable models are no longer public

Figure: a decision fork. Restrict the tool (big tech, nation-states and criminals keep it; hospitals and small banks lose it) or release it (defenders armed, but so is every attacker, and open source catches up anyway).

But here's the trap in that instinct, and it's a nasty one:

Restrict the tool, and you protect against attackers getting the public version, but you also disarm the defenders who relied on it. The well-resourced (big tech, nation-states, and the criminals who don't obey laws) keep their capability regardless. The hospitals and credit unions lose theirs. Restriction hits the weakest defenders hardest.

Release it freely, and defenders get armed, but so does every attacker. And because open-source models catch up to the frontier anyway, restriction would only ever have bought time.

There's a bitter irony baked in: the same government that moved to restrict the powerful model is the one whose own agency, DARPA, ran AIxCC to get this capability into defenders' hands. The state is simultaneously building the tool and reaching for the lock.

My Take: Adoption Is Forced, the Moat Is Shrinking

So where does this land? Three things I'd stake my name on.

First, adoption is no longer optional. The economics don't politely suggest AI vulnerability finding — they compel it. A defender who refuses is choosing to face AI-speed attackers at human speed, and that is a losing position, full stop.

Second, the human moat is shrinking to one thing: judgment. Finding candidates is now cheap and machine-driven. What stays expensive and human is deciding which findings are real, exploitable, and worth fixing — and owning the consequences. That's verification and judgment, and for now it doesn't scale the way discovery does. But "for now" is doing real work in that sentence. Industrialise verification too, and the moat narrows again.

Third, the policy choice decides who's left defended. If the tool gets locked to the powerful, the gap between organisations that can defend at machine speed and those that can't becomes a chasm — and it runs straight through the hospitals and utilities we can least afford to lose.

The cost of penetration testing was a useful question. The real one is bigger: the cost of finding a vulnerability has collapsed for everyone at once, and what we do about that, as defenders and as a society, is the decision that actually matters. The discovery economics are settled, even if the verification economics aren't. The tool is here. The only open question is who gets to hold it.

Penetration Testing Cost FAQ

Is AI cheaper than human penetration testing?

For generating candidate findings, dramatically — AI works for tens to low hundreds of dollars per run versus thousands to tens of thousands per serious bug for human experts. But cheaper per candidate isn't cheaper per validated bug: a large share of AI findings are false positives that still need human verification. AI collapses the cost of looking, while the cost of confirming shifts to a human.

Will AI replace penetration testers?

Not the skilled ones. AI compresses the search and reading time but doesn't remove scoping, exploitability judgment, business-logic understanding, or disclosure, the parts that require accountability. As researcher Sean Heelan put it after finding a real zero-day with AI: the machines aren't about to replace you; they make you significantly more efficient. Manual, checklist-style testing is far more exposed than expert judgment.

How much does a penetration test cost?

Typical industry ranges: web app pentests $5,000–$30,000, network pentests $5,000–$40,000, senior tester day rates around $1,200–$3,000, and enterprise engagements $100,000+. Prices vary widely with scope and complexity.

Can AI actually find real zero-day vulnerabilities?

Yes — and it already has. In DARPA's 2025 AI Cyber Challenge, autonomous systems found 18 real zero-days across 54 million lines of code. Independently, Sean Heelan used OpenAI's o3 to find a real remote zero-day in the Linux kernel. The capability is proven; the open question is yield and cost at scale, not whether it works.

Should defenders use the same AI tools attackers use?

The economics say they have little choice. The cost collapse in vulnerability finding is symmetric — it arms attackers and defenders equally. A defender who declines to use AI is choosing to meet AI-equipped attackers at human speed, which loses on time. The advantage defenders hold — owning the code and access — only pays off at machine speed.

About the Author

Nathan House, Founder & CEO of StationX

Nathan House has 30 years of hands-on cybersecurity experience and is Cambridge-educated, holding CISSP, CISA, CISM, OSCP, CEH, and SABSA. He founded StationX in 1999 — one of the UK’s first cybersecurity companies — and has secured £71 billion in UK mobile banking transactions and the London 2012 Olympics, advising clients including Microsoft, Cisco, BP, Vodafone, and VISA. He authored the world’s most popular cybersecurity course — a #1 Udemy bestseller taken by over 500,000 students — and was named Cyber Security Educator of the Year 2020, AI Security Educator of the Year, and a UK Top 25 Security Influencer 2025. A DEF CON speaker and featured expert on CNN, Fox News, NBC, and the BBC, Nathan leads StationX’s training of more than half a million students worldwide.