Agentic Engineering: How to Work 10-40x Faster With AI
Agentic engineering is the name Andrej Karpathy just gave to the way software — and increasingly everything else — is about to get built. Learn to work this way and you can genuinely move 10, 20, even 40 times faster. Ignore it, and someone who didn't will do your whole job in an afternoon.
That's a big claim, so I won't ask you to take it on faith. I've worked this way for over a year, and around 80% of my company now runs on AI infrastructure I built. In this guide, we'll cover what Karpathy actually means by agentic engineering, walk through the four tiers of building around an AI (most people are stuck on tier one), and look at the six-step discipline that keeps it all safe. I'll also show you a real example: a five-agent security review, from request to production, in four sentences.
Let's start with the definition.
What Is Agentic Engineering?
Agentic engineering is the discipline of coordinating AI agents to build real, professional-grade work at AI speed — without dropping the quality bar humans were held to. The term comes from Andrej Karpathy (founding member of OpenAI, former director of AI at Tesla) — the Karpathy agentic engineering talk that named the field was given at Sequoia Capital in April 2026:
"Agentic engineering is about preserving the quality bar of what existed before in professional software. So you're not allowed to introduce vulnerabilities due to vibe coding. You are still responsible for your software just as before — but can you go faster? And spoiler is you can, but how do you do that properly? … You have these agents which are these like spiky entities. They're a bit feeble, a little bit stochastic [randomly unpredictable], but they are extremely powerful — how do you coordinate them to go faster without sacrificing your quality bar?"
Two things in that quote matter enormously.
First, "spiky." That's Karpathy's word for the thing we've all experienced: the same model that can refactor a 100,000-line codebase or find zero-day vulnerabilities (security flaws nobody has discovered yet) will also, on its own, make things up, cut corners, and happily ship you something broken. Brilliant one minute, useless the next — and you can't tell which you're going to get.
Second, "you are still responsible." The quality bar doesn't move just because the AI did the typing. That's the whole discipline in one line: go dramatically faster and stay accountable for the result.
Karpathy named the discipline. I'd add a naming of my own for what it makes you: AI-driven. An AI-driven security engineer, an AI-driven pen tester, an AI-driven software engineer — same idea, seen from the role rather than the practice.
So how much faster are we actually talking?
Why 10x Is the Floor, Not the Ceiling
People used to talk about the "10x engineer." Karpathy thinks that framing is now too small: "10x is not the speed-up you gain… people who are very good at this peak a lot more than 10x."
From my own experience, he's right — and I think 10x is the floor. Done properly, this is 20x, 30x, 40x. Things that twelve months ago would have cost hundreds of thousands of pounds and a team of people, I can now stand up on my own, in an afternoon.
And to be clear about what I mean by "stand up": not vibe-coded rubbish that falls over the first time someone pokes it. Real, secure, serious product — properly built, the kind of thing you'd put your name on and put in front of customers. That distinction is exactly Karpathy's point, and it's the difference between agentic engineering and the vibe coding that came before it (more on that below).
But claims are cheap. Let me show you what this looks like in practice.
Four Sentences to Production: A Real Multi-Agent Security Review
In the video above (1:50–5:30), I run this live. Here's the written version.
I had a feature on one of our live pages I wanted checked — a small hacking game where anyone can practise prompt injection (tricking an AI with malicious instructions) against a real AI model. Public-facing, talking to a real model: exactly the kind of thing you want reviewed properly. The entire review-fix-test-ship cycle took me four sentences.
Look at what's carrying the weight there. The hunt is three agents in parallel — a bug-hunter, an OWASP reviewer (OWASP is the industry's web-security playbook) working like an attacker, and a code reviewer — then a validator whose only job is to argue with those three and throw out the false alarms, then a referee to make the final call. And crucially, they're not all the same AI: it pulls in Codex and Gemini alongside Claude, because the one thing you don't want is an AI marking its own homework. (Full method: our secure code review workflow.)
The findings were real, too. The best one: the game trusted the whole conversation history the browser sent it, so a player could forge a message as if the bot had said it, paste in the secret, and claim the prize with zero skill. But the part that matters more than the vulnerabilities is what happened at "push and deploy." The moment the AI tries to commit (save the code into the project), a guard fires — one that lives in the project itself, not in the AI's goodwill. It re-runs the whole test suite, a security scanner, and the linter (an automatic error-checker), then fires the code to Codex — a completely different vendor's model — for one more independent review. Nothing gets committed unless all of it passes. The AI cannot skip it.
Four sentences. I never opened the code, never wrote the review, never ran a test. And here's the takeaway I most want you to sit with: it wasn't fast because the AI is clever. On its own, the AI is spiky — unreliable. It was fast because of everything built around it.
Which brings us to the ladder — because there are four levels of "building around it," and where you sit on that ladder decides how much of this power you actually get.
The Four Tiers of Agentic Engineering
I map agentic AI engineering as four tiers. Not four different tools — four levels of how much system you've built around the raw model. Here's the whole ladder at a glance — and notice how the field thins out as you go deeper:
TIER 1The Harness
The harness is the LLM with tools bolted on — the thing that turns a chat model into an agent that can actually do things. This is Claude Code, Cursor, OpenAI's Codex, Google's Gemini CLI. Which one you pick barely matters; honestly, they're all converging on the same capabilities anyway.
The harness is powerful, and it's where most of us live today — nothing wrong with that. But it has two weaknesses: it forgets you the second the session ends, and you're fully exposed to all the spikiness we just talked about. It's the bottom rung.
TIER 2Personal AI
Tier two — you'll also hear "personal AI assistant" or "agentic OS" — is where you start bolting real parts onto that raw agent:
- Memory, so it remembers you between sessions: your standards, the decisions you've already made, the mistakes you don't want repeated.
- Identity, so it sounds like you and works the way you work.
- Skills — small reusable abilities it accumulates over time, each one a thing it now knows how to do your way.
It stops being a blank assistant every morning and starts being yours, and it gets better the more you use it. Far fewer people are here. But this is the turn that changes everything.
TIER 3AI Infrastructure
Tier three is where the AI stops being a chatbot you talk to and becomes a system that runs your operation. The demo you just read? That was tier three. It's where I actually work.
You keep the memory and skills from tier two, and add two things on top. Orchestration: it can run a whole fleet of agents at once, like the five that reviewed my code. And governance: the guardrails that stop it doing damage — the ones that caught the commit before it shipped.
The property that makes it tier three — the whole point of it — is that it compounds. That multi-agent review didn't know how to do any of that on its own. I built it, once, properly, to my standard — and now it's a permanent capability the system can call on forever. Every problem solved that way makes the whole system more powerful, like compound interest. The system I have today is the sum of every problem I've ever solved, all still there, all still working. Which raises the obvious next question: what happens when it's not just you who needs that compounding brain?
TIER 4Shared AI Infrastructure
The fourth tier is when you share that whole infrastructure — the memory, context, tools, commands, governance — across a team, so everyone builds on one brain. That's the frontier, and it's where you go when you're running something big. It's what we do at StationX: right now, around 80% of a company serving half a million customers runs through a shared AI infrastructure we call HAL.
Interestingly, Karpathy gestures at "personal AI infrastructure" in his talk but stops short of the shared tier — this is where the map has to be drawn beyond his naming.
So that's the ladder. But there's one idea underneath it that I think matters more than any individual tier.
System Over Intelligence: You're the Architect Now
The AI isn't the genius here. On its own it's spiky — Karpathy's word.
The system is the genius.
A relatively simple model wrapped in a strong system beats a brilliant model with no system, every time. I've watched this play out for a year, and it's the single most useful mental model I can give you: system over intelligence.
Which makes your job description clear. You're not the worker anymore — you're the architect. The AI is the labourer: fast, capable, occasionally brilliant, but the labourer. You design; it builds. That distinction is the whole game, and I think it's also the most reassuring answer to the "will AI take my job?" anxiety — the architect role doesn't disappear, it becomes the job. (If you want to see how exposed your current role is, we score every role's AI displacement risk at JobZone.)
But if you've been paying attention, you'll have spotted the danger. If four sentences from me can change code on a live server — what happens when the AI gets it wrong? Because it will.
Agentic Engineering vs Vibe Coding
This is exactly the line Karpathy draws between agentic engineering and vibe coding. Vibe coding — his earlier coinage — raised the floor: anyone can now produce working-ish software by feel. Agentic engineering is about the ceiling and the quality bar: "you're not allowed to introduce vulnerabilities due to vibe coding. You are still responsible for your software just as before."
The difference isn't the tools — it's the discipline. A vibe coder ships what the AI hands them. An agentic engineer coordinates spiky agents inside a system that catches the failures before they reach a customer. Same models, completely different risk profile:
| Vibe Coding | Agentic Engineering | |
|---|---|---|
| Your role | Prompter — accepts what the AI produces | Architect — directs agents inside a system |
| Quality bar | "It seems to work" | The professional bar — you're still responsible |
| Verification | None — the AI marks its own homework | Independent agents (different vendors) tear it apart |
| Shipping | Straight to production and hope | Gated — nothing ships until proven safe |
| Production risk | Falls over when someone pokes it | Built to hold with customer data in it |
So how do you actually run something that spiky, at that speed, without it blowing your foot off? You don't reinvent anything.
The Six Steps of AI-Driven Engineering
The disciplines already exist. Serious organisations have used them for decades to ship software that cannot fail — the rigour NASA puts behind flight software, the security practices NIST writes down, the way Amazon pressure-tests an idea before a line of code gets written. Agentic software engineering, done properly, is those proven disciplines wrapped around the AI. I run it as six steps:
- 1Screen — size up what could go wrong before a line gets written. Could this leak data? Could it hurt someone? How much rigour does it actually need?
- 2Frame — the blunt question: is this even worth building?
- 3Spike — test the single riskiest unknown, cheaply, before you commit. Fail now for a few hours, not later for a few weeks.
- 4Build — let the AI run at full speed, but to a spec, not a vibe.
- 5Verify — other agents tear the work apart hunting for what's broken or insecure. AI checking AI — you saw this live in the demo.
- 6Gate — nothing ships until it's proven safe, to your risk tolerance. You saw that too: nothing deployed until the review came back clean.
And one rule underneath all six that never bends: you are in charge. You set the goal, you make the calls, you own the result. The AI takes on every other role — but it never decides for you, and it never just guesses.
Now, you might reasonably say: that's just standard good practice. And you'd be right — that's rather the point. But there's a twist, and it's the part that actually makes this work.
Forcing the Discipline: Commands, Skills, Scripts, Hooks
Anyone can write those six steps on a whiteboard. The problem is the AI doesn't read your whiteboard. It's spiky — it will cut the corner the moment you're not looking.
So the rules aren't just written down. At every one of those six stages, the AI is given tools that force it to follow them:
- Commands — the procedures it runs on request, like the multi-agent review.
- Skills — reusable abilities baked in, your way.
- Deterministic scripts — so it doesn't "think" about things that should happen identically every time, and only thinks where thinking is needed.
- Hooks — guards wired into the project itself, like the one that re-ran the tests and the security scanner the instant the AI tried to commit. It literally cannot skip them.
Remember the demo: the review that had to come back clean was a command; the commit guard was a hook. Two separate nets, both enforcing the standard, neither trusting the AI to behave on its own. That's the difference between a nice list and a system that holds — NASA-grade rigour, AI-grade speed, with you holding the wheel.
And once the discipline is enforced by the machine rather than by memory, something bigger opens up: you can trust the system with more than code.
Running a Company on AI Infrastructure
This is the part that surprises people: it's not just coding. The same infrastructure that reviewed and shipped that fix runs security across my business. On the defensive side, a system scans every server and repository for vulnerabilities — weekly on its own, some of it daily — reporting into Slack and monitoring tools, where other agents read and react to what comes in. It ranks findings by real-world risk and tells me what to fix first.
And on the offensive side: when I want a web app tested, I point the infrastructure at it and have it attack it. It runs the recon, reviews the code, probes for flaws, and tries to break in the way a real attacker would — using my methodology, because I taught it mine — then hands me the report. Offensive and defensive security, running through the same brain, shared with a team.
That's the real headline. Anything a computer can do in principle can be built into a tier-four AI infrastructure — and run safely enough, if designed correctly, to sit inside most people's risk tolerance. Which means you can run an entire operation through it.
And I'd go further: people will, because there'll be no choice. If competitors are moving at 20-40x, a company can't opt out. And in security specifically — if attacks are running at 20-30x speed, your defences have to match. I think this kind of infrastructure is an inevitable future for staying competitive, and for having any defence at all.
The only catch — and it's the whole reason the six-step framework exists — is that phrase safely enough. You can hand the AI an entire operation, but only wrapped in the right discipline, at the right level for what you're risking. Get that right, and the ceiling isn't 10x. It's a whole business running itself, with you as the architect.
You Can't Outsource Understanding
There's a line Karpathy used that I keep coming back to:
"You can outsource your thinking, but you can't outsource your understanding."
That's the one thing in all of this that doesn't change. You can outsource the doing — genuinely all of it. You cannot outsource understanding what you're building and why it's safe. That understanding is the job now. It's the architect's job, and it's the one thing the AI can't hand you.
That's why we plant our flag specifically in security. Of everywhere you could point AI, security is the sharpest end — the place where a mistake costs the most, where "fast but broken" isn't a bug, it's a breach. Get AI-driven engineering right in security and you can apply it anywhere.
If you want to build this rather than just read about me doing it, that's exactly what we teach. The AI Master's Program turns you into an AI-driven security engineer — someone who directs AI to build real, secure cybersecurity solutions the way I've described here — whether you take that into a company, out on your own as a consultant, or into building your own product. And if you're earlier in the journey, our free web-book Become the Cyber Security Expert the AI Era Demands is the place to start.
The discipline is agentic. The professions are AI-driven. There's a rung on this ladder for you — step onto it.
Frequently Asked Questions
What is agentic engineering?
Agentic engineering is the discipline of coordinating AI agents to produce professional-quality work at AI speed while keeping the pre-AI quality bar — you remain responsible for what ships. The term was coined by Andrej Karpathy in 2026.
Who coined the term agentic engineering?
Andrej Karpathy — founding member of OpenAI and former director of AI at Tesla — named it in his April 2026 Sequoia Capital talk 'From Vibe Coding to Agentic Engineering,' describing agents as 'spiky entities… extremely powerful' that must be coordinated 'without sacrificing your quality bar.'
What's the difference between agentic engineering and vibe coding?
Vibe coding raises the floor — anyone can produce working-ish software by feel, accepting what the AI hands them. Agentic engineering preserves the professional quality bar: agents run inside a system of specs, verification, and gates that catches failures before they ship. Same models, different discipline.
Do I need to be a software developer to use agentic engineering?
No — the same tiers apply to any knowledge work. The four-tier ladder (harness, personal AI, AI infrastructure, shared infrastructure) is about how much system you build around the AI, whether the output is code, security operations, analysis, or content. Coding is simply where the discipline is most mature.
How do I start with agentic engineering?
Start at tier one with a harness (Claude Code, Codex, Cursor, or Gemini CLI), then climb: add memory, identity, and skills to make it a personal AI (tier two), then orchestration and governance to make it infrastructure (tier three). Adopt the six-step discipline — Screen, Frame, Spike, Build, Verify, Gate — from your first project.
Is agentic engineering safe for production systems?
It can be, but only with governance: verification by independent agents (ideally different vendors' models), hooks that force tests and security scans before any commit, and a human architect owning every decision. Without those, you're vibe coding with extra steps.
About the Author
Nathan House, Founder & CEO of StationX
Nathan House has 30 years of hands-on cybersecurity experience and is Cambridge-educated, holding CISSP, CISA, CISM, OSCP, CEH, and SABSA. He founded StationX in 1999 — one of the UK’s first cybersecurity companies — and has secured £71 billion in UK mobile banking transactions and the London 2012 Olympics, advising clients including Microsoft, Cisco, BP, Vodafone, and VISA. He authored the world’s most popular cybersecurity course — a #1 Udemy bestseller taken by over 500,000 students — and was named Cyber Security Educator of the Year 2020, AI Security Educator of the Year, and a UK Top 25 Security Influencer 2025. A DEF CON speaker and featured expert on CNN, Fox News, NBC, and the BBC, Nathan leads StationX’s training of more than half a million students worldwide.