AI Social Engineering: The Con Man Who Never Sleeps

12 min read | By Nathan House

Imagine a con man who never sleeps, never gets nervous, and never fumbles his story. He can phone a thousand of your employees at once, in a voice each of them trusts, and have a warm, patient, personalised conversation with every single one — at a cost of a few cents per call. He learns from every rejection and gets better by the hour. Every piece of that con man already exists and has been demonstrated in public. Targeted versions are being used right now — one of them cost a single company $25 million. What's arriving next is the cheap, reliable, mass-produced version, and building it no longer takes a criminal genius. It takes a credit card, some off-the-shelf AI, and an afternoon.

This is what AI social engineering really means, and it's why the human side of security just became the dangerous side. In this article we're going to build this con man together — conceptually, as a threat model, not a toolkit — so you can see exactly how the pieces fit. Then, because understanding the attack is the whole point, we'll break down how to stop him. Let's get into it.

TL;DR — if you've only got 30 seconds

AI didn't invent new manipulation. It removed the one limit on it: social engineering now scales. In Mandiant's M-Trends 2026 investigations, vishing was already the #2 way attackers first got into networks.

An "AI con man" is three parts — a Brain (the AI running the conversation), a Mask (cloned voice, face or text), and a Memory (your scraped personal data). It's built by renting commercial tools, not writing malware.

It's already real: a deepfake video call cost the firm Arup $25.6M; DEF CON teams built autonomous vishing bots for "a credit card and time."

You don't beat it by spotting fakes. You beat it by changing your process — verify sensitive requests out-of-band, agree safe words, and make sure no single conversation can move money or grant access.

AI Social Engineering: The Con Man Who Never Sleeps

For as long as social engineering has existed, it's had one saving grace for the defender: it doesn't scale. A brilliant human con artist can only work one mark at a time. He gets tired. He has good days and bad days. There's only one of him. So even though the technique was devastating, the throughput was tiny — which is why most of us have only ever met social engineering as a clumsy phishing email, not a smooth-talking professional.

That constraint has just been removed. And that's the entire story of this article, so it's worth sitting with for a second.

AI didn't invent any new manipulation techniques — the psychology is the same stuff intelligence agencies have taught for decades. What AI changed is the economics. Google's Mandiant, in its M-Trends 2026 report, found that across its own investigations, voice phishing (vishing) surged to 11% of intrusions — the second most common way attackers first got into a network — while old-fashioned email phishing fell to just 6%. That's not a small shift in tactics. That's the threat landscape reorganising itself around a new reality: manipulation at machine scale.

So the question isn't "could someone build an AI con man?" They already have. The useful question is: what's he actually made of? Because once you can see the parts, you can see where to break them.

Meet Your Attacker: The Anatomy of an AI Con Man

Whenever I try to help someone defend against a threat, I find it far more useful to build the attacker in my head than to memorise a list of scary scenarios. So let's assemble ours. Our AI con man has three parts, and I want you to remember them, because the defence maps directly onto them later.

The Brain — the large language model that runs the conversation, decides what to say, and adapts in real time.

The Mask — the synthetic voice, face, or writing style that makes him look and sound like someone you trust.

The Memory — the harvested personal information that makes every approach feel personal and legitimate.

Figure: The anatomy of an AI con man in three parts: The Brain (the AI that runs the conversation), The Mask (cloned voice, face and writing), and The Memory (your harvested personal data).

Brain, Mask, Memory. A mind to manipulate you, a disguise to earn your trust, and a dossier to make it all believable. Strip any one of those away and the con collapses — which is exactly the seam we'll pull on at the end. First, let's look at what makes the Brain so much more dangerous than the human con artist it replaces.

The Brain: Elicitation at Machine Scale

Here's the thing most people miss about social engineering. The best information theft doesn't come from asking questions. It comes from a technique intelligence officers call elicitation — steering a friendly conversation so the target volunteers secrets without ever realising they were asked.

Ex-CIA officers describe a whole toolkit of these moves, and here's the uncomfortable part: every single one is just a conversational script — which is precisely the thing a large language model does better than anything else on earth. Let's go through the full set, because once you can name them, you can feel them being used on you:

The deliberate false statement. You say something you know is wrong — "Oh, you're still on the old VPN, right?" — and the target's instinct to correct you hands you the real answer: "No, we moved to the new one last year." They defended the truth without ever meaning to reveal it. This is the one that surprises people most, because being helpful by correcting an error doesn't feel like a security event.

Bracketing. You throw out a range — "Your team's what, fifty, sixty people?" — and they narrow it for you: "Nah, more like thirty." You never asked for a number; they volunteered a precise one.

Flattery. You praise someone's expertise, and they prove you right by over-explaining. "You clearly run that whole network single-handed — how do you keep the VPN so solid?" Ego does the rest.

Feigned ignorance. The mirror image of flattery. You play dumb, and the target teaches you — walking you through the exact process, access steps, or system detail you were fishing for, because correcting a confused person feels like kindness.

Quid pro quo. You offer a small confidence first — a bit of harmless "inside" information — which triggers the deep human reflex to reciprocate. "Between us, our lot are still on Windows 10, it's a nightmare." "Tell me about it, we're stuck on the old Exchange box until…"

Open-ended questions. Not "what's your password" but "how does onboarding usually work over there?" Broad, innocent-sounding questions that let the target fill the silence — and reveal process, names, and structure they'd never hand over if asked directly.

Now, here's why this matters for our con man. These techniques were always powerful, but they were also fragile — they needed a skilled, calm, quick-thinking human running them live, one conversation at a time. An LLM runs every one of them perfectly, in parallel, across thousands of calls, and never gets flustered when someone pushes back. The playbook that used to take years of training to run well is now a system prompt.

And we're not speculating about this. Researchers at King's College London built exactly this — a malicious chatbot designed to extract personal information — and tested it on 502 real people who didn't know what it was really doing (the work was accepted at USENIX Security 2025). The results should stop you in your tracks. When people were talking to one of the malicious versions, over 90% of them disclosed personal information — often without realising how much they'd given away.

But the finding that matters most is which approach was the most dangerous, and it's not the one you'd guess. The researchers tried a blunt version that simply demanded information every turn. It worked in raw terms — but people felt interrogated, got suspicious, and some fed it fake data to make it stop. Then they tried a version built on empathy, warmth, and rapport — the chatbot that made you feel supported, like you were chatting with a friend. That one didn't necessarily extract more data than the blunt version, but it did something worse: it extracted far more than a normal, harmless chatbot while being rated the least risky and the most trustworthy of them all. People never saw it coming.

💡 In plain English

The scariest version of an AI scammer isn't the pushy one that demands your details — people spot that and lie. It's the warm, friendly one that makes you feel looked after while it quietly collects everything. Kindness is the disguise.

Sit with that, because it's the whole thesis in one experiment. The warm, rapport-building approach — which is exactly what elicitation is — didn't win by being pushy. It won by being invisible. And the same conversational qualities that make a chatbot genuinely helpful are the ones that make it dangerous when they're pointed at extraction instead of assistance — which is why "it's just a support bot" is no longer a reassuring sentence. The Brain of our con man isn't a menacing interrogator. It's the friendliest voice on the call, and that's precisely what makes it so effective.

Now, that experiment was run in text — a chatbot on a screen. And a fair skeptic will say a typed survey is a long way from a live phone call, where people behave differently. That's the right question to ask, so let's answer it head-on: the King's College study proves the elicitation logic works. What we need next is a way to bolt that logic onto a real voice, in real time. As it turns out, that's the easy part. Which raises an uncomfortable question: if the manipulation feels like kindness, what does it look and sound like coming down the phone?

The Mask: Voice, Text, and Video Deepfakes

The Brain decides what to say. The Mask decides who the target thinks is saying it — and this is where AI social engineering splits into three channels, each more convincing than the last.

Text. The simplest mask is the written word — a chatbot, an email, a Teams message in a colleague's writing style. This is the KCL experiment made real: a helpful-looking "AI assistant" or a support-desk chat that's quietly harvesting. It scales infinitely and costs almost nothing.

Voice. This is the one keeping CISOs up at night, and for good reason. A few seconds of someone's voice — lifted from a conference talk, a podcast, a YouTube video, a voicemail greeting — is now enough to clone it convincingly. Hired by 60 Minutes to prove the point, the ethical hacker Rachel Tobac cloned a correspondent's voice from broadcast clips, spoofed the caller ID to match, and in about five minutes talked a colleague into handing over that correspondent's passport number. As Tobac put it, when she runs that kind of attack, "every single time, the person falls for it." The mask is no longer a bad impression. It's their voice, saying words they never said.

Voice and video together. The full disguise. In early 2024, the engineering firm Arup lost around $25.6 million (HK$200 million) when an employee in their Hong Kong office joined a video call with what looked and sounded like senior colleagues — and, over 15 transactions, authorised the transfers. According to Hong Kong police, every person on that call except the victim was an AI deepfake, including someone posing as the company's CFO; Arup itself confirmed that "fake voices and images were used." This isn't a future threat. It already happened, and it's the most expensive social engineering incident of its kind on record.

So how hard is it to actually wire all this together? Here's the part that should genuinely change how you think about this. At DEF CON — the world's biggest hacking conference — a 2025 competition called Battle of the Bots challenged teams to build AI agents that could place live vishing calls to real people with no human speaking during the call. The winning team, from the security firm DirectDefense, described their build afterwards, and the detail that matters isn't the cleverness — it's the ordinariness. They didn't invent anything. They bolted together commercially available platforms anyone can sign up for: a voice-agent service, a telephony provider, a mainstream LLM for the brain, and off-the-shelf voice synthesis. Their bots held natural conversations with under 0.8 seconds of response lag — fast enough that targets never suspected — and at one point talked a human into visiting a phishing link and reading back an error code. In their own words, an attacker needs only "a credit card and time."

That's the Mask, across all three channels. Convincing, cheap, and assembled from parts you can rent. But a convincing voice with nothing personal to say still trips alarms. To be truly dangerous, the con man needs to know things about you. That's the third part.

The Memory: OSINT and Hyper-Personalisation

The reason a scam call usually falls flat is that it's generic. "This is your bank" means nothing when they don't know your name, your manager, or the project you shipped last week. Personalisation is what turns a cold call into a credible one — and historically, personalising an attack was expensive, because a human had to research each target by hand.

Our AI con man does it automatically. This is the Memory: open-source intelligence (OSINT) scraped at scale — LinkedIn, company sites, press releases, leaked data, social media — and fed straight into the Brain so every conversation is tailored. The org chart tells him who reports to whom. The out-of-office reply tells him when the boss is away. A few of your posts tell him your tone, your interests, and what you'd plausibly say yes to.

Intelligence officers have a model for choosing who to target and which lever to pull, summed up in the acronym RICE: Reward, Ideology, Coercion, Ego. A human could only apply that to a handful of people. An AI can profile an entire company and pick, for each employee, the exact emotional angle most likely to land. Say its Memory spots from LinkedIn that someone was just passed over for a promotion. That points to Ego and Reward, so it dynamically prompts the Brain to open with a flattering "confidential recruiter" call. The ambitious junior gets flattery; the stressed finance clerk gets urgency from "the CFO." It's probabilistic, not mind-reading, but it doesn't need to be perfect, just good enough, at a scale no human team could ever match. That used to be the slow, skilled, expensive part of the con. Now it's automated.

Brain, Mask, and Memory — the three parts. But parts aren't a con man. Someone has to assemble them, and how that assembly now happens is the part that should really change your threat model.

How He Gets Built: Social Engineering as an Engineering Project

Here's the shift that turns all of this from "scary demo" into "industrial threat." Building the con man is no longer a from-scratch hacking project — it's an orchestration job, wiring together parts that already exist. I'll stay deliberately high-level here — this is a threat model to help you place your defences, not a build manual — but the shape is what matters, because each joint in it is a place a defender can cut.

Trace the machine end to end and you get seven links in a chain:

1. Objective. The operator states a goal in plain language — "get finance staff at mid-sized firms to approve a transfer." No coding, no con-artistry, no voice-acting skill required. Just intent.

2. Memory. OSINT tooling pulls the target's public footprint — org chart, LinkedIn, leaked data — into a profile the rest of the system draws on. (This is the Memory from earlier, and — spoiler for the defence — it's the link most companies leave wide open.)

3. Mask. A few seconds of scraped audio, or a handful of photos, becomes the cloned voice or face.

4. Brain. A mainstream large language model runs the actual conversation, applying the elicitation playbook and adapting to whatever the target says.

5. Channel. The whole thing is bolted to a phone line, a chat widget, or a video call via ordinary commercial telephony.

6. Success condition. A defined win — a transfer approved, a credential read out, an MFA code relayed.

7. The feedback loop. Every call is data. Which opening line survived longest? Which pretext closed? The system measures itself and improves — so the con man doesn't just run at scale, he learns at scale.

Figure: How an AI con man is built. A seven-link chain (Objective, Memory, Mask, Brain, Channel, Success, Feedback Loop), with the feedback loop returning to the start so the system repeats, learns, and gets better.

The DEF CON Battle of the Bots team made the sobering part concrete: they built none of these components themselves. They rented them — a voice-agent platform, a telephony service, a mainstream LLM, off-the-shelf voice synthesis — each a legitimate product a real business uses for customer support. The attacker's actual "work" is mostly the glue and the prompt. In their words, it takes "a credit card and time."

So the honest, uncomfortable summary is this: the barrier to entry has collapsed on every axis at once. You used to need the skills of a con artist, the voice-acting of an impressionist, the research stamina of an analyst, and the coding of an engineer. Now you need the ability to describe what you want to an AI that supplies all four. That's the real story of AI social engineering — not that the attacks are cleverer, but that almost anyone can now direct them.

Which brings us to the question that should be on every security leader's mind: if this is so easy to build, what does the finished con man look like next to the human he replaces?

Why a Bot Beats a Human Con Artist

You might reasonably think a machine could never match a truly gifted human manipulator, and one-on-one, in a single conversation, maybe that's still true today. But that misses the point entirely. The AI con man wins not because he's a better talker, but because he breaks every limit that used to keep human social engineers in check. Here's where he beats the human, point for point:

He never gets flustered. The deliberate-false-statement trick depends on the target correcting you. When they push back, a human might tense up or over-explain and blow their cover. The bot just absorbs it, stays warm, and keeps going. It has no nerves to betray.

He has infinite patience. A human con artist needs a quick win. An AI can have ten friendly, no-ask conversations with you over three weeks, building rapport, before it ever mentions anything sensitive — the long con, run at zero marginal cost.

He A/B tests. This is the one that should really worry you. Run a pretext against ten thousand people and the AI learns — which opening line, which flattery, which fake emergency gets the highest hit rate. Human con artists never got that feedback loop. Our con man improves every single day.

He personalises at scale. The thing that made spear-phishing expensive — bespoke research per target — is now free and instant, applied to everyone at once.

So the honest summary is this: we spent years training people to spot the nervous, generic, one-off human attacker. The AI con man is calm, personalised, relentless, and everywhere at once. Which naturally raises the question — if this is what he can do today, what's coming next?

Things to Come: The Next 24 Months

I want to be careful here, because fear-mongering helps no one. So let me separate what's actually happening from what's merely plausible — three honest tiers, not one breathless prediction.

Observed now. This isn't a forecast. Deepfake fraud has already taken tens of millions in single hits. Vishing has become a top-two way into corporate networks. CrowdStrike's telemetry tracked vishing attacks "skyrocketing 442%" between the first and second half of 2024 alone. This tier is settled fact.

Emerging. Commercial AI voice-agent platforms — the legitimate kind, built for call centres — are being stitched into scam workflows, exactly as the DEF CON teams demonstrated. Threat-intelligence firm Group-IB has described automated AI scam operations placing over a thousand calls a day. Semi-autonomous is here; the human is being edged out of the loop.

Plausible next. The direction of travel is toward fully autonomous, multi-channel campaigns at mass scale. Picture an agent that texts you referencing a real meeting from your calendar, follows up with a call in your manager's cloned voice, and if you hesitate, drops you into a video call with a deepfaked face to close the deal — one system, adapting as it goes, run against ten thousand people at once. No single piece of that is science fiction today; what's arriving is the packaging that makes it cheap and routine.

The through-line across all three tiers is simple and uncomfortable: the tooling gets cheaper and more convincing every quarter, while human psychology stays exactly the same. We are not going to out-evolve our own instinct to be helpful, to trust a familiar voice, to correct someone who's wrong. Therefore the defence can't be "try harder to spot the fake." It has to be something structural. So let's finish where it matters most — how you actually stop this.

How to Counter the AI Con Man

Here's the good news, and it's genuinely good news. You don't have to win the arms race against ever-better fakes. You just have to change the rules of the game so that fooling your eyes and ears isn't enough. Remember we built him from three parts and said stripping one would collapse the con? This is where we make good on that. Every defence below attacks a specific part of the machine — the Mask, the Memory, or the Brain — plus the process failure they all rely on. Break any link and the chain fails.

Figure: How to beat the AI con man, in four moves: Break the Mask (verify on a channel you call back), Break the Memory (thin your public footprint), Break the Brain (train the shape, not the content), and Fix the Process (no one call authorises money or access).
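One way to keep that mapping honest is to write it down as a checklist. The sketch below is illustrative only. The link names come from the seven-link chain described earlier, but the `DEFENCES` mapping is an assumption of this example, in particular the choice to pin "Fix the Process" to the success condition.

```python
# The seven links of the chain, in order.
CHAIN = [
    "Objective", "Memory", "Mask", "Brain",
    "Channel", "Success condition", "Feedback loop",
]

# Assumed mapping from the four defensive moves to the links they cut.
# Placing "Fix the Process" on the success condition is this sketch's
# own reading, not something the article states outright.
DEFENCES = {
    "Break the Memory": {"Memory"},
    "Break the Mask": {"Mask"},
    "Break the Brain": {"Brain"},
    "Fix the Process": {"Success condition"},
}


def uncut_links(deployed: set) -> list:
    """Return the chain links no deployed defence addresses, in chain order."""
    covered = set()
    for name in deployed:
        covered |= DEFENCES[name]
    return [link for link in CHAIN if link not in covered]


def chain_is_broken(deployed: set) -> bool:
    """The chain fails as soon as at least one link is cut."""
    return len(uncut_links(deployed)) < len(CHAIN)
```

Calling `uncut_links({"Break the Mask"})` lists the six links still standing, which can be a useful prompt in a tabletop exercise: which of these would we want a second cut on?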

Break the Mask — stop treating voice and face as proof

Verify out of band. This is the single most important habit, so if you take one thing from this article, take this. Any sensitive request — a payment, a credential, an MFA reset, a change of bank details — must be confirmed on a separately initiated channel. Someone calls asking you to authorise a transfer? You hang up and call them back on the number you already have. A deepfaked voice or face can win the call it's on; it cannot answer the phone when you dial the real person. This one habit would have stopped the Arup loss dead.

✅ The one habit that matters most

Never act on a sensitive request from an incoming call or message. Hang up and call the person back on a number you already have. A perfect deepfake can win the call it's on — it can't answer when you dial the real person.
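For teams that want to encode the callback rule in a ticketing or payments workflow, here is a minimal sketch of the logic. Everything named in it is hypothetical: the `KNOWN_NUMBERS` directory, the `InboundRequest` fields, the action labels, and the placeholder addresses and numbers. The one design point it tries to show is that the number you dial is looked up from records you already held, never taken from the request itself.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical directory, filled in BEFORE any request arrives (HR records,
# supplier onboarding forms). Addresses and numbers are placeholders.
KNOWN_NUMBERS = {
    "cfo@example.com": "+44 20 7946 0000",
    "supplier-accounts@example.com": "+44 20 7946 0001",
}

# Request types treated as sensitive in this sketch.
SENSITIVE_ACTIONS = {"payment", "credential", "mfa_reset", "bank_details_change"}


@dataclass(frozen=True)
class InboundRequest:
    claimed_identity: str                   # who the caller or sender says they are
    action: str                             # what they are asking for
    caller_id: str                          # number shown on the incoming call (spoofable)
    offered_callback: Optional[str] = None  # "ring me back on..." from the requester


def callback_number(request: InboundRequest) -> Optional[str]:
    """Choose the number to dial for verification.

    Only the pre-existing directory is consulted. The caller ID and any number
    the requester offers are ignored on purpose: an attacker controls both.
    """
    return KNOWN_NUMBERS.get(request.claimed_identity)


def may_proceed(request: InboundRequest, confirmed_on_callback: bool) -> bool:
    """Sensitive actions need a known number AND a confirmation on that callback."""
    if request.action not in SENSITIVE_ACTIONS:
        return True
    return callback_number(request) is not None and confirmed_on_callback
```

Note that `caller_id` and `offered_callback` are recorded but never used to make the decision. If the claimed identity has no entry in the directory, the function refuses rather than falling back to whatever number the caller supplied.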

Agree a safe word. For high-value relationships — your finance team, your executives, even your own family — agree a challenge phrase in advance. If a panicked "colleague" or "grandchild" calls, you ask for the word. The cloned voice doesn't know it. It's low-tech, free, and works precisely because no amount of AI can fake a secret it was never given. (The same rule works at home: hang up, call back, use a family phrase.)
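Between people, a safe word lives in memory. Where a helpdesk or finance tool has to check one, a reasonable approach is to store a salted hash instead of the phrase, so the record itself does not leak the secret. The sketch below assumes that setup; the function names and the iteration count are illustrative choices, and it uses only Python's standard `hashlib`, `hmac`, and `os` modules.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor; tune for your environment


def _normalise(phrase: str) -> bytes:
    # Spoken phrases vary in case and spacing, so compare a canonical form.
    return " ".join(phrase.lower().split()).encode("utf-8")


def enrol(phrase: str) -> tuple:
    """Return (salt, digest) to store in place of the phrase itself."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", _normalise(phrase), salt, ITERATIONS)
    return salt, digest


def check(candidate: str, salt: bytes, digest: bytes) -> bool:
    """Check a spoken or typed phrase against the stored record."""
    attempt = hashlib.pbkdf2_hmac("sha256", _normalise(candidate), salt, ITERATIONS)
    # Constant-time comparison avoids leaking how close a guess was.
    return hmac.compare_digest(attempt, digest)
```

A typical flow would call `enrol` once, in person or over an already trusted channel, and `check` whenever a caller is challenged. The phrase should never appear in email, chat, or any public post, since that would hand it straight to the attacker's Memory.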

Retire "I recognise their voice." This is the mindset shift, and it's the hardest because it fights a lifetime of instinct. For your whole life, hearing someone's voice was proof of who they were. That era is over. Voice is a costume now, not an ID.

Break the Memory — starve the con of fuel

Here's the link almost everyone leaves wide open. The con man's Memory is built from your public exhaust — the detailed org chart on your site, the executive travel a LinkedIn post gives away, the helpdesk script a proud new hire shares, the personal details sitting in old breaches. You can't erase your public footprint, but you can thin it: review what your org chart and job ads actually reveal, coach staff on what "harmless" posts hand an attacker, and monitor your people's breach exposure. Every detail you don't publish is a detail the Brain can't use to sound like it belongs.

Break the Brain — train the shape, not the content

Most security awareness training teaches people to spot bad grammar and dodgy links — signals AI has already made obsolete. What still works is teaching people to recognise the shape of an elicitation attempt, whoever's on the line. The universal red flag is this: someone you can't verify is steering you toward giving something up — access, information, money — and you feel a subtle pull to comply, to help, to correct them, to not seem difficult. That pull is the alarm. It fires the same whether the caller is human or machine, and whether the lever is a familiar voice, manufactured urgency, or borrowed authority — which is exactly why it's the one defence that doesn't go out of date.

Fix the process — so no conversation alone can hurt you

The deepest fix isn't about spotting the con at all — it's making a successful con worthless. Build your processes so that no single conversation can authorise money, access, or a credential reset, no matter how convincing or how senior the voice. Payment changes need a callback to a known number. Helpdesk identity-proofing can't be talked around. Privileged resets need a second approver. That's a Monday-morning ownership question, not a training slide: finance owns payment callbacks, IT owns helpdesk verification, security owns the tabletop test that proves it works.
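The rule that no single conversation can authorise a change can be expressed as a small gate in whatever system records the request. This is a sketch under stated assumptions, not a reference design: the `SensitiveChange` class, its field names, and the choice of exactly one independent approver are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SensitiveChange:
    description: str
    requested_by: str
    callback_verified: bool = False   # set only after a callback to a known number
    approvers: set = field(default_factory=set)

    def approve(self, approver: str) -> None:
        # The requester can never count as their own second approver.
        if approver == self.requested_by:
            raise ValueError("requester cannot approve their own request")
        self.approvers.add(approver)

    def authorised(self) -> bool:
        # Both controls must hold: a verified callback AND an independent approver.
        return self.callback_verified and len(self.approvers) >= 1
```

With this shape, a perfectly convincing call to one employee yields at most a pending request. The attacker would also need to defeat the callback and persuade a second person, which is the point of moving the decision out of the conversation.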

Notice what these have in common: not one asks your people to become deepfake forensic experts. They shift the burden off human detection and onto procedures an attacker can't talk their way around. Detection still has its place — for catching campaigns and what happens after a slip — but it's your backstop, not your front line. That's how you beat an attacker who will always, eventually, look and sound perfect.

Conclusion

So we built him — the con man who never sleeps. A Brain running spy-grade elicitation at machine scale, a Mask of cloned voice and face assembled from rentable parts, and a Memory stitched from everything you've ever posted. He's calm, he's patient, he learns, and he's already out there — the Arup millions and the DEF CON bots aren't warnings of what's coming, they're records of what's here.

But building him also showed us his weakness. He's a chain (Objective, Memory, Mask, Brain, Channel, Success condition, Feedback loop), and a chain breaks at any link. Starve his Memory by thinning what you make public. Break his Mask by refusing to treat a voice or a face as proof. Blunt his Brain by teaching your people the shape of manipulation, not its surface. And above all, fix your process so that no single conversation, however convincing or senior, can move money or hand over access on its own. Do those things and it stops mattering how perfect the fake gets, because you've moved the real decision somewhere he can't reach.

Because here's the truth underneath all of it: the weakness was never that people are gullible. It's that too many of our processes still treat a convincing conversation as proof. The technology will keep getting better. Our instinct to trust and to help won't change — and honestly, we wouldn't want it to. So we protect it with better habits and better processes instead. Get those right, and you can face the smartest con man ever built and still, calmly, hang up the phone.

AI Social Engineering FAQ: Your Questions Answered

What is AI social engineering?

AI social engineering is the use of artificial intelligence — large language models, voice cloning, and deepfake video — to manipulate people into revealing information, granting access, or transferring money. It takes the same psychological techniques human con artists have always used, but automates them so they can be run against thousands of targets at once, cheaply and convincingly.

Is AI voice cloning really good enough to fool people?

Yes. Modern voice cloning needs only a few seconds of sample audio, easily taken from a public video, podcast, or voicemail. Security researchers have demonstrated live attacks using cloned voices, and at DEF CON 2025 teams built autonomous AI agents that held real phone conversations with under 0.8 seconds of response lag — fast enough that targets didn't realise they were talking to a machine.

What was the Arup deepfake attack?

In early 2024, the engineering firm Arup lost around $25.6 million (HK$200 million) after an employee in Hong Kong joined a video conference call with what appeared to be senior colleagues, including the company's CFO. According to Hong Kong police, every participant except the victim was an AI-generated deepfake. The employee authorised 15 transfers believing the request was genuine, and Arup confirmed that fake voices and images were used. It's the most prominent example of deepfake-driven social engineering to date.

How do I protect myself and my company from AI social engineering?

The most effective defence is out-of-band verification: confirm any sensitive request (payments, credentials, account changes) by contacting the person back through a channel you initiate yourself, never trusting the incoming call or message alone. Beyond that, thin the public data attackers use to sound convincing, agree safe words for high-value requests, treat a familiar voice as no proof of identity, train people to recognise the feeling of being steered, and — most importantly — build processes so that no single conversation can authorise money or access on its own, however senior or convincing the caller.

Can AI social engineering be detected automatically?

Detection tools exist — voice-biometric anomaly detection and deepfake analysis — but they're imperfect and the fakes improve constantly. Relying on detecting the fake is a losing arms race. The stronger approach is procedural: build verification steps that make a convincing fake insufficient on its own, so you never have to win the detection battle in the first place.

About the Author

Nathan House, Founder & CEO of StationX

Nathan House has 30 years of hands-on cybersecurity experience and is Cambridge-educated, holding CISSP, CISA, CISM, OSCP, CEH, and SABSA. He founded StationX in 1999 — one of the UK’s first cybersecurity companies — and has secured £71 billion in UK mobile banking transactions and the London 2012 Olympics, advising clients including Microsoft, Cisco, BP, Vodafone, and VISA. He authored the world’s most popular cybersecurity course — a #1 Udemy bestseller taken by over 500,000 students — and was named Cyber Security Educator of the Year 2020, AI Security Educator of the Year, and a UK Top 25 Security Influencer 2025. A DEF CON speaker and featured expert on CNN, Fox News, NBC, and the BBC, Nathan leads StationX’s training of more than half a million students worldwide.