AI-Generated Phishing Defense: Why Detection Fails, What Works
The most reliable signal anti-phishing training relied on for two decades was that phishing emails were grammatically broken. Misspellings, awkward phrasing, the wrong register for a corporate communication - all of it was a tell. That tell died around 2023. AI-generated phishing in 2026 is grammatically clean across every locale a workforce communicates in, hyper-personalized to the target's role and current projects, and tuned past commercial spam filters via iterative A/B testing. The defense doctrine has to move from content detection to structural controls.
This post is for the security architect deciding where to invest defense budget against the post-2023 phishing baseline. It walks through what changed in attacker tooling, why AI-content detection cannot be the load-bearing layer, the structural defenses that actually work, and how to update a phishing simulation program to reflect the new reality.
What changed in attacker tooling
Three things, in roughly chronological order. First, mainstream LLMs (ChatGPT, then Claude and Gemini) became broadly accessible starting in late 2022; attackers began jailbreaking and prompt-engineering them to produce phishing content. Second, adversary-tuned variants surfaced on cybercrime forums starting mid-2023 - WormGPT, FraudGPT, DarkGPT, EvilGPT - sold as subscription services with no safety guardrails, often built atop open-source GPT-J, Llama and Mistral derivatives. Third, in 2024-2025 attackers started chaining LLMs with reconnaissance tooling: scrape the target's LinkedIn, public press releases, podcast transcripts and conference talks, then prompt the model to write a lure in their writing style targeting their named projects.
The economics changed fastest. A single attacker with API credit can produce thousands of locale-tailored, personalized lures per hour. The same volume in 2020 would have required a phishing-kit team of human authors. Cost per high-quality lure dropped two or three orders of magnitude.
The qualitative output changed too. Where 2018-era phishing was caught at the gateway by spam filters trained on broken English and stylistic anomalies, 2026 lures pass commercial filters at noticeably higher rates. Several enterprise email-security vendors now report AI-generated lures bypassing their content scoring at A/B-tested rates of 30-60% versus 5-10% for human-authored equivalents.
Why content detection cannot be the load-bearing layer
AI-content classifiers were proposed as the symmetric defense - if AI writes the lures, AI should detect them. The classifiers fail in both directions.
False negatives are the obvious failure. A skilled attacker prompts the LLM to mimic a specific sender's writing register - drawn from social media, podcast transcripts, sample emails - and the output bypasses naive watermarking and burstiness/perplexity classifiers. Open research throughout 2024 demonstrated that AI-detection accuracy collapses against adversarially-prompted output. The detection arms race favors the generator because the generator can adapt faster than the detector can be retrained.
False positives are the less-discussed failure. Legitimate workforce communications are increasingly written with LLM assistance - executives drafting all-hands emails through ChatGPT, marketing teams using Claude for product copy, sales reps using Gemini for outreach. As legitimate LLM-assisted mail proliferates, false-positive rates rise in lockstep: a detector tuned to catch attacker-grade output flags substantial chunks of legitimate enterprise mail.
Even a perfect content classifier would not stop the credential ceremony from completing against an AiTM proxy, would not stop a consent-phishing OAuth grant, and would not stop a malicious attachment from triggering its payload. Detection is one layer among several; the structural layers below have to do the actual work.
The structural defenses that work
Five layers, in order of leverage:
One. Phishing-resistant MFA (FIDO2, passkeys, WebAuthn). The cryptographic ceremony binds authentication to the legitimate origin. An AI-perfect-prose lure that lands on an AiTM reverse proxy cannot complete the ceremony - the WebAuthn challenge will not validate against the wrong domain. The lure quality stops mattering at the credential layer. This is the single highest-leverage control.
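To make the origin binding concrete, here is a minimal Python sketch of the check a relying party performs on the WebAuthn clientDataJSON during assertion verification. The function and constant names (verify_client_data, EXPECTED_ORIGIN) and the lookalike domain are illustrative, not drawn from any particular library:

```python
import base64
import json

EXPECTED_ORIGIN = "https://login.example.com"  # the legitimate RP origin (illustrative)

def verify_client_data(client_data_b64: str, expected_challenge: str) -> bool:
    """Reject the assertion unless the browser-reported origin matches the RP."""
    client_data = json.loads(base64.urlsafe_b64decode(client_data_b64))
    # The browser fills in `origin` itself; a phishing proxy hosted on another
    # domain cannot forge it, so an AiTM page fails here regardless of lure quality.
    return (
        client_data.get("type") == "webauthn.get"
        and client_data.get("challenge") == expected_challenge
        and client_data.get("origin") == EXPECTED_ORIGIN
    )

# A lure that lands the user on an attacker proxy yields the proxy's origin:
attacker_payload = base64.urlsafe_b64encode(json.dumps({
    "type": "webauthn.get",
    "challenge": "abc123",
    "origin": "https://login.examp1e.com",  # hypothetical lookalike AiTM domain
}).encode()).decode()

print(verify_client_data(attacker_payload, "abc123"))  # False - ceremony fails
```

However perfect the prose of the lure, the proxy's origin is what the browser reports, and the check above is why the ceremony cannot complete.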
Two. OAuth consent restrictions in Microsoft Entra and Google Workspace. AI-generated lures that funnel users into OAuth consent pages still grant API access regardless of authentication strength, because the consent flow is a separate code path. Restrict user consent to verified publishers requesting low-risk permissions, or require admin consent outright. The consent layer is independent of the authentication layer; both have to be hardened.
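For the Entra side, the restriction is a PATCH to the Microsoft Graph authorization policy. The sketch below only builds the request body (the helper name is ours); the built-in policy id shown is Microsoft's "verified publisher, low-risk permissions" grant policy, and an empty list instead would mean admin-only consent:

```python
import json

GRAPH_ENDPOINT = "https://graph.microsoft.com/v1.0/policies/authorizationPolicy"

def build_consent_restriction() -> str:
    """Return a PATCH body restricting user OAuth consent in Entra ID."""
    body = {
        "defaultUserRolePermissions": {
            # Built-in grant policy: users may consent only to apps from
            # verified publishers requesting low-risk permissions.
            # Use an empty list here for admin-only consent instead.
            "permissionGrantPoliciesAssigned": [
                "ManagePermissionGrantsForSelf.microsoft-user-default-low"
            ]
        }
    }
    return json.dumps(body)
```

Send it with an app token holding Policy.ReadWrite.Authorization; Google Workspace has an equivalent admin-console control (API Controls → app access) rather than an API body like this.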
Three. Email gateway sandbox detonation. AI-generated cover emails delivering malicious attachments still rely on the attachment doing the work. Microsoft Defender for Office 365 Safe Attachments, Mimecast, Proofpoint TAP and Google Workspace Pre-Delivery Sandbox all run dynamic detonation. The cover-email quality is irrelevant if the payload triggers in the sandbox before delivery.
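A simplified delivery gate shows the shape of this layer. Everything here is a stand-in sketch - `detonate` stubs a real dynamic-analysis sandbox, and the byte-match verdict is a toy - but the control flow is the point: the verdict gates delivery, and the cover email's prose is never consulted:

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    sender: str
    attachments: list = field(default_factory=list)

def detonate(attachment: bytes) -> str:
    """Stub for a dynamic-analysis sandbox; real systems execute the file
    in an instrumented VM and return a behavioral verdict."""
    return "malicious" if b"payload" in attachment else "clean"

def pre_delivery_gate(msg: Message) -> str:
    # Only attachment behavior decides the outcome, which is why the
    # quality of the AI-written cover email is irrelevant at this layer.
    for att in msg.attachments:
        if detonate(att) == "malicious":
            return "quarantine"
    return "deliver"
```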
Four. Behavior analytics on session activity. Even with the above, an AI-perfect lure occasionally lands and a credential is exfiltrated. The detection layer that catches what got through is post-authentication: session-cookie replay from a new ASN, impossible-travel between geographies, anomalous mail-rule creation, mass-download events. Entra ID Protection, Google Workspace Security Investigation Tool, Okta ThreatInsight and Microsoft Defender for Cloud Apps all surface these signals. The structural layers try to make sure this layer rarely has work to do, but it has to be wired.
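Impossible travel is the simplest of these signals to illustrate. This is our own minimal sketch, not any vendor's algorithm: compute the great-circle distance between two login geolocations and flag the pair if the implied ground speed exceeds anything air travel allows:

```python
import math
from datetime import datetime

MAX_PLAUSIBLE_KMH = 1000  # roughly commercial-jet speed; tune per policy

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def impossible_travel(login_a, login_b) -> bool:
    """Each login is (datetime, lat, lon); True if the pair implies
    a travel speed no human could achieve."""
    (t1, lat1, lon1), (t2, lat2, lon2) = sorted([login_a, login_b])
    hours = (t2 - t1).total_seconds() / 3600
    if hours == 0:
        return True  # simultaneous logins from two places
    return haversine_km(lat1, lon1, lat2, lon2) / hours > MAX_PLAUSIBLE_KMH

# New York at 09:00 UTC, then Singapore 30 minutes later: flag it.
a = (datetime(2026, 1, 5, 9, 0), 40.7, -74.0)
b = (datetime(2026, 1, 5, 9, 30), 1.35, 103.8)
print(impossible_travel(a, b))  # True
```

Production systems layer this with ASN changes, device fingerprints and mail-rule telemetry; the sketch only shows why the signal survives even a perfect lure - it fires on what the session does, not on what the email said.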
Five. Continuous simulation that includes AI-generated templates. Phishing simulation programs that ship only easy/regular human-authored templates miss the post-2023 baseline. Run AI-generated spear-phishing templates seeded with target-specific context (role, manager, current projects from public sources) at hard difficulty alongside traditional templates. Users build the recognition cue: "is this request actually authorized through the channel I would expect?" That cue transfers to real-world resistance regardless of lure quality.
What this means for the phishing program
The training-content angle and the simulation-content angle both shift. Training that says "watch for misspellings and awkward phrasing" trains users on a 2018 threat that doesn't predict 2026 attacks. Updated training says "verify out-of-band any request involving payment, credentials, urgency or unusual access" - because the attacker's output looks indistinguishable from a legitimate request at the content layer.
Simulation has to follow. Programs that only run easy/regular human-authored templates produce a click-rate trend that flatters the program but does not predict the population's resistance to a hard AI-generated lure. Add AI-generated spear-phishing at hard difficulty as a quarterly test. Track click-rate-by-difficulty separately so the AI-difficulty cohort has its own trend line. Auto-assign remediation training the moment a user clicks; the remediation should reinforce the structural-recognition cue, not the obsolete spelling-error cue.
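The separate-trend-line bookkeeping is a few lines of grouping. In this sketch the field names and difficulty labels are illustrative; the point is that results are keyed by (quarter, difficulty) so the AI-generated cohort never averages into the easy-template trend:

```python
from collections import defaultdict

def click_rates(results):
    """results: iterable of dicts with 'quarter', 'difficulty', 'clicked' (bool).
    Returns click rate per (quarter, difficulty) cohort."""
    totals = defaultdict(lambda: [0, 0])  # (quarter, difficulty) -> [clicks, sends]
    for r in results:
        key = (r["quarter"], r["difficulty"])
        totals[key][0] += r["clicked"]
        totals[key][1] += 1
    return {k: clicks / sends for k, (clicks, sends) in totals.items()}

campaign = [
    {"quarter": "2026Q1", "difficulty": "easy", "clicked": False},
    {"quarter": "2026Q1", "difficulty": "easy", "clicked": False},
    {"quarter": "2026Q1", "difficulty": "ai-hard", "clicked": True},
    {"quarter": "2026Q1", "difficulty": "ai-hard", "clicked": False},
]
print(click_rates(campaign))
# {('2026Q1', 'easy'): 0.0, ('2026Q1', 'ai-hard'): 0.5}
```

A flat easy-template trend beside a 50% AI-cohort click rate is exactly the gap the single-number report hides.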
What guidance bodies say
NIST Special Publication 800-63 supplementary identity-proofing notes (2024 update) explicitly call out generative-AI-enhanced impersonation as a threat to remote identity verification - the exact use case where a cloned-voice deepfake call follows an AI-generated email lure. ENISA's 2024 Threat Landscape report flags AI-generated social engineering as the dominant trend. The FBI IC3 2024 annual public-service announcement on AI-enhanced fraud is unambiguous: generative AI has become the standard tooling for high-volume targeted fraud campaigns. The CISA / NSA / FBI joint guidance on AI-driven cyber threats (2024) recommends FIDO2-based phishing-resistant authentication as the primary defense.
The pattern across all four authoritative sources is consistent: focus the defense at the structural layer, not at content detection.
Pulling it together
The 2018 anti-phishing playbook (train users on spelling errors, deploy keyword-based gateway filters) decays against a 2026 baseline of grammatically clean, target-personalized, filter-tuned lures. The replacement playbook is structural: phishing-resistant MFA at the credential layer, OAuth admin policy at the consent layer, sandbox detonation at the attachment layer, behavior analytics at the post-compromise layer and AI-template simulation at the user-recognition layer. AI-content detection as a load-bearing defense is a dead end: the generator/detector arms race favors the generator decisively.
If you're updating a phishing program for the AI-generated-phishing baseline and want to add hard-difficulty AI-template simulation alongside traditional campaigns, start a free trial covering up to 25 users - the AI-template feature is included by default. For full deployment scoping including phishing-resistant MFA rollout integration, see pricing or contact us.
Related reading
- Phishing-resistant MFA - the credential-layer defense that makes lure quality stop mattering
- OAuth consent phishing - the residual channel even FIDO2 doesn't close
- MFA bypass phishing attacks - the 5 patterns AI-perfect lures aim at
- Deepfake vishing defense - the AI-voice companion to AI-generated email

