Threat Intelligence · 11 min read

AI vs AI: How Autonomous Agents Are Changing Red Teaming — And Why Your Defences Must Evolve

The McKinsey breach was executed by an autonomous AI agent in 2 hours. Attackers now deploy AI agents for reconnaissance, vulnerability discovery, and exploitation at machine speed. Here is how autonomous AI agents change the threat landscape.

RedTeam Partners

CREST-Certified Security Team · 2026-03-13

The era of human-paced hacking is over. On February 28, 2026, an autonomous AI agent breached McKinsey's Lilli platform in 2 hours — finding 22 unauthenticated API endpoints, exploiting a SQL injection vulnerability, and gaining full read-write access to 46.5 million messages. No human hacker typed a single command. The agent operated autonomously from reconnaissance to full compromise.

This wasn't an isolated demonstration. According to Check Point Research, AI-augmented attacks increased 1,265% between Q1 2025 and Q1 2026. The threat landscape has fundamentally shifted: attackers now deploy AI agents that work 24/7, never tire, and operate at machine speed against targets that still rely on human-paced defences.

How Autonomous AI Agents Attack

Modern autonomous offensive agents combine large language models with tool-use capabilities to execute multi-step attack chains:

Phase 1: Reconnaissance at Scale

AI agents can map an organisation's entire attack surface in minutes. They systematically enumerate subdomains, API endpoints, technology stacks, and potential entry points — tasks that would take a human penetration tester days. The McKinsey attacker (CodeWall's AI agent) discovered 200+ documented API endpoints and identified the 22 unauthenticated ones through automated analysis.
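
The triage step described above can be reduced to a simple question: which discovered paths answer 200 with no credentials attached? The sketch below illustrates that logic only; the paths, status codes, and function names are hypothetical, and the fetch step is injected so the logic stands alone (in practice it would issue anonymous HTTP requests).

```python
from typing import Callable

def find_unauthenticated(paths: list[str],
                         fetch_status: Callable[[str], int]) -> list[str]:
    """Return the paths that answer 200 with no credentials attached.

    fetch_status is injected so the triage logic can be exercised without
    a live target; a real agent would issue an anonymous HTTP GET here.
    """
    return [p for p in paths if fetch_status(p) == 200]

# Simulated sweep: two of four hypothetical endpoints skip auth checks.
statuses = {"/api/messages": 200, "/api/admin": 401,
            "/api/export": 200, "/api/users": 403}
exposed = find_unauthenticated(list(statuses), statuses.get)
print(exposed)  # → ['/api/messages', '/api/export']
```

At machine speed, this loop runs across hundreds of endpoints in seconds, which is why every undocumented or forgotten route counts.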

Phase 2: Vulnerability Discovery

Unlike traditional vulnerability scanners that match known signatures, AI agents understand context. They can identify logical flaws, authentication bypasses, and business logic vulnerabilities that signature-based tools miss. The McKinsey SQL injection was in JSON field names — a subtle vulnerability where values were parameterised but field names were concatenated directly into SQL. Traditional scanners wouldn't flag this pattern.
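
The flaw class is easy to reproduce. In this hedged sketch (an illustrative schema, not McKinsey's actual code), the value is correctly parameterised while the JSON-supplied field name is concatenated into the query, which is enough for injection:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (title TEXT, body TEXT)")
conn.executemany("INSERT INTO docs VALUES (?, ?)",
                 [("alpha", "a"), ("beta", "b")])

def search_vulnerable(field: str, value: str):
    # The VALUE is parameterised -- but the JSON-supplied field NAME is
    # concatenated straight into the SQL. A "field" of
    #   "title IS NOT NULL OR body"
    # rewrites the WHERE clause and dumps every row.
    return conn.execute(
        f"SELECT * FROM docs WHERE {field} = ?", (value,)).fetchall()

def search_safe(field: str, value: str):
    # Fix: allow-list identifiers against the known schema; only values
    # can be parameterised, so names must be validated instead.
    if field not in {"title", "body"}:
        raise ValueError("unknown field")
    return conn.execute(
        f"SELECT * FROM docs WHERE {field} = ?", (value,)).fetchall()

print(len(search_vulnerable("title IS NOT NULL OR body", "zzz")))  # → 2
```

Because the payload lives in an identifier rather than a value, it looks nothing like the quote-and-comment patterns signature scanners hunt for.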

Phase 3: Exploitation and Chaining

AI agents chain multiple low-severity findings into high-impact exploits. A minor information disclosure + an authentication weakness + a misconfigured endpoint = full system compromise. The agent reasons about how findings relate to each other, creating attack chains that no individual vulnerability would suggest.

Phase 4: Data Exfiltration and Persistence

Once inside, AI agents can systematically extract and categorise data, establish persistence mechanisms, and cover their tracks — all without human intervention. The McKinsey agent achieved write access to system prompts, meaning it could have established permanent, invisible backdoors in the AI platform itself.

The Speed Advantage: Machines vs. Humans

| Attack phase | Human attacker | AI agent | Speed multiplier |
| --- | --- | --- | --- |
| Reconnaissance | 2–5 days | 15–30 minutes | 200–500x |
| Vulnerability identification | 3–7 days | 30–60 minutes | 100–300x |
| Exploitation | 1–3 days | 5–30 minutes | 50–200x |
| Full compromise | 1–4 weeks | 1–4 hours | 40–170x |
| Data classification | Days–weeks | Minutes–hours | 100x+ |

Source: CodeWall benchmarks, SANS Institute AI Offensive Security Report 2026

The implication is stark: your incident response window has collapsed from days to minutes. Traditional security operations that assume hours or days to detect and respond are no longer viable against autonomous attackers.

AI Agents in the Attacker's Toolkit

The tools available to attackers are becoming increasingly sophisticated:

  • CodeWall — The platform that breached McKinsey. Combines autonomous reconnaissance with vulnerability discovery and exploitation. Though the platform is designed for defensive testing, its techniques are replicable by attackers.
  • PentestGPT and similar frameworks — Open-source autonomous penetration testing agents that combine LLM reasoning with standard security tools (nmap, Burp Suite, Metasploit).
  • AI-powered phishing — Agents that craft personalised phishing campaigns at scale, using scraped LinkedIn data and company information to generate contextually perfect social engineering attacks.
  • Adversarial AI for AI systems — AI agents specifically designed to attack other AI systems through prompt injection, RAG poisoning, and model manipulation.

"Hackers will be using the same technology to attack indiscriminately, with specific objectives like financial blackmail for data loss or ransomware."
Paul Price, CEO, CodeWall (after the McKinsey breach disclosure)

Why Traditional Defences Fail Against AI Agents

1. Speed Mismatch

SOC teams detect threats in hours to days (median dwell time: 10 days per CrowdStrike 2025). AI agents complete full kill chains in hours. By the time a human analyst investigates, the attack is finished.

2. Pattern Recognition Limits

WAFs and IDS/IPS rely on known patterns. AI agents generate novel attack payloads that are semantically equivalent but syntactically unique — routinely evading signature-based detection.

3. Volume Overwhelm

AI agents can probe thousands of endpoints simultaneously, generating alert volumes that overwhelm human analysts. The signal-to-noise ratio drops to a level where real attacks hide in the noise of probing activity.

4. Adaptive Behaviour

When an attack technique fails, AI agents adapt in real-time. They try alternative approaches, modify payloads, change timing patterns, and learn from each failed attempt — without the frustration or fatigue that limits human attackers.

How to Defend Against Autonomous AI Attacks

Fight AI with AI: Autonomous Defence

  1. AI-powered detection — Deploy AI-based security monitoring that operates at the same speed as AI attackers. Behavioural analysis, anomaly detection, and real-time threat assessment must be automated.
  2. Automated response — Implement automated containment for high-confidence detections. When an AI agent is probing your APIs at machine speed, human-in-the-loop response is too slow.
  3. AI red teaming — Test your defences with the same autonomous tools attackers use. Our AI Security Configuration Review includes autonomous agent testing as part of the methodology.
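
The steps above can be sketched in miniature. In this illustrative example (thresholds and names are hypothetical, and a real deployment would push verdicts to a WAF or API gateway rather than an in-process set), a source that touches an unusually large number of distinct endpoints within a short window is contained automatically, without waiting for an analyst:

```python
from collections import defaultdict, deque

class ProbeContainment:
    """Block a source that sweeps too many distinct endpoints too fast.

    Thresholds are illustrative; tune them to your traffic, and feed
    block decisions into an enforcement point (gateway, WAF, firewall).
    """
    def __init__(self, max_endpoints: int = 20, window_s: float = 60.0):
        self.max_endpoints = max_endpoints
        self.window_s = window_s
        self.events = defaultdict(deque)   # ip -> deque of (ts, endpoint)
        self.blocked = set()

    def observe(self, ip: str, endpoint: str, ts: float) -> bool:
        """Record one request; return True if the source is now blocked."""
        if ip in self.blocked:
            return True
        q = self.events[ip]
        q.append((ts, endpoint))
        while q and ts - q[0][0] > self.window_s:
            q.popleft()                    # drop events outside the window
        if len({e for _, e in q}) > self.max_endpoints:
            self.blocked.add(ip)           # high-confidence: contain now
        return ip in self.blocked

# A machine-speed sweep of 25 endpoints in 25 seconds trips the block.
guard = ProbeContainment()
hit = False
for i in range(25):
    hit = guard.observe("10.0.0.9", f"/api/ep{i}", float(i))
print(hit)  # → True
```

The point is not this particular heuristic but the latency: the decision happens inline, in the request path, at the same speed the probing does.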

Reduce the Attack Surface

  1. API minimisation — McKinsey's Lilli platform exposed 200+ API endpoints, 22 of them unauthenticated. Reduce your API surface area to only what's necessary. Every endpoint is a target.
  2. Zero-trust for AI — Treat every AI interaction (including from internal tools like Claude Code) as potentially adversarial. Verify every request, limit every permission.
  3. Segmentation — Isolate AI systems from critical infrastructure. Even if an AI platform is compromised, prevent lateral movement to databases, financial systems, and identity providers.

Build Resilience

  1. Assume breach — Plan for AI agents gaining access. Focus on detection, containment, and limiting impact rather than solely on prevention.
  2. Regular testing — Conduct AI-specific red teaming assessments quarterly. The threat landscape evolves too fast for annual testing cycles.
  3. Incident response for AI — Update your IR playbooks with AI-specific scenarios. A compromised system prompt or poisoned RAG pipeline requires different response procedures than a traditional breach.

The Regulatory Response

Regulators are beginning to address autonomous AI attacks:

  • EU AI Act — Requires risk management against "reasonably foreseeable misuse," which now explicitly includes autonomous agent attacks. The Act's general application date of 2 August 2026 applies.
  • NIST AI RMF — The framework's "Govern" and "Manage" functions specifically address adversarial AI risks.
  • UK NCSC — The National Cyber Security Centre published guidance in 2025 on defending against AI-augmented cyber attacks.

For financial services, the implications are particularly severe: DORA requires advanced threat-led penetration testing that must now account for AI agent capabilities.

What This Means for Your Security Programme

The autonomous AI agent threat requires a fundamental shift in security thinking:

  • From periodic testing to continuous validation — Annual penetration tests can't keep pace with AI-speed threats
  • From human-paced response to automated containment — Playbooks that assume hours of analysis time are obsolete
  • From signature-based detection to behavioural analysis — AI agents generate novel attacks that evade pattern matching
  • From perimeter defence to zero-trust — Assume the perimeter will be breached; focus on limiting impact

Start assessing your readiness with our free 25-point AI security checklist, or book a comprehensive AI red teaming assessment to test your defences against autonomous agent attack scenarios.

References

  • CodeWall, "McKinsey Lilli Platform Security Assessment," February 2026
  • Check Point Research, "AI-Augmented Cyber Attacks: 2025-2026 Trend Report," 2026
  • SANS Institute, "AI in Offensive Security: Capabilities and Countermeasures," 2026
  • CrowdStrike, "Global Threat Report," 2025
  • UK NCSC, "Guidance on AI-Augmented Cyber Threats," 2025
  • NIST, "AI Risk Management Framework (AI RMF 1.0)," 2023
  • European Parliament, "Regulation (EU) 2024/1689 — EU AI Act," 2024

Is Your AI Infrastructure Secure?

Book a free 30-minute AI security analysis with our CREST-certified team. We'll show you what an attacker could exploit in your AI systems.

Book Free Analysis