AI Security Glossary

Key terms and definitions for AI red teaming, LLM security, and AI compliance. Written by CREST-certified security professionals based on OWASP, NIST, and EU AI Act frameworks.

AI Red Teaming

A structured security assessment that simulates real-world adversarial attacks against AI systems, including large language models (LLMs), RAG pipelines, and AI-powered applications. Unlike traditional penetration testing, AI red teaming specifically targets AI-unique vulnerabilities such as prompt injection, model poisoning, and data exfiltration through AI outputs. The EU AI Act mandates adversarial testing for high-risk AI systems by August 2, 2026.

Prompt Injection

An attack technique where malicious instructions are inserted into inputs processed by a large language model, causing it to override its system prompt instructions and perform unintended actions. OWASP ranks prompt injection as LLM01 — the number one vulnerability in production AI systems, present in over 73% of deployments tested. Variants include direct injection, indirect injection (via retrieved documents), and multi-turn manipulation.

System Prompt

The hidden instruction set that defines an AI system's behaviour, access controls, data boundaries, and operational constraints. System prompts are the "DNA" of AI applications and contain critical information including API endpoints, access rules, and business logic. In the McKinsey Lilli breach, attackers gained write access to 95 system prompts, enabling potential reprogramming of the entire AI platform.

RAG (Retrieval-Augmented Generation)

An AI architecture that connects large language models to external knowledge bases, databases, and document stores to generate responses grounded in specific data. RAG pipelines are the highest-value target in enterprise AI infrastructure because they provide direct access to sensitive corporate data. The McKinsey breach exposed 3.68 million RAG document chunks and 266,000 vector stores.

Model Poisoning

An attack where an adversary manipulates the training data or fine-tuning process of an AI model to introduce backdoors, biases, or vulnerabilities. Poisoned models may behave normally for most inputs but produce attacker-controlled outputs for specific triggers. In RAG systems, knowledge base poisoning creates persistent attacks affecting all users who trigger retrieval of poisoned documents.

Jailbreaking

Techniques used to bypass an AI model's safety guardrails and content policies to produce outputs the model was designed to refuse. Jailbreaking exploits the gap between a model's training constraints and its instruction-following capabilities. Common techniques include role-playing injection, encoding attacks, and multi-turn context manipulation.

Indirect Prompt Injection

A form of prompt injection where malicious instructions are embedded in external content (documents, websites, emails) that an AI system retrieves or processes. When the AI ingests this content as context, the hidden instructions execute within the model's generation phase. NVIDIA AI Red Team research shows indirect injection succeeds in 61% of RAG systems lacking dedicated input sanitisation.

Vector Database

A specialised database that stores and retrieves high-dimensional vectors (embeddings) generated from text, images, or other data. Vector databases power the retrieval component of RAG systems by finding semantically similar content. Security risks include unauthorised access, cross-tenant data leakage, and embedding inversion attacks that can reconstruct original text with 92% accuracy.

OWASP Top 10 for LLMs

The Open Web Application Security Project's classification of the 10 most critical security vulnerabilities in large language model applications. The 2025 edition includes: LLM01 Prompt Injection, LLM02 Insecure Output Handling, LLM03 Training Data Poisoning, LLM04 Model Denial of Service, LLM05 Supply Chain Vulnerabilities, LLM06 Sensitive Information Disclosure, LLM07 Insecure Plugin Design, LLM08 Excessive Agency, LLM09 Overreliance, LLM10 Model Theft.

EU AI Act

Regulation (EU) 2024/1689 — the world's first comprehensive AI legislation. Mandates risk-based classification of AI systems (unacceptable, high-risk, limited, minimal risk) with corresponding obligations. High-risk AI systems must undergo adversarial testing and comply with Article 9 risk management requirements by August 2, 2026. Penalties reach up to €15 million or 3% of global annual revenue for high-risk obligations (up to €35 million or 7% for prohibited AI practices).

NIST AI RMF

The National Institute of Standards and Technology's AI Risk Management Framework, published in 2023. Provides a structured approach to managing AI risks through four functions: Govern (organisational context), Map (risk identification), Measure (risk analysis), and Manage (risk treatment). Widely adopted as a baseline for AI security assessments in both US and international contexts.

Shadow AI

The use of AI tools (ChatGPT, Claude Code, Copilot, etc.) by employees without formal security oversight, governance policies, or IT approval. Analogous to "shadow IT" but with amplified risk because AI tools actively process, transform, and potentially expose data to third-party model providers. 67% of employees use AI tools at work, but only 18% of companies have AI security policies.

CVSS (Common Vulnerability Scoring System)

An industry-standard framework for rating the severity of security vulnerabilities on a 0-10 scale. Used in AI security assessments to score discovered vulnerabilities. Example: Claude Code CVE-2025-59536 received a CVSS score of 8.7 (High), indicating remote code execution capability through malicious project configurations.

Data Exfiltration (AI Context)

The unauthorised extraction of sensitive data through AI system outputs. AI-specific exfiltration techniques include prompt injection that instructs the model to include sensitive data in responses, RAG retrieval manipulation to surface confidential documents, and embedding inversion to reconstruct source data from vector databases.

Autonomous AI Agent

An AI system that operates independently to achieve goals without human intervention for each step. Autonomous agents combine LLM reasoning with tool-use capabilities to execute multi-step tasks. In security, offensive autonomous agents (like CodeWall's) can conduct complete attack chains — from reconnaissance to exploitation — in hours rather than weeks. Defensive autonomous agents monitor and respond to threats at machine speed.

Embedding Inversion

An attack technique that reverses AI text embeddings to reconstruct the original source text. Research from Cornell University demonstrated 92% reconstruction accuracy for common embedding models. This undermines the assumption that storing data as embeddings provides a layer of data protection in vector databases.

CREST Certification

An international accreditation body for cybersecurity service providers. CREST-certified companies and individuals have demonstrated technical competence through rigorous examination and ongoing compliance requirements. CREST certification is recognised by UK, EU, and international regulators as evidence of security testing capability and quality.

MITRE ATLAS

Adversarial Threat Landscape for AI Systems — a knowledge base of adversary tactics and techniques targeting AI/ML systems. Maintained by MITRE Corporation, ATLAS extends the MITRE ATT&CK framework to cover AI-specific threats including model evasion, data poisoning, model theft, and inference manipulation. Used by red teams to structure AI-specific attack scenarios.

Hallucination (AI)

When an AI model generates plausible-sounding but factually incorrect information. In enterprise contexts, hallucinations create business risk when AI-generated analysis, legal advice, financial data, or medical information is acted upon without verification. AI red teaming assessments evaluate hallucination rates and their potential impact on business decisions.

Context Window Attack

A technique that exploits the limited attention capacity of transformer-based AI models by flooding the context window with attacker-controlled content. This "drowns" legitimate safety instructions, causing the model to prioritise the attacker's injected content over its system prompt. Research shows context overflow extraction succeeds against 67% of models when the context window exceeds 80% capacity.

Need Expert AI Security Assessment?

Our CREST-certified team tests for every vulnerability in this glossary. Book a free 30-minute AI security analysis.