OWASP LLM Top 10: A Practitioner's Guide to LLM Security Risks

What is the OWASP Top 10 for LLM Applications?

The OWASP Top 10 for LLM Applications is a specialized security framework that catalogs the ten most critical vulnerabilities impacting large language models within enterprise environments. Compiled by the Open Worldwide Application Security Project (OWASP), this directory provides security teams, developers, and cloud architects with standardized definitions to identify and mitigate model-specific deployment risks.

By establishing a shared industry nomenclature, the list helps organizations validate model integrity, secure data injection layers, and implement runtime guardrails. The index operates as the primary benchmark for mapping vulnerabilities across modern AI-native applications.

The framework uses alphanumeric codes: the LLM prefix marks an entry as a large language model vulnerability, and the trailing two-digit number gives its rank in the list. The current framework, published in April 2025, runs sequentially from LLM01 through LLM10, giving teams a standardized naming convention for defensive work.

Why is OWASP LLM Top 10 separate from the Web Top 10?

The classic OWASP Top 10 focuses on traditional application-layer boundaries, evaluating vulnerabilities like cross-site scripting (XSS) and SQL injection that target structured inputs to code interpreters. Large language models (LLMs) introduce entirely different, non-deterministic attack surfaces when wired into an enterprise application stack.

Language models process natural language instructions and unstructured data within the same context window. So the underlying architecture can’t natively distinguish an administrative command from a raw data payload.

Because models rely on semantic embedding spaces and multi-agent orchestration rather than predictable, keyword-mapped database syntax, traditional input firewalls fail to recognize adaptive payloads. New risk vectors (e.g., embedding-layer vulnerabilities, data corpus poisoning, and excessive agent autonomy) represent systemic architectural challenges rather than variations of classic web flaws.

This fundamental shift in development realities requires a separate, purpose-built framework to secure the unique pipelines connecting infrastructure, models, and connected enterprise data stores.

LLM01: Prompt injection as the top LLM risk, and the vector you're missing

Engineering teams building RAG pipelines often limit prompt injection controls to user-input channels, which leaves the broader production attack surface unmonitored. LLM01, prompt injection, has topped every edition of the OWASP LLM list. The 2025 LLM01 update formally distinguishes direct from indirect sub-classes, a distinction that matters for anyone ingesting external documents into a model's context window.

Direct injection

Direct injection occurs when an attacker inputs adversarial instructions directly into user-facing text channels and attempts to manipulate model behavior or priority of instructions. This is the class of attack most teams are familiar with and typically test for first.

Jailbreak-style attacks are the canonical example. While engineering teams commonly deploy input guardrails against these known direct paths, the indirect vector is the one showing up in production.

Indirect injection through data sources

Indirect prompt injection (IPI) works differently. An attacker embeds adversarial instructions inside external content such as a PDF, a web page, a retrieved RAG chunk, or an MCP tool description. That content then reaches the model without any user action. The attacker never touches the user input channel.

The mechanism is architectural: A RAG retriever operates in semantic (embedding) space, not keyword space. It can’t distinguish "this is data" from "this is an instruction." A high cosine-similarity chunk enters the context window, and the model reads it exactly like a system prompt. The attacker needs write access to something the pipeline ingests, not to the application itself.

EchoLeak (CVE-2025-32711) is a zero-click prompt injection vulnerability in Microsoft 365 Copilot that enabled remote, unauthenticated data exfiltration via a single crafted email. The demonstrated attack caused Copilot to exfiltrate secrets via rendered markdown with zero user interaction. This was the first confirmed zero-click production IPI exploit, patched by Microsoft in May 2025.

In April 2026, Forcepoint X-Labs catalogued 10 in-the-wild IPI payloads. This included CSS-hidden text invisible to human readers but parsed by the model. It also featured white-on-white PDF layers that survive visual inspection. Cloud-hosted and local models are equally affected.

The broader concept of hidden prompts in white-on-white text, invisible DOM elements, PDFs, and other documents is also well-established in OWASP guidance and multiple academic studies.

Mitigation that changes the blast radius

Prompt firewalls alone aren't sufficient. Rigorous benchmarking shows they don't close the gap against adaptive payloads. The controls that shrink exposure work on architecture.

Privilege separation enforces a hard boundary between the instruction layer (system prompt) and the data layer (retrieved content), so externally sourced content is untrusted by default. Instruction hierarchy tells the model explicitly that system prompt authority beats trusted user input, which in turn beats untrusted retrieved data.

Output schema enforcement is one of the most effective architectural defenses against prompt injection. Constraining responses to validated schemas prevents models from generating arbitrary instructions, tool calls, or control messages outside approved formats. This doesn’t eliminate data leakage risks within allowed fields, but it removes entire categories of attacks that depend on unrestricted text output. It also significantly reduces the blast radius of successful injections.
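As a rough illustration of output schema enforcement, the sketch below rejects any model response that is not a JSON object with exactly the expected fields. The field names, the action allow-list, and the length cap are hypothetical and chosen only for this example. A real deployment would more likely use a schema library or the provider's structured-output feature.

```python
import json

# Hypothetical allow-list and limits, for illustration only.
ALLOWED_ACTIONS = {"answer", "search_docs"}
ALLOWED_KEYS = {"action", "text"}
MAX_TEXT_LEN = 2000


def parse_model_output(raw: str) -> dict:
    """Return the validated response, or raise ValueError if it breaks the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("output is not valid JSON") from exc

    # Exactly the approved fields: extra keys such as an unexpected tool call are rejected.
    if not isinstance(data, dict) or set(data) != ALLOWED_KEYS:
        raise ValueError("unexpected or missing fields")

    action = data["action"]
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError("action is not in the allow-list")

    text = data["text"]
    if not isinstance(text, str) or len(text) > MAX_TEXT_LEN:
        raise ValueError("text must be a bounded string")

    return data
```

Anything that fails validation is dropped before it reaches a tool dispatcher or renderer. As noted above, this does not stop sensitive data from appearing inside the allowed `text` field.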

LLM02 & LLM07: Data exposure risks in RAG pipelines, and why system prompts make it worse

The core access control gap in RAG is that the application-layer controls at the chatbot UI don't propagate automatically to the vector retrieval layer. The retriever converts a query to an embedding, finds high cosine-similarity chunks, and returns them with no native concept of which user is allowed to see which document.

In practice, a query can match both a public FAQ chunk and a confidential HR record. The retriever returns both. Privacy risks in RAG span training-data memorization, over-permissioned retrieval, and cross-session leakage. All three can occur simultaneously in a poorly scoped deployment.

System prompts compound data exposure risks under OWASP LLM07. While they should never contain secrets, they frequently encode sensitive application context such as business logic, workflow rules, tool permissions, and data classification policies. Documented prompt extraction techniques have shown that attackers can retrieve portions of this hidden instruction layer, increasing the risk of unauthorized exposure of internal instructions and application logic.

EchoLeak is the clearest combined outcome. The LLM01 injection mechanism produced an LLM02/LLM07 result, exfiltrating secrets from the model's context through rendered Copilot output with no user interaction required.

To fix it, document-level RBAC metadata has to travel with the chunk through the entire pipeline. Tag documents at ingest with the user or group permitted to retrieve them, then enforce a pre-retrieval authorization filter before semantic ranking runs. That filter should sit upstream of the vector similarity search. By the time a chunk surfaces in a ranked result list, the access decision is already made. Secrets move out of system prompts and into runtime-injected secret manager references.
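A minimal in-memory sketch of that ordering follows. The `Chunk` type, the group names, and the `retrieve` function are assumptions made for illustration. A production vector store would express the same idea as a metadata filter applied inside the similarity query.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    embedding: tuple[float, ...]
    allowed_groups: frozenset[str]  # RBAC metadata attached at ingest


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_embedding, user_groups, index, k=3):
    # Authorization runs first: chunks the caller may not see never get ranked.
    permitted = [c for c in index if c.allowed_groups & user_groups]
    ranked = sorted(
        permitted,
        key=lambda c: cosine(query_embedding, c.embedding),
        reverse=True,
    )
    return ranked[:k]
```

With a public FAQ chunk tagged `everyone` and an HR record tagged `hr`, a caller whose groups are only `{"everyone"}` gets the FAQ chunk back even when the HR record is a close semantic match.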

LLM03: Supply chain vulnerability inside hosted API architectures

LLM03 covers the AI supply chain components that make up this dependency graph. This includes MCP servers as well as third-party models, adapters, plugins, libraries, datasets, and any external services.

Using a hosted API, for example, doesn't mean your supply chain is automatically secure. The API operates as one node in a graph that also includes fine-tuned checkpoints pulled from third-party model hubs, community plugins, MCP server integrations, and RAG ingestion pipelines. Each represents an active supply chain component that can be compromised before it reaches your application.

A malicious or compromised MCP server is another example. It can send adversarial tool descriptions that steer the model into invoking capabilities the operator never intended. The MCP-38 taxonomy outlines 38 distinct risk categories for MCP systems, spanning authentication bypass, privilege escalation, and tool description manipulation. The NSA issued a Cybersecurity Information Sheet on MCP security in May 2026, a government-level signal that MCP risk is operational.

Attacks delivered through those components can also appear as other OWASP categories like prompt injection (LLM01), data poisoning (LLM04), or excessive agency (LLM06).

Mitigations follow the same logic as software dependency management: Vet model cards before pulling any checkpoint, maintain an MCP server inventory with trust attestation, and enforce integrity checks on RAG ingestion pipelines. And be sure to treat community plugin integrations the same way you treat external code dependencies.
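One way to picture the integrity check is digest pinning, the same idea a lockfile applies to code dependencies. In the sketch below, the `pinned` inventory and the artifact names are hypothetical. The hashing calls are from the Python standard library.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_artifact(name: str, data: bytes, pinned: dict[str, str]) -> bool:
    """Accept an artifact only if it is inventoried and its digest matches the pin."""
    expected = pinned.get(name)
    if expected is None:
        # Not in the inventory: treat as untrusted rather than ingesting it.
        return False
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(sha256_hex(data), expected)
```

A checkpoint, plugin bundle, or ingested document whose bytes changed after review fails the check, as does anything that was never added to the inventory.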

LLM04: Data and model poisoning

LLM04 becomes a risk when training or fine-tuning data is inaccurate, maliciously modified, or insufficiently vetted. An attacker who poisons a dataset may influence model behavior, introduce subtle biases, or plant hidden backdoors that activate only when specific prompts are used. Organizations can also expose themselves by training on untrusted public data or accidentally feeding proprietary information into model training pipelines without proper validation and governance.

A practical defense against LLM04 starts with treating training and retrieval data as part of your security perimeter. Maintain clear data provenance using tools such as CycloneDX, ML-BOM, or dataset versioning systems. Only use trusted, validated data sources. Apply access controls and sandboxing to prevent models from consuming unverified data. And continuously monitor datasets and model behavior for signs of poisoning or anomalous outputs.

During development, regularly red-team models and validate outputs against trusted sources to identify hidden biases or backdoors. For production systems, keep user-supplied content separate from model weights by storing it in vector databases rather than retraining models. Use RAG and grounding techniques to anchor responses to verified information instead of relying solely on model memory.

LLM08: Architectural attack surfaces inside vector databases

OWASP LLM08 focuses on vulnerabilities in embedding models, vector databases, and retrieval pipelines. These risks can arise during ingestion, indexing, storage, or retrieval. They may allow attackers to manipulate what information is returned to the model.

Corpus poisoning occurs at ingest time when an attacker injects adversarially crafted documents optimized to rank at high cosine similarity for target queries. By the time a user submits a query, the poisoned chunk is already in the index. Once indexed, standard keyword filtering won't catch documents engineered for adversarial retrieval, because the attack operates within semantic embedding layers.

Many production RAG deployments still enforce document permissions at the application layer instead of directly within the retrieval engine. While some vector databases provide tenant isolation and metadata-based filtering, authorization controls are frequently implemented outside the vector store. This creates opportunities for misconfiguration and data leakage.

Effective mitigations include namespace isolation for multi-tenant environments, pre-retrieval authorization filtering before semantic ranking occurs, and retrieval-stage defenses that detect poisoned or suspicious content before it reaches the model. Retrieval-stage defenses including RAGPart and RAGMask offer additional detection layers against poisoned corpora.

LLM06: Excessive agency and model execution permissions

The injection got the attacker in. The tool permissions decide how far they go.

The OWASP LLM06 category is the most substantially expanded entry in the 2025 OWASP update, focusing heavily on tool-calling and autonomous execution scope. In practice, this category serves as the exact baseline for mapping modern 2026 vectors like MCP privilege escalation and multi-agent lateral movement. The classification framework separates these risks into three structural subcomponents: excessive permissions (tools beyond task scope), excessive functionality (high-impact tools like shell execution or production API writes), and excessive autonomy (no human confirmation gate before consequential actions).

How tool permissions create data exfiltration paths

Security reviews miss the tool composition problem most often. read_file passes review. send_email also passes review. The combination creates an exfiltration path that neither individual review evaluated. The attacker reads a sensitive document and the agent sends it, because nothing in the permission model prohibits the sequence.
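A composition review can be approximated by labeling each tool with coarse capabilities and flagging any grant that pairs a sensitive reader with an external sender. The capability labels and the tool table below are hypothetical. The point is only that the check runs over the set of tools, not over each tool in isolation.

```python
# Hypothetical capability labels for the tools an agent might be granted.
TOOL_CAPABILITIES = {
    "read_file": {"reads_sensitive"},
    "query_db": {"reads_sensitive"},
    "send_email": {"sends_external"},
    "http_post": {"sends_external"},
    "summarize": set(),
}


def exfiltration_pairs(granted_tools):
    """Return every (reader, sender) pair that forms a read-then-send path."""
    unlabeled = sorted(t for t in granted_tools if t not in TOOL_CAPABILITIES)
    if unlabeled:
        # Fail closed: a tool nobody has classified cannot be reviewed.
        raise ValueError(f"unlabeled tools: {unlabeled}")

    readers = sorted(t for t in granted_tools
                     if "reads_sensitive" in TOOL_CAPABILITIES[t])
    senders = sorted(t for t in granted_tools
                     if "sends_external" in TOOL_CAPABILITIES[t])
    return [(r, s) for r in readers for s in senders]
```

A grant of `read_file` plus `send_email` yields one flagged pair, while `read_file` plus `summarize` yields none.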

The Sysdig Threat Research Team cataloged the first confirmed AI-agent intrusion in May 2026. During this incident, an attacker moved from a CVE to an internal database in four pivots. The LLM executed post-compromise reasoning in real time instead of following a pre-built playbook. The agent was reasoning about its environment and adapting, not executing a static script.

Aembit’s April 2026 incident analysis highlights the accidental counterpart. A coding agent running on a long-lived API credential deleted production data without adversarial intent, with no attacker required. The design flaw was sufficient. Security researcher Aonan Guan, collaborating with Johns Hopkins researchers, showed how a single comment in a GitHub PR title caused coding agents to exfiltrate API keys from CI/CD runner secrets. These agents inherit full credential sets from CI/CD runner environments, exposing secret cloud provider keys to unauthorized parties.

As reported on GitHub:

By embedding malicious instructions in a PR title, an issue body, or an HTML comment, an attacker could cause Anthropic's Claude Code Security Review, Google's Gemini CLI Action, and GitHub's Copilot Coding Agent to leak their own credentials — ANTHROPIC_API_KEY, GEMINI_API_KEY, and GITHUB_TOKEN respectively — back as agent-authored PR or issue comments.

Scoping agent permissions to contain damage

The controls that work here are identity controls, not content filters. Each agent invocation requires cryptographic credentials scoped exclusively to what that specific run requires. There should be no long-lived shared tokens or reuse of the authenticated user's session. Restricting workloads to distinct per-agent identities instead of shared service accounts allows engineers to maintain absolute attribution and execute individual privilege revocations instantly.
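To make per-run scoping concrete, here is a toy sketch of a short-lived, scope-limited token signed with HMAC. It is not a substitute for a workload identity provider. The function names, claim names, and scope strings are all assumptions made for this example.

```python
import base64
import hashlib
import hmac
import json


def mint_run_token(key: bytes, agent_id: str, scopes, expires_at: int) -> str:
    """Issue a token bound to one agent, an explicit scope list, and an expiry."""
    payload = json.dumps(
        {"agent": agent_id, "scopes": sorted(scopes), "exp": expires_at},
        sort_keys=True,
    ).encode()
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode() + "." + signature


def authorize(key: bytes, token: str, needed_scope: str, now: int) -> bool:
    """Allow the call only if the token is authentic, unexpired, and in scope."""
    try:
        body, signature = token.split(".")
        payload = base64.urlsafe_b64decode(body.encode())
    except ValueError:
        return False  # malformed token
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return False  # tampered, or signed with a different key
    claims = json.loads(payload)
    return now < claims["exp"] and needed_scope in claims["scopes"]
```

Because each token names a single agent and expires on its own, revoking or attributing one run does not touch any other agent's access.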

Human-in-the-loop confirmation gates on high-impact tool calls (file deletion, production API writes, credential access) shrink the autonomous damage window without blocking the agent on routine operations. Read-only by default is the simplest structural control. You should separate read-capable from write-capable tool sets and provision write access only when the task requires it.
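A confirmation gate can be as small as a wrapper around tool dispatch. In this sketch, the tool names in `HIGH_IMPACT_TOOLS` and the `approve` callback are hypothetical. In practice the callback might open a ticket or prompt an on-call engineer.

```python
from typing import Any, Callable

# Hypothetical set of tools that need a human decision before running.
HIGH_IMPACT_TOOLS = {"delete_file", "write_production_api", "read_credentials"}


def call_tool(
    name: str,
    args: dict[str, Any],
    tools: dict[str, Callable[..., Any]],
    approve: Callable[[str, dict[str, Any]], bool],
) -> Any:
    """Dispatch a tool call, pausing for approval on high-impact tools only."""
    if name not in tools:
        raise KeyError(f"tool not provisioned: {name}")
    if name in HIGH_IMPACT_TOOLS and not approve(name, args):
        raise PermissionError(f"{name} requires human approval")
    return tools[name](**args)
```

Routine calls such as a file read pass straight through, so the gate adds friction only where the blast radius is large.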

The underlying concept is the same entitlement problem that cloud infrastructure entitlement management already addresses for cloud identity and access management (IAM) roles. An over-permissioned agent is an over-permissioned principal, and the same least-privilege analysis applies.

LLM05, LLM09, LLM10: Output handling, misinformation, and cost abuse

These three risks are distinct, but you can resolve each quickly with the right framing.

LLM05: Improper output handling

OWASP LLM05 covers improper output handling, which happens when LLM output passes unsanitized into downstream systems. The design flaw is trusting model output as inherently safe because a human-sounding model produced it. That assumption enables XSS, SSRF, code injection, and path traversal.

Treat every LLM output as untrusted input to every downstream consumer: Validate against an expected schema, escape for the rendering context, and never pass raw output to a shell or database without validation.
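Two of those habits are sketched below using only the standard library: escaping for an HTML rendering context, and validating a model-proposed filename against a strict pattern before it touches the filesystem. The function names and the filename pattern are assumptions made for illustration.

```python
import html
import re

# Conservative allow-list pattern: no slashes, so no path traversal.
SAFE_FILENAME = re.compile(r"[A-Za-z0-9_.-]{1,64}")


def render_html(model_text: str) -> str:
    """Escape model output before placing it in an HTML page."""
    return "<p>" + html.escape(model_text) + "</p>"


def safe_filename(model_text: str) -> str:
    """Accept a model-proposed filename only if it matches the allow-list."""
    if not SAFE_FILENAME.fullmatch(model_text) or model_text.startswith("."):
        raise ValueError("rejected filename from model output")
    return model_text
```

For example, `render_html("<script>alert(1)</script>")` produces `<p>&lt;script&gt;alert(1)&lt;/script&gt;</p>`, and `safe_filename("../etc/passwd")` raises `ValueError`.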

LLM09: Misinformation

LLM09 scales with consequence. LLMs generate plausible, authoritative-sounding false content, and exposure increases when downstream systems act on it rather than humans reviewing it. RAG grounding with verified sources narrows the space of possible responses; human oversight gates before high-stakes automated actions provide a containment boundary.

LLM10: Unbounded consumption

LLM10 is both a cost attack and a denial-of-service vector. Unthrottled inference lets an external attacker exhaust compute budgets. Recursive agent loops can do the same without any external adversary. Rate limiting at the API gateway, token budget caps per request and session, cost anomaly alerting, and circuit breakers on agent loops are the standard mitigations.
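The budget cap and the loop circuit breaker can share one small object that every model call has to pass through. The class name and the limits below are hypothetical. Real limits would depend on the workload and the provider's pricing.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a session hits its token or step limit."""


class SessionBudget:
    def __init__(self, max_tokens: int, max_steps: int) -> None:
        self.max_tokens = max_tokens
        self.max_steps = max_steps
        self.tokens_used = 0
        self.steps = 0

    def charge(self, tokens: int) -> None:
        """Record one model call, or raise before the call would exceed a limit."""
        if self.steps + 1 > self.max_steps:
            raise BudgetExceeded("step limit reached, breaking agent loop")
        if self.tokens_used + tokens > self.max_tokens:
            raise BudgetExceeded("token budget exhausted for this session")
        self.steps += 1
        self.tokens_used += tokens
```

An agent loop calls `charge()` before each inference request, so a recursive loop stops at `max_steps` even when each individual call is cheap.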

How autonomous agents compound multi-step vulnerability chains

Agents don't introduce new vulnerability classes. They add autonomy and action scope that cause every upstream risk to compound across multi-step execution chains. That compounding is what makes agentic architectures fundamentally different from single-turn LLM calls.

Multi-step vulnerability sequences demonstrate this compounding risk clearly. For example, an external actor places an adversarially crafted document into the RAG pipeline (corpus poisoning). It gets retrieved into the agent's context window (indirect injection). Then it triggers instructions the agent executes using its tool access (excessive agency) while drawing from credentials inherited from a mis-scoped MCP server (supply chain).

Each of those risks is individually mitigated in the sections above. No single control observes the full chain.

Two distinct execution patterns demonstrate how agent behaviors expand cloud risks.

First, cross-agent lateral movement lets a compromised orchestrator inject instructions into downstream executor agents without re-entering the user input channel, bypassing any input-layer defenses.

Second, agent memory creates a persistence mechanism where a single injection can reinforce itself across sessions through memory layers. This keeps an agent under adversarial influence after the original payload is gone.

Engineering teams running production agent systems should track the OWASP Top 10 for Agentic Applications (ASI) 2026 alongside the LLM Top 10.

Unifying security context across decoupled infrastructure layers

Wiz's AI Application Protection Platform (Wiz AI-APP) helps teams assess generative AI adoption with a built-in compliance framework mapped directly to the OWASP LLM Security Top 10. This framework is designed to help assess your organization's compliance posture against various governance and regulatory standards. However, it isn’t intended to fully guarantee compliance.

Using the unified Wiz Security Graph, Wiz AI-APP correlates cloud configurations, identities, and data exposures to surface high-risk toxic combinations. To secure this pipeline from design to production, Wiz shifts security left by scanning repositories via Wiz Code. At runtime, rather than introducing latency with inline filtering, Wiz utilizes an out-of-band monitoring strategy by ingesting cloud provider logs to identify guardrail bypasses. To prevent sensitive assets from leaking into third-party models, Wiz embeds Data Security Posture Management (DSPM) right into the AI workflow.

Unifying these security pillars gives you complete attack path visibility across models, agents, vector stores, and data pipelines. This provides the cross-layer perspective required to eliminate multi-step cloud exposure.
