
Cloud Vulnerability DB
A community-led vulnerabilities database
vLLM, a high-throughput and memory-efficient inference and serving engine for LLMs, was found to have a vulnerability in its structured output support. The vulnerability (CVE-2025-29770) affects versions prior to 0.8.0, where the outlines library's cache for compiled grammars on the local filesystem was enabled by default. This cache was accessible through the OpenAI-compatible API server (GitHub Advisory).
The vulnerability exists in the file vllm/model_executor/guided_decoding/outlines_logits_processors.py, which unconditionally uses the cache from outlines. The issue has been assigned a CVSS v3.1 base score of 6.5 (Medium) with the vector string CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H, and is classified under CWE-770 (Allocation of Resources Without Limits or Throttling) (GitHub Advisory, NVD).
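The fix pattern here (gating an unbounded cache behind an opt-in environment variable) can be sketched as follows. This is an illustrative sketch, not vLLM's actual internals; the function names are invented, and the real cache persists compiled grammars to disk rather than to memory:

```python
import os
from functools import lru_cache

# Opt-in flag, mirroring how the 0.8.0 fix gates the outlines cache.
CACHE_ENABLED = os.environ.get("VLLM_V0_USE_OUTLINES_CACHE") == "1"

def _compile_grammar(schema: str) -> str:
    # Stand-in for the expensive grammar-compilation step.
    return f"compiled({schema})"

if CACHE_ENABLED:
    # Opt-in only: results are memoized (unbounded, like the original
    # on-disk cache, which is why it must be an explicit choice).
    compile_grammar = lru_cache(maxsize=None)(_compile_grammar)
else:
    # Default after the fix: recompile every time, bounded resource use.
    compile_grammar = _compile_grammar
```

The key design point is that the unbounded cache is no longer reachable by default; an operator has to opt in knowingly.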
A malicious user can exploit this vulnerability by sending a stream of very short decoding requests, each with a unique schema, causing a new entry to be added to the cache for every request. This can lead to a denial-of-service (DoS) condition if the filesystem runs out of space. The impact is significant even if vLLM is configured to use a different backend by default, as attackers can still select outlines on a per-request basis using the guided_decoding_backend key of the extra_body field (GitHub Advisory).
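The shape of such an attack can be sketched by constructing request bodies that each carry a never-before-seen JSON schema, so no request ever hits the cache and every request writes a new entry to disk. The model name and payload layout below are illustrative only, showing the advisory's point that the backend can be forced per request:

```python
import json
import uuid

def make_request_body() -> dict:
    # A schema with a random property name is unique per request, so the
    # compiled grammar can never be served from cache.
    unique_schema = {
        "type": "object",
        "properties": {f"field_{uuid.uuid4().hex}": {"type": "string"}},
    }
    return {
        "model": "some-model",   # hypothetical model name
        "prompt": "x",           # very short prompt, cheap to decode
        "max_tokens": 1,
        "extra_body": {
            "guided_json": json.dumps(unique_schema),
            # Force the outlines backend even if the server defaults
            # to a different one:
            "guided_decoding_backend": "outlines",
        },
    }

bodies = [make_request_body() for _ in range(3)]
```

Because each body's schema is distinct, the server-side cache grows monotonically with the number of requests.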
The issue has been fixed in version 0.8.0 by disabling the cache by default. Users who still want the cache can enable it by setting the VLLM_V0_USE_OUTLINES_CACHE environment variable to 1. For versions prior to 0.8.0, the only workaround is to prevent untrusted access to the OpenAI-compatible API server (GitHub Advisory).
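For operators on 0.8.0 or later who explicitly accept the risk in a trusted deployment, re-enabling the cache is a matter of exporting the environment variable before launching the server. This is a configuration fragment; the launch command shown is illustrative:

```shell
# Opt back in to the on-disk grammar cache (trusted clients only)
export VLLM_V0_USE_OUTLINES_CACHE=1
vllm serve <model>   # illustrative launch command
```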
Source: This report was generated using AI