CVE-2021-43854
Python vulnerability analysis and mitigation

Overview

NLTK (Natural Language Toolkit) versions prior to 3.6.6 contained a Regular Expression Denial of Service (ReDoS) vulnerability identified as CVE-2021-43854. The vulnerability was present in the PunktSentenceTokenizer class and in the sent_tokenize and word_tokenize functions. The issue was discovered in October 2021 and patched in version 3.6.6 (GitHub Advisory).
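A quick way to tell whether an environment is affected is to compare the installed NLTK version against the patched release, 3.6.6. The sketch below is illustrative only: `PATCHED_RELEASE`, `parse_release`, and `is_vulnerable` are hypothetical helper names, and the parser looks only at the leading numeric components, so a pre-release string such as "3.6.6rc1" is treated as 3.6.6.

```python
import re

# Patched release named in this report; everything below it is treated as affected.
PATCHED_RELEASE = (3, 6, 6)


def parse_release(version: str) -> tuple:
    """Return the leading numeric components of a version string, padded to 3 parts."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(piece) for piece in match.group().split("."))
    # Pad short versions such as "3.7" to (3, 7, 0) so tuple comparison is fair.
    return parts + (0,) * (3 - len(parts))


def is_vulnerable(version: str) -> bool:
    """True if the given NLTK version predates the patched release."""
    return parse_release(version) < PATCHED_RELEASE
```

In practice the version string could come from `importlib.metadata.version("nltk")` (Python 3.8+), for example `is_vulnerable(importlib.metadata.version("nltk"))`.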

Technical details

The vulnerability stemmed from an inefficient regular expression pattern in the PunktSentenceTokenizer implementation. Because the expression began with '\S*', the Python regex engine would start a match attempt at a given position and consume non-whitespace characters until it reached a whitespace character or the end of input before recognizing that the attempt had failed. It then repeated the same scan from the next position. On a long run of non-whitespace characters this results in quadratic time complexity, O(n^2), where n is the input length: for a malicious input of length n, the regex engine would require (n^2 + n) / 2 steps to process it (GitHub PR).

The vulnerability received a CVSS v3.1 score of 7.5 (High) with vector string CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H (GitHub Advisory).
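The step count quoted above can be reproduced with a small model. The sketch below does not run NLTK or the real regex engine; `count_scan_steps` is a hypothetical helper that simply counts how many characters are visited when a scan restarts at every offset and runs until whitespace or the end of input, which is the behavior attributed to the leading '\S*'.

```python
def count_scan_steps(text: str) -> int:
    """Count characters visited by a scan that restarts at every offset.

    Each scan advances over non-whitespace characters and stops at the first
    whitespace character or at the end of the input. This is a simplified
    model of the cost of a leading \\S*, not a trace of the regex engine.
    """
    steps = 0
    length = len(text)
    for start in range(length):
        position = start
        while position < length and not text[position].isspace():
            position += 1
        steps += position - start
    return steps


def worst_case_steps(n: int) -> int:
    """Closed form for an input of n non-whitespace characters: (n^2 + n) / 2."""
    return (n * n + n) // 2
```

For an input with no whitespace, such as `"a" * 1000`, the model visits 500,500 characters, matching `worst_case_steps(1000)`. Inserting whitespace breaks the input into short runs and keeps the count close to linear.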

Impact

When exploited, the vulnerability could cause a significant denial of service through CPU exhaustion. Testing showed that processing a malicious input of 100,000 characters took over 56 seconds, with processing time growing quadratically for longer inputs. Any application that passed untrusted input to the affected functions was therefore exposed to denial of service attacks (GitHub Advisory).
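To get a feel for how the reported timing scales, the measurement above can be extrapolated under the O(n^2) behavior described in the technical details. This is a rough estimate only: `extrapolate_quadratic` is a hypothetical helper, it assumes purely quadratic scaling, and it treats the "over 56 seconds" figure as exactly 56 seconds.

```python
def extrapolate_quadratic(known_seconds: float, known_length: int, target_length: int) -> float:
    """Scale a measured runtime to another input length, assuming O(n^2) growth."""
    ratio = target_length / known_length
    return known_seconds * ratio ** 2


# Doubling the input length roughly quadruples the estimated runtime.
estimate_200k = extrapolate_quadratic(56.0, 100_000, 200_000)
```

Under these assumptions, doubling the input to 200,000 characters gives an estimate of about 224 seconds, four times the measured figure rather than twice.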

Mitigation and workarounds

The vulnerability was patched in NLTK version 3.6.6 by removing the problematic '\S*' pattern from the regex and implementing a new matching approach. For users unable to upgrade, the recommended workaround is to implement a maximum length limit on inputs to the vulnerable functions. After the fix, processing time showed a linear relationship with input length, significantly improving performance (GitHub Advisory).
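The length-limit workaround can be sketched as a thin wrapper that rejects oversized input before any tokenizer runs. Everything named here is an assumption made for illustration: `MAX_CHARS`, `InputTooLongError`, and `tokenize_with_limit` are hypothetical, and the 10,000-character limit is a placeholder to be tuned to the workload. The tokenizer is passed in as a callable, so the sketch does not import NLTK itself.

```python
from typing import Callable, List

# Hypothetical limit; choose a value that fits the application's real inputs.
MAX_CHARS = 10_000


class InputTooLongError(ValueError):
    """Raised when input exceeds the configured maximum length."""


def tokenize_with_limit(
    text: str,
    tokenizer: Callable[[str], List[str]],
    max_chars: int = MAX_CHARS,
) -> List[str]:
    """Run the tokenizer only if the input is within the length limit."""
    if len(text) > max_chars:
        raise InputTooLongError(
            f"input has {len(text)} characters, limit is {max_chars}"
        )
    return tokenizer(text)
```

With NLTK installed, a call might look like `tokenize_with_limit(user_text, nltk.sent_tokenize)`. Capping the total length also caps the longest run of non-whitespace characters, which bounds the quadratic cost on unpatched versions.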

This report was generated using AI.

Related Python vulnerabilities:

CVE ID               Severity  Score  Technologies  Component name  CISA KEV exploit  Has fix  Published date
-------------------  --------  -----  ------------  --------------  ----------------  -------  --------------
CVE-2026-22871       HIGH      8.7    Python        guarddog        No                Yes      Jan 13, 2026
GHSA-58pv-8j8x-9vj2  HIGH      8.6    Python        jaraco.context  No                Yes      Jan 13, 2026
CVE-2026-22779       MEDIUM    6.3    Python        blacksheep      No                Yes      Jan 14, 2026
CVE-2026-21889       LOW       2.3    Python        weblate         No                Yes      Jan 14, 2026
CVE-2025-68492       LOW       2.3    Python        chainlit        No                Yes      Jan 14, 2026
