
Cloud Vulnerability DB
A community-led vulnerabilities database
In the Linux kernel, the following vulnerability has been resolved:
mm: prevent poison consumption when splitting THP
When performing memory error injection on a THP (Transparent Huge Page) mapped to userspace on an x86 server, the kernel panics with the following trace. The expected behavior is to terminate the affected process instead of panicking the kernel, as the x86 Machine Check code can recover from an in-userspace #MC.
mce: [Hardware Error]: CPU 0: Machine Check Exception: f Bank 3: bd80000000070134
mce: [Hardware Error]: RIP 10:
The root cause of this panic is that handling a memory failure triggered by an in-userspace #MC necessitates splitting the THP. The splitting process employs a mechanism, implemented in try_to_map_unused_to_zeropage(), which reads the pages in the THP to identify zero-filled pages. However, reading the pages in the THP results in a second in-kernel #MC, occurring before the initial memory_failure() completes, ultimately leading to a kernel panic. See the kernel panic call trace on the two #MCs.
First Machine Check occurs // [1] memory_failure() // [2] try_to_split_thp_page() split_huge_page() split_huge_page_to_list_to_order() __folio_split() // [3] remap_page() remove_migration_ptes() remove_migration_pte() try_to_map_unused_to_zeropage() // [4] memchr_inv() // [5] Second Machine Check occurs // [6] Kernel panic
[1] Triggered by accessing a hardware-poisoned THP in userspace, which is typically recoverable by terminating the affected process.
[2] Call folio_set_has_hwpoisoned() before try_to_split_thp_page().
[3] Pass the RMP_USE_SHARED_ZEROPAGE remap flag to remap_page().
[4] Try to map the unused THP to zeropage.
[5] Re-access pages in the hw-poisoned THP in the kernel.
[6] Triggered in-kernel, leading to a panic kernel.
In Step[2], memory_failure() sets the poisoned flag on the page in the THP by TestSetPageHWPoison() before calling try_to_split_thp_page().
As suggested by David Hildenbrand, fix this panic by not accessing to the poisoned page in the THP during zeropage identification, while continuing to scan unaffected pages in the THP for possible zeropage mapping. This prevents a second in-kernel #MC that would cause kernel panic in Step[4].
Thanks to Andrew Zaborowski for his initial work on fixing this issue.
Source: NVD
Free Vulnerability Assessment
Evaluate your cloud security practices across 9 security domains to benchmark your risk level and identify gaps in your defenses.
Get a personalized demo
"Best User Experience I have ever seen, provides full visibility to cloud workloads."
"Wiz provides a single pane of glass to see what is going on in our cloud environments."
"We know that if Wiz identifies something as critical, it actually is."