On September 6th, 2023, Microsoft published a follow-up to their initial investigative report from July 11th about Storm-0558 — a threat actor attributed to China who managed to acquire a signing key that allowed them to gain illicit access to Exchange and Outlook accounts. Microsoft should be applauded for the high level of transparency they have shown, and their willingness to share this information with the community. However, we feel that the latest blog post raises as many questions as it answers.
The following is a summary of the new information provided in Microsoft’s latest report about how the signing key may have been compromised by the threat actor (see the diagram above for a visual representation of the attack flow as we currently understand it):
There is evidence that a Microsoft engineer’s corporate account was compromised by Storm-0558 “[at some point] after April 2021,” using an access token obtained from a machine infected with malware.
This engineer had permission to access a debugging server in Microsoft’s corporate network.
This debugging server contained a crash dump that originated in a signing system located in Microsoft’s isolated production network.
This crash dump, which was the result of a crash that occurred in April 2021, contained the aforementioned MSA signing key.
The inclusion of the signing key in this crash dump was the result of a bug, and a separate bug caused the signing key to remain undetected on the debugging server.
Based on the events described above, Microsoft has concluded that the most likely method by which Storm-0558 acquired the MSA signing key was through this compromised account, by accessing the debugging server and exfiltrating the crash dump that contained the key material.
Besides providing the above information about how the key was most likely to have been compromised, Microsoft’s latest report also publicly corroborates our own conclusions (published July 21st) about the contributing factors to this incident, namely:
Prior to the discovery of this threat actor in June 2023, the Azure AD SDK (described in the report as a “library of documentation and helper APIs”) did not include functionality to properly validate an authentication token’s issuer ID. In other words, as we explained in our previous blog post, any application relying solely on the SDK for implementing authentication would have been at risk of accepting tokens signed by the wrong key type.
As mentioned in Microsoft’s original report, Exchange was affected by a vulnerability that caused it to accept Azure AD authentication tokens as valid even though they were signed by an MSA signing key – this vulnerability was ultimately exploited by Storm-0558 to gain access to enterprise accounts. In their latest report, Microsoft clarified that this issue was in fact a result of the missing validation functionality in the SDK: at some point in 2022, the development team in charge of authentication in Exchange incorrectly assumed that the Azure AD SDK performed issuer validation by default. This caused validation to be implemented incorrectly, leading to a vulnerability.
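To make the missing check concrete, here is a minimal sketch of issuer validation, written by us for illustration and not taken from the Azure AD SDK or from Exchange. The issuer labels, key IDs, `SigningKey`, `TRUSTED_KEYS` and `accept_token` are all hypothetical names, and the `signature_ok` flag stands in for a real cryptographic signature check. The point is that a valid signature is not sufficient on its own: the signing key must also be scoped to the issuer the application expects.

```python
from dataclasses import dataclass

# Hypothetical issuer labels; real tokens carry an issuer URL in the "iss" claim.
CONSUMER_ISSUER = "msa-consumer"
ENTERPRISE_ISSUER = "aad-enterprise"


@dataclass(frozen=True)
class SigningKey:
    key_id: str
    issuer: str  # the only issuer this key is allowed to sign tokens for


# Hypothetical registry of trusted keys, indexed by key ID ("kid").
TRUSTED_KEYS = {
    "consumer-key-1": SigningKey("consumer-key-1", CONSUMER_ISSUER),
    "enterprise-key-1": SigningKey("enterprise-key-1", ENTERPRISE_ISSUER),
}


def accept_token(header: dict, claims: dict, signature_ok: bool,
                 expected_issuer: str) -> bool:
    """Accept a token only if it is signed by a key scoped to the expected issuer.

    `signature_ok` is an assumption of this sketch: it represents the result
    of verifying the signature with the key named in the token header.
    """
    key = TRUSTED_KEYS.get(header.get("kid"))
    if key is None or not signature_ok:
        return False
    # The token must claim the issuer this application expects...
    if claims.get("iss") != expected_issuer:
        return False
    # ...and the key that signed it must belong to that same issuer.
    return key.issuer == expected_issuer


# An enterprise token signed with a consumer key has a valid signature
# but is still rejected, because the key is scoped to a different issuer.
forged = accept_token({"kid": "consumer-key-1"}, {"iss": ENTERPRISE_ISSUER},
                      signature_ok=True, expected_issuer=ENTERPRISE_ISSUER)
genuine = accept_token({"kid": "enterprise-key-1"}, {"iss": ENTERPRISE_ISSUER},
                       signature_ok=True, expected_issuer=ENTERPRISE_ISSUER)
print(forged, genuine)
```

An application that stops after the signature check, on the assumption that a library has already handled the issuer, would accept the first token.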
The timeline that can be deduced from the latest report seems to indicate that due to log retention policies (understandable, given that the activity might have stretched over two years), Microsoft can only partially account for all of this threat actor’s activity within their network between April 2021 and May 2023. Additionally, the report does not explicitly state when the crash dump was transferred to the debugging environment or when the engineer’s account was compromised; only that each of these events occurred sometime after April 2021. If we assume that they both happened at the earliest possible point on the timeline — let’s say May 2021 — then that would mean that the threat actor might have been in possession of the signing key for over two years prior to being discovered in June 2023. Furthermore, while Microsoft have reviewed their logs and definitively identified the use of forged authentication tokens for Exchange and Outlook accounts throughout May 2023, we are nevertheless led to the conclusion that the threat actor might have been forging authentication tokens for other services during this two-year period.
As we explained in our last blog post on the subject, someone in possession of this MSA signing key was not limited to forging authentication tokens for just Exchange and Outlook: they could have forged tokens that would have allowed them to impersonate consumer accounts in any consumer or mixed-audience application, and enterprise accounts in any application that implemented validation incorrectly, such as Exchange. In other words, Storm-0558 was in a position to gain access to a wide range of accounts in applications operated by Microsoft (such as SharePoint) or their customers. This was a very powerful key.
Based on what we can learn from Microsoft’s latest report, cloud customers should have the following takeaways from this incident:
Organizations should scan their logs for evidence related to this activity in a time window spanning the period between April 2021 and June 2023 (Microsoft could narrow this window by stating precisely when the engineer’s account was compromised).
Organizations should use a hardware security module (HSM) for key storage whenever possible — this will ensure that key material is never included in crash dumps. As others have noted, the scale at which Microsoft operates might have made this impossible for them to do, but smaller organizations should certainly make it a priority.
As a precautionary defense-in-depth measure, debugging and crash dump data should be purged on a regular basis, since they can contain decrypted information which might be a gold mine for threat actors once they gain access to the environment. In general, sensitive secrets can often be found in unexpected places, such as bash history, hidden image layers, etc.
Additionally, organizations should maintain an inventory of assets in which debugging and crash dump data is collected, stored, or catalogued, and ensure that access controls are in place to limit these assets’ exposure.
Sensitive production environments should be properly isolated from corporate environments which are at higher risk of compromise. While there is no evidence to indicate that the threat actor managed to break through Microsoft’s security boundaries or reach the production environment itself, the root cause here was a failure of data hygiene when transferring potentially sensitive data between the two environments.
Signing keys should be rotated on a regular basis, ideally every few weeks. In this case, the acquired signing key was issued in April 2016 and expired in April 2021, but remained valid until it was finally rotated in July 2023 following Microsoft’s investigation of this incident. This means the key was very long-lived and in use for over 7 years. While Microsoft rotated their signing keys following this incident, at least one (key id -KI3Q9nNR7bRofxmeZoXqbHZGew) appears both in a current key list and in a copy of the same list as it stood in October 2022. If this key remains in use, it should be rotated as well, if only to limit the impact of any similar (admittedly unlikely) incident.
Secret scanning mechanisms — particularly those put in place to mitigate the risk of keys leaking from high-to-low trust environments — should be regularly monitored and tested for effectiveness.
Defaults are powerful, and documentation alone isn’t good enough for shaping developer behavior. SDKs should either implement critical functionality by default, or warn users if and when they’ve missed a vital implementation step that must be performed manually. If developers at Microsoft misunderstood their own documentation and made this critical mistake, it stands to reason that any one of their customers might have done the same.
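The key rotation check described in the takeaways above (comparing a current key list against an older copy of the same list) can be sketched in a few lines. This is a minimal illustration, assuming both snapshots are JWKS-style JSON documents with a top-level `keys` array whose entries carry a `kid` field. The function names and the sample snapshots are hypothetical: only the key ID quoted above comes from our own observation, and the other entries are placeholders rather than the real contents of any Microsoft key list.

```python
import json


def key_ids(jwks_json: str) -> set[str]:
    """Return the set of key IDs ("kid") found in a JWKS-style JSON document."""
    document = json.loads(jwks_json)
    return {key["kid"] for key in document["keys"] if "kid" in key}


def unrotated_key_ids(old_jwks_json: str, current_jwks_json: str) -> set[str]:
    """Return key IDs present in both snapshots, i.e. keys not rotated in between."""
    return key_ids(old_jwks_json) & key_ids(current_jwks_json)


# Placeholder snapshots for illustration only.
old_snapshot = json.dumps({"keys": [
    {"kid": "example-old-key"},
    {"kid": "-KI3Q9nNR7bRofxmeZoXqbHZGew"},
]})
current_snapshot = json.dumps({"keys": [
    {"kid": "example-new-key"},
    {"kid": "-KI3Q9nNR7bRofxmeZoXqbHZGew"},
]})

print(unrotated_key_ids(old_snapshot, current_snapshot))
```

Running a comparison like this on a schedule, against archived copies of an organization's own key lists, is one simple way to notice keys that have outlived the intended rotation period.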
Although Microsoft’s report answers some of the burning questions related to this case, there remain several unanswered questions:
Was this, in fact, how Storm-0558 acquired the signing key? Microsoft have stated that their investigation has concluded, meaning that they have exhausted all evidence available to them. Therefore, we will probably never have a definitive answer to this question.
How likely is it that other signing keys that were valid during the two-year period were compromised in the same way? Is there evidence to the contrary? (This would obviously be very hard to prove.)
When exactly was the engineer’s account compromised? Most importantly, what is the earliest possible point in time at which Storm-0558 could have acquired the signing key?
Was the threat actor targeting this engineer specifically because of their access to the debugging environment, or did they have other goals in mind?
Were the engineer’s account and the malware-infected machine the only known compromised entities within Microsoft’s corporate environment during this period? Did the investigation identify other compromised users or systems? When (and how) did the attacker establish their initial foothold in the environment?
When Microsoft says that they haven’t observed the threat actor targeting the users of any applications other than Exchange and Outlook, does this mean that they have definitively proven that the threat actor did not forge access tokens for other services? In other words, do they actually have the necessary logs (going back far enough in time and containing the required data) to reasonably verify this?
At what point did the threat actor identify the vulnerability in Exchange that allowed them to use forged authentication tokens signed by an MSA signing key to impersonate AAD users? Could they have somehow discovered it independently of acquiring the signing key? Might they have discovered the same vulnerability affecting other applications before Exchange became vulnerable in 2022?
Regarding the last question about how the threat actor might have discovered the issuer ID validation vulnerability in Exchange, we can posit a theory that they initially realized that the SDK (which is open source) did not include issuer validation by default, and correctly assumed that at least some of the SDK’s users, including Microsoft developers, would therefore fail to correctly implement this validation.
It’s also worth noting that unlike Microsoft’s previous report, this latest one doesn’t include any new technical indicators related to the threat actor’s activity, even though Microsoft has found evidence of such activity within their corporate network.
This blog post was written by Wiz Research, as part of our ongoing mission to analyze threats to the cloud, build mechanisms that prevent and detect them, and fortify cloud security strategies.