
How we broke the cloud with two lines of code: the full story of ChaosDB

Nir Ohfeld and Sagi Tzadik
November 10, 2021

In August 2021, the Wiz Research Team disclosed ChaosDB – a severe vulnerability in the popular Azure Cosmos DB database service that allowed complete, unrestricted access to the accounts and databases of several thousand Microsoft Azure customers, including many Fortune 500 companies. The vulnerability was so severe that we withheld its full extent until enough time had passed for it to be properly mitigated. Today, at BlackHat Europe 2021, the team shared all of the technical details behind ChaosDB for the first time. In this post we summarize what was discussed and share the full extent of ChaosDB, the impact it had, and the questions it raises about security in managed cloud services.

As a quick recap of what we’ve shared to date: ChaosDB is the name we gave to a chain of misconfigurations we found in the Cosmos DB service – specifically, multiple flaws in the way Microsoft introduced the Jupyter Notebook feature to Cosmos DB. At a high level, the attack was a local privilege escalation leading to unrestricted network access, which allowed the team to obtain a wide range of certificates and private keys that granted admin access to the Cosmos DB accounts of other users. To make matters worse, Cosmos DB accounts used to come bundled with the Jupyter Notebook feature automatically enabled. This was not made explicit to customers, so when flaws were found in Jupyter Notebook that impacted Cosmos DB, many customers were exposed without their knowledge. This, however, was not the full story of ChaosDB.
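To give a concrete sense of why leaked account keys are so dangerous, here is a minimal sketch (ours, not from the original research) of how a Cosmos DB master key signs REST API requests, following the token format in Microsoft's public REST API documentation. Possession of the key is the only secret in the scheme, so anyone holding it can mint valid admin-level authorization tokens for the account. The key value below is a placeholder, not a real credential.

```python
import base64
import hashlib
import hmac
import urllib.parse

def cosmos_auth_token(verb: str, resource_type: str, resource_link: str,
                      date: str, master_key_b64: str) -> str:
    """Build a Cosmos DB REST API authorization token from a master key.

    The token is an HMAC-SHA256 signature over the request verb, resource
    type, resource link, and RFC 1123 date, keyed with the base64-decoded
    master key, then URL-encoded for the Authorization header.
    """
    key = base64.b64decode(master_key_b64)
    # The signing payload format is defined by the Cosmos DB REST API docs.
    payload = f"{verb.lower()}\n{resource_type.lower()}\n{resource_link}\n{date.lower()}\n\n"
    signature = base64.b64encode(
        hmac.new(key, payload.encode("utf-8"), hashlib.sha256).digest()
    ).decode()
    return urllib.parse.quote(f"type=master&ver=1.0&sig={signature}", safe="")

# Placeholder key for illustration only.
demo_key = base64.b64encode(b"\x00" * 64).decode()
token = cosmos_auth_token("GET", "dbs", "dbs/ToDoList",
                          "Thu, 27 Apr 2017 00:51:12 GMT", demo_key)
```

Note there is no second factor and no binding to a caller identity: the signed token is derived entirely from the key, which is why exfiltrating these keys at scale amounted to admin access over the affected accounts.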

The full extent of ChaosDB: admin access to the control panel

What we did not disclose in August was the full extent of the ChaosDB vulnerability. Not only did it effectively allow any unprivileged attacker to obtain complete and unrestricted access to the databases of several thousand Microsoft Azure customers; by exploiting each misconfiguration in the service and chaining them together, we also obtained an excessive amount of Microsoft’s internal Cosmos DB-related secrets and credentials. Using these secrets, we were able to authenticate, as admin, to more than 100 Cosmos DB-related management panels (in the form of Service Fabric instances – the container orchestration solution used behind the scenes to power the service). Among other things, this admin access let us obtain information about every Cosmos DB account hosted in the regional cluster – including its authentication tokens!

What this means is that with only two lines of code we were able to do what was previously thought to be impossible: escape the abstraction layers of the cloud and access the underlying internal Azure infrastructure. This was more than an account takeover vulnerability; in the wrong hands it could have been a service takeover vulnerability. Beyond taking over accounts and manipulating data, our admin position within the service would have allowed us to damage the Cosmos DB service itself. And because the compromise reached the underlying Service Fabric instances, this vulnerability was nearly impossible for customers to defend against.

Internal Azure internet-accessible Service Fabric web interface

According to our assessment, this is the first time a vulnerability of this magnitude has been disclosed in any cloud service provider. And although the vulnerability affects a managed service – where the cloud provider is supposed to handle security issues on behalf of its customers – in this case customers do have to take manual action to be fully protected, which increases the severity of the finding. You can read more about mitigating ChaosDB in our mitigations blog post.

The impact of ChaosDB

Besides affecting thousands of customers and their databases, we suspect this vulnerability could also have affected Microsoft’s security posture as a whole. In less than a week of active research, and using only 6 of the 25 secrets we obtained, we believe we were able to nearly take over the entire service. This level of power was beyond what we felt comfortable exploring as researchers; we had gained the same privileges as the internal Microsoft Azure employees who work on the service. And to make things even worse, our researchers also proved that once the secrets had been obtained, access to the service management panels could be maintained over the internet, without access to the vulnerable environment.

Internal Azure Service Fabric application

While this vulnerability represents a completely new level of exposure, there are positives here. On the bright side, Microsoft’s Security Team sprang into action and mitigated the vulnerability on their end less than 48 hours after receiving our report, showing a commendable level of care for all impacted customers. From our perspective, it is fortunate that something like ChaosDB happened in a cloud service rather than an on-premises one: it allowed Microsoft’s Security Team to manage the patching and mitigate the issue that quickly. Doing the same in an on-premises environment would have required far more time and effort on the customer’s part, leaving the environment exposed for much longer.

A new milestone in cloud security

In our opinion, this vulnerability serves as a milestone in the cloud security industry and there are a lot of things to be learned from this incident, especially regarding isolation in the cloud. As one of the first vulnerabilities to break the cloud isolation model that the public cloud relies on, this is a serious incident that merits deep understanding. Because isolation was broken here, there was no defense against this from the customers’ side. There were no configuration choices or security policy decisions that made a customer vulnerable; everyone was vulnerable by default. It broke all security models. This is why we wanted to dedicate a full session at BlackHat to dive into all the technical details and learnings behind ChaosDB. We believe it’s an important event for security, and wanted to share everything we did and learned with the community.  

You can watch our BlackHat session virtually to learn all the details. If you want to read the full technical deep dive on all the bits and bytes behind this vulnerability, check out the technical blog post we’ve written that pulls back the curtain on everything in detail.
