ChaosDB: How to discover your vulnerable Azure Cosmos DBs and protect them

Post updated August 30: Details on remediation for Wiz customers

Post updated August 31: A new method to detect Jupyter Notebooks on Cosmos DBs, replacing the deprecated nslookup method

As described in our initial blog post on ChaosDB, the Wiz Research team found an unprecedented critical vulnerability in Azure Cosmos DB. The vulnerability gives any Azure user full admin access (read, write, delete) to other customers' Cosmos DB instances without authorization.

▶️Watch our ChaosDB: How We Hacked Databases of Thousands of Azure Customers BlackHat 2021 live talk

In this post, the second in our series on ChaosDB, we detail the specific mitigation steps we recommend for all teams using Cosmos DB.

The structure of the blog is as follows:

General Advisory
Short recap on the vulnerability
Discovery – which Cosmos DBs are affected?
Remediation: Short-term Immediate steps
Remediation: Longer-term recommendations
For Wiz customers: discovery & remediation steps

#ChaosDB - short recap on the vulnerability

The vulnerability in Cosmos DB was related to the Jupyter notebook feature added to Cosmos DB in 2019. As depicted in the diagram below, the attacker could manipulate the local Jupyter notebook and escalate privileges to other customers' notebooks, which contain several customer secrets, including the Cosmos DB primary key.

As we look into mitigation steps, there are two main points to establish first: who is affected and what needs to be done.

Who is affected?

The vulnerability affects only Cosmos DBs that had Jupyter notebook enabled and allowed access from external IPs. However, the impact is quite significant, since the Jupyter notebook feature was automatically turned on for all new Cosmos DBs after February 2021. Moreover, most Cosmos DBs allow cross-tenant access, since they use firewall exceptions like “Allow traffic for Azure data centers”.

What needs to be done?

When Wiz reported the vulnerability, Microsoft’s security teams took immediate action to disable the vulnerable notebook service. However, customers still need to perform mitigation steps and regenerate their keys, because their Cosmos DB primary keys may have been exposed to third parties.

Discovery: which Cosmos DBs are affected?

Any Cosmos DB account that had Jupyter Notebook enabled is potentially impacted. Our goal in the discovery phase is to build out the list of affected Cosmos DBs.

ℹ️We present here an undocumented method we found to detect Cosmos DB accounts with Jupyter notebooks enabled. If you know of a documented method to perform this detection, please send us an email at research@wiz.io

As of August 30, the nslookup method previously recommended to detect Jupyter Notebooks has been deprecated. Use the following steps instead:

Open your Cosmos DB resource page in the Azure portal.
On the right, click Export template.
On the template tab, search for a block that starts with "type": "Microsoft.DocumentDB/databaseAccounts/notebookWorkspaces".
If you found this block, Jupyter Notebook was enabled on your Cosmos DB. If not, Jupyter Notebook has not been enabled on your Cosmos DB.

Based on the above, you can create a list of all Cosmos DBs with Jupyter enabled.
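If you export templates for many accounts, the manual search in the steps above can be automated. The following is a minimal Python sketch, not an official Wiz or Microsoft tool. It assumes each exported template has been saved as a JSON file containing the usual ARM `resources` array; the function names and the one-file-per-account folder layout are hypothetical.

```python
import json
from pathlib import Path

# Resource type named in the detection steps above.
NOTEBOOK_TYPE = "Microsoft.DocumentDB/databaseAccounts/notebookWorkspaces"


def has_notebook_workspace(template: dict) -> bool:
    """Return True if an exported ARM template declares a notebook workspace."""
    def walk(resources):
        for res in resources or []:
            if str(res.get("type", "")).lower() == NOTEBOOK_TYPE.lower():
                return True
            # Child resources may be nested under their parent resource.
            if walk(res.get("resources")):
                return True
        return False

    return walk(template.get("resources"))


def scan_exported_templates(folder: str) -> list[str]:
    """List the template files (assumed: one *.json per account) that show Jupyter enabled."""
    hits = []
    for path in sorted(Path(folder).glob("*.json")):
        with open(path, encoding="utf-8") as fh:
            if has_notebook_workspace(json.load(fh)):
                hits.append(path.name)
    return hits
```

Calling `scan_exported_templates("exports")` on a folder of saved templates returns the file names of the accounts to add to your list of affected Cosmos DBs.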

Can we assume that Cosmos DBs without internet connectivity are protected from this vulnerability?

No. In theory, Cosmos DBs that are completely isolated from other tenants are not vulnerable, as the attacker cannot authenticate using the key. However, in practice, most Cosmos DBs allow cross-tenant access even when using the IP firewall, since they are open to other Azure services. The only practical way to completely block cross-tenant access is to use only private endpoints, but this breaks the functionality of many Azure services.

Overall, Wiz recommends that you replace the keys of all affected Cosmos DBs, including those that are not public. In terms of priority, internet-facing Cosmos DB accounts are at the highest risk, since they can be easily accessed by attackers who may have obtained the primary key.

ℹ️Due to the potential risk of the attack, CISA’s guidance recommends regenerating keys for all your Azure Cosmos DBs.

Remediation: Short-term Immediate steps

There are two main remediation steps:

Replace your Cosmos DBs' primary keys
Reduce network exposure of Cosmos DB accounts

Security teams should ask all DB owners to replace their primary keys, then use the PowerShell script below to monitor the key regeneration progress.

Step #1: Regenerating Cosmos DB primary keys

Replacing your keys can be complex and can disrupt business and operational processes. For users who cannot regenerate their primary key, we offer alternative solutions below.

Carefully evaluate the consequences of key rotation before you regenerate any keys. A Cosmos DB account has two main keys: a primary key and a secondary key. There is no difference between the two; they provide the same functionality. The vulnerability in the Jupyter Notebook exposed only the primary key. The secondary key remains secret.

Warning: Once a key is regenerated, its previous value can no longer be used to access your data. If you have not provided your applications with another key, you may cause severe disruption.

The best practices to regenerate a key are:

Update all applications in the organization to work with the alternative key (the one you don’t intend to regenerate).
Only after the alternative key has been set in all your applications should you regenerate the original key.
Then, replace the alternative key in your applications with the newly regenerated one.
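The order of these steps matters, so here is a small, self-contained Python simulation of it. It does not call Azure: `Account`, `App`, and `rotate_primary_safely` are hypothetical names, used only to show that when the steps run in this order, no application ever holds a key that has already been invalidated.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Account:
    """Toy stand-in for a Cosmos DB account with two interchangeable keys."""
    keys: dict = field(default_factory=lambda: {
        "primary": secrets.token_hex(8),
        "secondary": secrets.token_hex(8),
    })

    def regenerate(self, kind: str) -> None:
        # Regenerating replaces the key; the old value stops working.
        self.keys[kind] = secrets.token_hex(8)

    def accepts(self, key: str) -> bool:
        return key in self.keys.values()


@dataclass
class App:
    """Toy application that authenticates with a single stored key."""
    key: str


def rotate_primary_safely(account: Account, apps: list) -> None:
    """Double key rotation in the order described above."""
    # 1. Point every application at the key we do NOT intend to regenerate.
    for app in apps:
        app.key = account.keys["secondary"]
    assert all(account.accepts(a.key) for a in apps)

    # 2. Only now regenerate the exposed primary key.
    account.regenerate("primary")
    assert all(account.accepts(a.key) for a in apps)

    # 3. Move the applications onto the newly generated primary key.
    for app in apps:
        app.key = account.keys["primary"]
    assert all(account.accepts(a.key) for a in apps)
```

The internal assertions check that every application still holds a valid key after each step. Swapping steps 1 and 2 would make the second assertion fail, which is the outage the warning above describes.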

There’s a lower-effort workaround if you can’t follow the best-practice double key rotation. You may replace the primary key with the secondary one in your applications, regenerate the primary key, and continue using the secondary as your key. This solution should be temporary, and we advise following Microsoft’s guidelines.

For more details see the guide provided by Microsoft.

Monitoring – How can I continuously monitor the key regeneration progress across the org?

Tracking which Cosmos DBs have not yet been updated is crucial for security teams to monitor the patch process.

Note: We have not found any method to detect the last key rotation time from the API, so we built a script that detects it via the logs. If you know of an API-driven method to check for key rotation, please send us an email at research@wiz.io

We have built a simple set of PowerShell commands to assist. The commands below list all your Cosmos DBs whose keys have not been regenerated since ChaosDB was published.

Open a PowerShell Cloud Shell console in the Azure portal and run the script below:

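# List every Cosmos DB account in the current subscription.
$dbs = Get-AzResourceGroup | ForEach-Object {Get-AzCosmosDBAccount -ResourceGroupName $_.ResourceGroupName}
# Collect the resource IDs of accounts with a key regeneration event logged since ChaosDB was published.
$rotatedDbs = Get-AzLog -StartTime 2021-08-26T12:30 | Where-Object {$_.OperationName.value -like 'Microsoft.DocumentDB/databaseAccounts/regenerateKey/action'} | Select-Object @{N="Resource";E={$_.Authorization.Scope}} | Select-Object -Unique -ExpandProperty Resource
# Print the accounts that have no regeneration event.
$dbs | Where-Object -FilterScript {$rotatedDbs -notcontains $_.Id} | Select-Object Id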

Step #2: Limiting network access

Limiting network access is important as an additional compensating control for all Cosmos DBs. It is especially important for Cosmos DB accounts for which keys cannot be regenerated in the near future due to operational considerations.

If you choose not to replace your keys, make sure your Cosmos DBs have the most limited network access. This way, even if attackers obtain your keys, they will not be able to use them to access your data.

A Cosmos DB’s network access can be limited in multiple ways:

Set firewall rules that specify the exact addresses that may access your Cosmos DB accounts. For additional instructions, see Microsoft's guide.
Secure your Cosmos DB with virtual networks to allow access only from resources within the virtual network. For additional instructions, see Microsoft's guide.

If you use Private Endpoint Connections to connect to your Cosmos DB as detailed in this guide, note that additional firewall rules should be set as described above.

Regardless of the network restriction method you choose, we advise that you review the following setting. In the Firewall and virtual networks menu of the Cosmos DB, under “Exceptions”, make sure that “Accept connections from within public Azure datacenters” is not selected. This ensures your DB isn’t exposed to the entire Azure IP range. Azure IP addresses should be treated as public addresses, since anyone, malicious actors included, can obtain them.

We should mention that deselecting this box can result in disruption of some Azure services that require access to Cosmos DB.
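To review this setting across many accounts, one option is to inspect each account's `properties` block as it appears in an exported ARM template. The Python sketch below is illustrative only. It assumes the `ipRules` / `ipAddressOrRange`, `isVirtualNetworkFilterEnabled`, and `publicNetworkAccess` property names from the `Microsoft.DocumentDB/databaseAccounts` resource schema, and it relies on Microsoft's documented behavior that selecting the Azure datacenters exception adds the address `0.0.0.0` to the allowed list. Verify both against your own exports. The function name is hypothetical.

```python
# Assumption: this address appears in ipRules when the Azure datacenters exception is selected.
AZURE_DATACENTERS_MARKER = "0.0.0.0"


def network_findings(properties: dict) -> list[str]:
    """Return human-readable warnings for one account's `properties` block."""
    findings = []
    rules = [r.get("ipAddressOrRange", "") for r in properties.get("ipRules", [])]
    vnet_filter = properties.get("isVirtualNetworkFilterEnabled", False)
    public = properties.get("publicNetworkAccess", "Enabled")

    if public == "Disabled":
        # Public access is off, so IP rules are not the exposure path.
        return findings

    if AZURE_DATACENTERS_MARKER in rules:
        findings.append("accepts connections from all Azure datacenters")
    if not rules and not vnet_filter:
        findings.append("no IP firewall or virtual network filter configured")
    return findings
```

An empty list means none of these two conditions was found; it is not proof that the account is unreachable, since this sketch does not evaluate the individual IP ranges or virtual network rules.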

Remediation: Longer-term recommendations

Outside the scope of the immediate remediation actions, it is important to review your networking and access strategy in light of the ChaosDB vulnerability. The key question is what you can do to build your environments to be more secure and avoid these kinds of large-scale vulnerabilities in the future.

Using RBAC: getting rid of secrets

The first and most important takeaway is that shared secrets such as the Cosmos DB primary key are insecure and should be deprecated in favor of modern authentication methods.

The fact that Cosmos DB has a primary key that can be shared across users and services, all using the same secret without any clear way to audit, monitor or revoke access is simply unacceptable from a security perspective.

The longer-term goal should be to transition from these insecure primary/secondary keys to role-based access control (RBAC), which does not require any secrets at all. Cosmos DB already supports role-based access, and key-based access can in fact be blocked so that only RBAC authorization is allowed.
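As a rough way to track this transition, you can check which accounts still permit key-based access. The Python sketch below assumes the account-level `disableLocalAuth` property from the `Microsoft.DocumentDB/databaseAccounts` resource schema is what blocks the primary and secondary keys; confirm that against Microsoft's RBAC documentation before relying on it. The function names and the input shape (account name mapped to its `properties` block) are hypothetical.

```python
def key_auth_disabled(properties: dict) -> bool:
    """True only when key-based (local) authentication is explicitly disabled."""
    return properties.get("disableLocalAuth") is True


def accounts_still_using_keys(accounts: dict) -> list[str]:
    """Return the names of accounts where primary/secondary keys still work.

    `accounts` maps an account name to its `properties` dict (assumed shape).
    """
    return sorted(
        name for name, props in accounts.items()
        if not key_auth_disabled(props)
    )
```

Accounts returned by `accounts_still_using_keys` are candidates for the RBAC migration described above.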

Using private endpoints: minimizing cross-account exposure

Network exposure must also be reduced. Although Cosmos DB can activate an IP-based firewall, in reality, this doesn’t help block cross-account access since most services still require access. Even worse, services without static IPs require users to practically open their databases to the entire world (“Azure only” IPs is the same as saying “any attacker with an Azure tenant”).

The only valid approach to building Cosmos DBs and other services with minimal network exposure is to use private endpoints wherever possible and ensure that no asset is externally exposed.

Unfortunately, this approach is hard to implement as not all services actually support private endpoints. However, this should be the goal of every organization as they plan their next network design.

For Wiz Customers: Discovery & Remediation steps

Wiz customers should perform the following steps to get protected:

Analyze the exposure

Go to the Threat Center for the latest updates
Use built-in Wiz Graph queries to analyze & prioritize vulnerable Cosmos DB accounts

Manage remediation

Track the status of Cosmos DB accounts whose key wasn’t regenerated
Monitor for DBs with wide network exposure

Learn more via the Wiz Advisory for ChaosDB

Analyze the exposure – ChaosDB advisory at the Wiz Threat Center

The ChaosDB advisory contains built-in queries that retrieve all the resources that are at-risk, accompanied by detailed documentation that reviews the threat and mitigation steps.

Wiz Threat Center shows you which assets are at-risk to the most dangerous threats

For ChaosDB, Wiz queries all your vulnerable Cosmos DB instances and prioritizes DBs that are at higher risk due to risk factors such as wide internet exposure. In the image above there is one high-risk finding for ChaosDB due to wide exposure. Click on the vulnerable Cosmos DB to see the detailed analysis of its exposure paths.

When viewing a resource, Wiz shows you the network paths to it, to assess its effective exposure

Manage Remediation - monitor the status of Cosmos DB key regeneration

Wiz helps you manage the remediation process for Cosmos DB. Follow the steps here to tag Cosmos DBs that haven’t gone through key rotation. Then use the following query to monitor vulnerable high-risk Cosmos DB accounts that haven’t rotated their keys.

Wiz controls detect toxic risk combinations based on your cloud assets query

The query shown above is composed of the following query components:

Accessible from Internet: Wiz calculates the effective network exposure of your Cosmos DB (and of the rest of your cloud resources and workloads), considering all network settings: firewalls, virtual networks, private endpoints, and more. This answers the simple but essential question: are my resources exposed?
Has Jupyter Notebook: The query retrieves only the vulnerable Cosmos DBs, based on a built-in implementation in the Wiz graph that flags the DBs that currently have (or had in the past) the Jupyter Notebook feature.
Key Not Rotated After ChaosDB: Wiz can assist in managing key regeneration. Because of the need to monitor whether keys were regenerated after the publication of the attack, we have launched new capabilities that label Cosmos DB resources in the Wiz portal according to their key regeneration state.

Manage Remediation - monitor the network exposure of Cosmos DBs

In addition to key rotation, it is important to monitor the configuration of Cosmos DBs and ensure they follow important configuration baselines. An overview report of Cloud Configuration Rules for your Cosmos DB instances is available in the cloud configuration view. It can be used to check whether you applied the recommended network limitation configurations discussed above.