The Wiz Research Team recently found four critical vulnerabilities in OMI, which is one of Azure's most ubiquitous yet least known software agents and is deployed on a large portion of Linux VMs in Azure. The vulnerabilities are very easy to exploit, allowing attackers to remotely execute arbitrary code within the network with a single request and escalate to root privileges.
- CVE-2021-38647 – Unauthenticated RCE as root
- CVE-2021-38648 – Privilege Escalation vulnerability
- CVE-2021-38645 – Privilege Escalation vulnerability
- CVE-2021-38649 – Privilege Escalation vulnerability
Many different services in Azure are affected, including Azure Log Analytics, Azure Diagnostics and Azure Security Center, as Microsoft uses OMI extensively behind the scenes as a common component for many of its management services for VMs. In a survey, Wiz found that over 65% of sampled Azure customers were exposed to these vulnerabilities and unknowingly at risk. Although widely used, OMI’s functions within Azure VMs are almost completely undocumented and there are no clear guidelines for customers on how to check or upgrade existing OMI versions. For a high-level overview of the vulnerability and updates regarding mitigations, visit our OMIGOD blog. For our guidance on identifying and remediating OMIGOD in your environment, download our checklist.
In this post we describe the full technical details of the vulnerabilities we found.
What is OMI
OMI is the UNIX/Linux equivalent of Windows WMI. It allows users to manage configurations across remote and local environments and collect statistics. Due to the ease of use and abstraction that OMI provides, it is used extensively in Azure, particularly inside Operations Management Suite (OMS), Azure Insights, Azure Automation and many more.
The OMI agent is automatically deployed on Azure VMs as part of the onboarding process of the above services. However, there is no clear documentation in Azure on deployment, monitoring and updating of OMI.
Additionally, OMI agents are frequently used on-premises for management of Linux machines. For example, OMI is built into Microsoft System Center for Linux, Microsoft’s server management solution.
OMI’s functionality can be extended via providers. For example, users can query Docker container information using the appropriate Docker provider, or retrieve and create UNIX processes using the SCX Provider.
Who is vulnerable
A majority of large organizations using Azure are impacted. Basically, any customer using one or more of the following Azure services:
- Azure Automation
- Azure Automatic Update
- Azure Operations Management Suite
- Azure Log Analytics
- Azure Configuration Management
- Azure Diagnostics
- Azure Container Insights
Note that this is only a partial list. Let us know if you are aware of more Azure services silently deploying OMI.
Why the OMI Attack Surface is interesting to attackers
The OMI agent runs as root with high privileges. Any user can communicate with it using a UNIX socket or sometimes using an HTTP API when configured to allow external usage. As a result, OMI represents a possible attack surface where a vulnerability allows external users or low privileged users to remotely execute code on target machines or escalate privileges.
Some Azure products, such as Configuration Management, expose an HTTPS port for interacting with OMI (port 5986, also known as the WinRM port). This configuration enables the RCE vulnerability (CVE-2021-38647). It’s important to mention that most Azure services that use OMI deploy it without exposing the HTTPS port.
Note that in scenarios where the OMI ports (5986/5985/1270) are accessible from the internet to allow for remote management, this vulnerability can also be used by attackers to obtain initial access to a target Azure environment and then move laterally within it. Thus, an exposed HTTPS port is a holy grail for malicious attackers. As depicted in the diagram below, with one simple exploit they can get access to new targets, execute commands at the highest privileges and possibly spread to new target machines.
The other three vulnerabilities are classified as privilege escalation vulnerabilities, and they can enable attackers to gain the highest privileges on a machine with OMI installed. Attackers often use such vulnerabilities as part of sophisticated attack chains, after gaining initial low privileged access to their targets.
CVE-2021-38647 - Remote Code Execution - Remove the Authentication header and you are root
This is a textbook RCE vulnerability, straight from the 90s but happening in 2021 and affecting millions of endpoints. With a single packet, an attacker can become root on a remote machine by simply removing the Authorization header. How can it be so simple?
Thanks to the combination of a simple conditional statement coding mistake and an uninitialized authentication struct, any request without an Authorization header has its privileges default to uid=0, gid=0, which is root. O-MI-GOD!
This vulnerability allows for remote takeover when OMI exposes the HTTPS management port externally (5986/5985/1270). This is in fact the default configuration when installed standalone and in Azure Configuration Management or System Center Operations Manager (SCOM). Fortunately, other Azure services (such as Log Analytics) do not expose this port and thus the scope is limited to local privilege escalation.
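Since the remote scenario depends on whether one of these ports is reachable, a quick reachability check is a useful first triage step. The following is a minimal sketch, not an official detection tool: the function name `open_ports` and the timeout are our own choices, and only the port numbers come from the text above. An open port only means something is listening there, not that it is OMI or that it is unpatched.

```python
import socket

# Ports listed above as OMI management ports.
OMI_PORTS = (5985, 5986, 1270)


def open_ports(host, ports=OMI_PORTS, timeout=1.0):
    """Return the ports from `ports` that accept a TCP connection on `host`.

    Hypothetical helper for illustration; it does not speak the OMI
    protocol and cannot tell which service is listening.
    """
    reachable = []
    for port in ports:
        try:
            # create_connection raises OSError on refusal or timeout.
            with socket.create_connection((host, port), timeout=timeout):
                reachable.append(port)
        except OSError:
            pass
    return reachable
```

For example, `open_ports("10.0.0.4")` (a placeholder address) returns the subset of the three ports that answered. Run it only against hosts you are authorized to test.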
The diagram below illustrates the unexpected behavior of OMI when a command execution request is issued with no Authorization header.
- Normal flow with a valid password in the Authorization header - omicli issues an HTTP request to the remote OMI instance, passing the login information in the Authorization header.
- Authorization failure when passing an invalid Authorization header - As expected, if omicli passes an invalid header, the request fails.
- Exploit flow when passing a command without an Authorization header - The OMI server trusts the request even without an Authorization header and enables the perfect RCE: single-request-to-rule-them-all.
Here is the most minimal patch needed: from the OMI GitHub repo, simply initialize to an invalid value…
Another disturbing issue we found was that this commit has been available in the OMI GitHub repo for anyone to see for over a month! This means that threat actors could have started exploiting these vulnerabilities over a month ago without any prior customer notifications.
CVE-2021-38648 - Local Privilege Escalation Overview
The following vulnerability affects all installations of OMI prior to version 1.6.8-1. This vulnerability is a Local Privilege Escalation and is remarkably similar to the above Remote Command Execution (CVE-2021-38647). The exploitation process is similar as well: record a legitimate command execution request from the omicli, omit the authentication part and reissue the command execution request. The command will be executed as root, regardless of the current user permissions. This might sound like the same vulnerability as the Remote Command Execution, but the root cause analysis shows that it’s an entirely different flaw.
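Because the fix is tied to a specific release, comparing an installed version against 1.6.8-1 is the simplest way to triage a machine. The sketch below assumes the version is reported as a `MAJOR.MINOR.PATCH-BUILD` string (for example by your package manager); the function names are hypothetical, and only the 1.6.8-1 threshold comes from the text above.

```python
import re

# v1.6.8-1 is the fixed release named above.
PATCHED = (1, 6, 8, 1)


def parse_omi_version(text):
    """Parse 'MAJOR.MINOR.PATCH-BUILD' into a tuple of ints.

    The string format is an assumption; adjust the pattern if your
    package manager reports the version differently.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)-(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized OMI version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_vulnerable(text):
    """True if the version is older than the patched release."""
    return parse_omi_version(text) < PATCHED
```

Comparing tuples of integers rather than raw strings avoids the classic mistake where "1.6.10" sorts before "1.6.8".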
OMI has a frontend-backend architecture. The user doesn’t communicate directly with the omiserver. Instead, the server runs as root while a lower privileged frontend process called omiengine runs as the omi user.
The only way for a low privileged user to communicate with omiserver is through its frontend process omiengine.
This architecture makes it particularly challenging for the omiserver to identify the user on the other side of the communication. The omiserver must trust the omiengine on the identity of the user. Therefore, each message the omiengine forwards to the omiserver is accompanied by the AuthInfo struct, which contains the user’s uid and gid.
As mentioned in the RCE vulnerability overview, the AuthInfo struct is initialized with both uid and gid equal to zero, the uid and gid of the root user. As a result, if an attacker manages to issue a request that is forwarded to the omiserver before any authentication process takes place, the request will be processed by the omiserver as if it was issued by the root user.
The omiengine has a very problematic request handling logic. There is a set of message types (e.g. authentication requests) for which the omiengine requires special processing before forwarding them to the server. For requests with no special handling, the omiengine simply forwards them to the server, without any validation, alongside the AuthInfo, regardless of the client’s authentication state. This includes provider-specific requests, such as those to the SCX provider, which is capable of creating arbitrary UNIX processes.
The diagram below illustrates the communication that occurs when issuing a command execution request using omicli:
Messages with no special handling (such as the “execute /bin/id” request) are forwarded to the server. This means that if we issue the command execution request ourselves, without relying on omicli, the new process will be spawned under the default privileges inside the AuthInfo struct, which are uid=0, gid=0 – root privileges!
All an attacker has to do in order to exploit this vulnerability is to intercept the communication between the omicli and the omiengine, omit the authentication handshake and the command will be executed as root.
Key Takeaways – The Risks of “Secret” Agents
Even though we researched a small part of Open Management Infrastructure, we managed to find several high/critical severity vulnerabilities affecting multiple Azure products. The ease of exploitation and the simplicity of the vulnerabilities make you wonder if the OMI project is mature enough to be used so widely within Azure.
OMI is an example of pre-installed software agents that cloud providers build into VMs running in their cloud. Problematically, this “secret” agent is both widely used (because it is open source) and completely invisible to customers as its usage within Azure is completely undocumented.
There is no easy way for customers to know which of their VMs are running OMI, since Azure doesn’t mention OMI anywhere on the Azure Portal, which impairs customers’ risk assessment capabilities. This issue highlights a gap in the famous shared responsibility model. An agent that is under the cloud provider’s responsibility can easily be used by attackers to gain high privileges remotely on their target, and the true tragedy is that customers can’t even know whether they are open to this attack.
Furthermore, it’s unclear who is responsible for patching vulnerabilities like this. Is it the user who isn’t aware the agents exist? Is it the cloud provider that shouldn’t have admin rights on the machine?
We hope to raise awareness of the risks that come with “secret” agents running with high privileges in cloud environments, particularly among Azure customers who are currently at risk until they update to the latest version of OMI. We urge the research community to continue to audit the Open Management Infrastructure to ensure Azure users stay safe.
To learn more about identifying and remediating OMIGOD, with step-by-step guidance, download our checklist.
Key Takeaways – Microsoft’s Patch Process in The OMI Repository – Irresponsible Disclosure?
Anyone who is tracking OMI’s GitHub commit logs would notice that a strange “Enhanced Security” commit was introduced on August 12th 2021. By doing a trivial patch-diff, a determined attacker could have developed an exploit for these vulnerabilities. This is especially concerning as Microsoft’s official patch (v1.6.8-1) was only released on September 8th 2021, leaving affected users with nothing they could do to prevent exploitation for almost a month after giving attackers a “silent” hint about the bugs.
June 01, 2021 - Wiz Research Team reported all 4 OMI vulnerabilities to MSRC.
July 12, 2021 - MSRC Confirmed one of the local privilege escalation vulnerabilities (CVE-2021-38648).
July 16, 2021 - MSRC Confirmed one of the local privilege escalation vulnerabilities (CVE-2021-38645).
July 16, 2021 - MSRC Confirmed the remote command execution vulnerability (CVE-2021-38647).
July 23, 2021 - MSRC Confirmed one of the local privilege escalation vulnerabilities (CVE-2021-38649).
August 12, 2021 - Wiz Research Team observed an “Enhanced Security” commit fixing all 4 reported vulnerabilities.
September 8, 2021 - Official patch released.
September 14, 2021 - All 4 vulnerabilities published on September’s Patch Tuesday.
APPENDIX: Full Technical Description
CVE-2021-38647 - Unauthenticated Remote Command Execution
First let’s examine a legitimate example of remote OMI usage. We will execute the following command:
And the following output will be displayed:
Seems straightforward. Any user, in our case azureuser, can execute an arbitrary command which will be executed with the user’s privileges, provided the correct password is supplied. By using Burp Suite and examining the traffic, we can see the protocol is very basic:
The user’s supplied credentials are passed in the Authorization header, using Basic authentication (1). The user’s command is passed inside the SOAP/XML body (2). This is the response for the request above:
If we try passing the wrong credentials inside the Authorization header
we receive a 401 response as expected:
What would you expect to happen if we issued the same HTTP request without the Authorization header? We would expect to receive the same 401 Unauthorized response, similar to the one we got when we supplied bogus credentials.
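For reference, the behavior we expected can be sketched in a few lines. This is a simplified model of a Basic authentication check, not OMI code: the function names and the in-memory credential store are hypothetical, and a real server would verify passwords against the system rather than a dictionary. The point is the first branch: a missing header is rejected outright rather than allowed to fall through.

```python
import base64


def basic_header(user, password):
    """Build the Authorization header a client would send (HTTP Basic)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return {"Authorization": "Basic " + token.decode("ascii")}


def expected_status(headers, valid_credentials):
    """Return the HTTP status a correct server would send.

    `valid_credentials` maps user name -> password (hypothetical store).
    """
    value = headers.get("Authorization")
    if value is None or not value.startswith("Basic "):
        return 401  # no credentials: reject, never fall through
    try:
        raw = base64.b64decode(value[len("Basic "):], validate=True)
        decoded = raw.decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return 401
    user, sep, password = decoded.partition(":")
    if not sep or valid_credentials.get(user) != password:
        return 401
    return 200
```

With a store such as `{"azureuser": "example-password"}`, a request carrying the right pair yields 200, while a wrong password or an absent header both yield 401.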
We definitely did not expect to receive the following response:
The command executes! On top of that, it executes with root privileges! As we previously mentioned, we think that this is some extremely unexpected behavior. Let's understand the root cause of this bug by inspecting the source code:
The important part of the snippet above is that the h->authFailed field is initialized to FALSE (1). Another important function is _ReadData, which also handles part of the authentication. This is the function that contains the critical logical bug:
Can you spot the bug? Let’s think about how the function processes our request when we do not supply the Authorization header. The first condition (1) evaluates to false, and we end up inside the else statement, where the second condition (2) also evaluates to false (as we didn’t initiate any authentication procedure, therefore the authFailed field is set to false). We then continue to the Process_Authorized_Message function, which handles our request as an authenticated one. But with what permissions? Because the entire struct was previously memset’ed to 0, the AuthInfo struct contains uid=0, gid=0, meaning our request will be handled as if we were authenticated as root!
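The control flow described above can be reproduced in a toy model. The sketch below is not the OMI source: the names `HttpState` and `read_data`, the user table, and the plain `user:password` header value are all simplifications of ours, and we assume condition (1) tests whether an Authorization header is present. What it preserves is the shape of the bug: a zeroed state, a failure flag that is only ever set by an authentication attempt, and a fall-through to the authorized path.

```python
from dataclasses import dataclass

# Hypothetical user table: name -> (password, uid, gid).
USERS = {"azureuser": ("example-password", 1000, 1000)}


@dataclass
class HttpState:
    # Everything starts at zero, like the zeroed struct described above.
    auth_failed: bool = False
    uid: int = 0
    gid: int = 0


def read_data(headers):
    """Toy model of the flawed flow; returns what the server would do."""
    state = HttpState()
    creds = headers.get("Authorization")   # simplified "user:password"
    if creds is not None:                  # condition (1), as assumed
        user, _, password = creds.partition(":")
        entry = USERS.get(user)
        if entry is None or entry[0] != password:
            state.auth_failed = True
            return "401 Unauthorized"
        state.uid, state.gid = entry[1], entry[2]
    else:
        if state.auth_failed:              # condition (2): never true here
            return "401 Unauthorized"
    # Reached with or without authentication having taken place.
    return f"executed as uid={state.uid} gid={state.gid}"
```

Calling `read_data({})` takes the else branch, skips both rejections, and reports execution with uid=0 and gid=0, which mirrors the behavior observed on the wire.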
More Architecture Details
To understand the next two vulnerabilities, we need to have a closer look at OMI’s architecture. OMI has a frontend-backend architecture. The user doesn’t communicate directly with the omiserver. Instead, the server runs as root, while a lower privileged frontend process called omiengine runs as the omi user. The only way to communicate with omiserver is through the UNIX sockets found in the /etc/opt/omi/conf/sockets/ directory, which is only accessible to the omi user, meaning that only processes running as the omi user can communicate with omiserver. Any local user can communicate with the omiengine through the /var/opt/omi/run/omiserver.sock UNIX socket, which has full RWX permissions.
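The permissions on these paths determine who can reach each process, so they are worth inspecting on a machine with OMI installed. The following is a small sketch using only the standard library; the helper names are ours, and the paths to pass in are the ones named above.

```python
import os
import stat


def describe_access(path):
    """Return (owner uid, permission string) for `path`."""
    info = os.stat(path)
    return info.st_uid, stat.filemode(info.st_mode)


def world_writable(path):
    """True if users other than the owner and group may write to `path`."""
    return bool(os.stat(path).st_mode & stat.S_IWOTH)
```

On an affected host, `describe_access("/var/opt/omi/run/omiserver.sock")` should show a mode open to all users, while the `/etc/opt/omi/conf/sockets/` directory should be restricted to the omi user.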
This architecture makes it particularly challenging for the omiserver to identify the user communicating on the other side of the UNIX socket. The omiserver must trust the omiengine on the identity of the user on the other end of the UNIX socket.
To illustrate, here is a diagram of the communication that occurs when a user uses omi to execute the /bin/id binary:
Which yields the following output:
When no user credentials are provided, omi performs implicit authentication as the user on the other side of the UNIX socket.
CVE-2021-38648 - Local Privilege Escalation
Each connection between the omicli and omiengine is defined in a ProtocolSocket struct. Here’s the underlying structure, omitting irrelevant fields.
One of the most important fields that is worth keeping in mind is the authInfo field, of type AuthInfo, which has the following definition:
When a user establishes a new connection to the omiengine through the /var/opt/omi/run/omiserver.sock a new ProtocolSocket is allocated, specifically, callocated. This means that all the fields are initialized to 0, including the connected user’s uid and gid.
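The effect of zero-filled allocation is easy to demonstrate with `ctypes`, which also zero-fills newly created structures. The layout below is a stand-in, not the real definition: only the names ProtocolSocket, authInfo, uid and gid come from the text, while the field types and the extra `auth_state` field are assumptions for illustration.

```python
import ctypes


class AuthInfo(ctypes.Structure):
    # Field types are assumed; the real struct has more fields.
    _fields_ = [("uid", ctypes.c_uint32), ("gid", ctypes.c_uint32)]


class ProtocolSocket(ctypes.Structure):
    # `auth_state` is a hypothetical placeholder for the omitted fields.
    _fields_ = [("auth_state", ctypes.c_int), ("authInfo", AuthInfo)]


def new_connection():
    """Allocate a connection record the way calloc would: all zero bytes."""
    return ProtocolSocket()
```

A fresh `new_connection()` has `authInfo.uid == 0` and `authInfo.gid == 0` before any authentication runs. On UNIX systems zero is a meaningful identity rather than a neutral "unset" marker, which is why an explicit invalid sentinel is the safer default.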
After the connection is initialized, each user message is handled by the _ProcessReceivedMessage function.
You can view the _ProcessReceivedMessage as a switch statement acting on the msg->tag field, where the default case is to forward the message directly to the server, regardless of the user’s authentication state.
The authentication messages fall under the BinProtocolNotificationTag clause, while the command execution request itself doesn’t match any of the if-else clauses and is handled by the default procedure, so the message will be forwarded to the server, regardless of the user authentication state. That’s some interesting behavior, because the omiserver trusts the omiengine to handle the user’s authentication state and identity.
Let’s think about what will happen if the user doesn’t perform the authentication negotiation before sending the execute command request: instead, once the user connects to the omiengine, they immediately issue the execute command request. As mentioned before, the message will be forwarded to the server. The omiserver relies on the omiengine to provide the user’s uid and gid as part of the message metadata. If the user did not initiate the authentication process, the uid and gid remain untouched, and as mentioned before, the AuthInfo struct is zero-initialized, meaning that the uid and gid are both equal to 0, the uid and gid of the root user.
The proof-of-concept of such a vulnerability is quite straightforward. We first need to record the communication between the omicli and the omiengine, omit the first authentication request, and only send the command execution request to gain root command execution.
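The forwarding logic can be captured in a toy model. Everything here is illustrative rather than taken from the OMI source: the tag strings, the `Connection` class and the function names are hypothetical, and the peer identity is passed in as a parameter instead of being read from the socket. The second function shows the kind of guard that is missing, without claiming it matches the actual patch.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    uid: int = 0            # zeroed at allocation, as described above
    gid: int = 0
    authenticated: bool = False


def server_execute(command, uid, gid):
    # Stand-in for omiserver: it trusts the identity the engine attaches.
    return f"{command} runs as uid={uid} gid={gid}"


def engine_process(conn, tag, payload, peer=(1000, 1000)):
    """Model of the engine's dispatch: special-case auth, forward the rest."""
    if tag == "auth":
        conn.uid, conn.gid = peer
        conn.authenticated = True
        return "auth ok"
    # Default case: forwarded with the connection's AuthInfo, no state check.
    return server_execute(payload, conn.uid, conn.gid)


def engine_process_checked(conn, tag, payload, peer=(1000, 1000)):
    """Same dispatch, but unauthenticated non-auth messages are refused."""
    if tag != "auth" and not conn.authenticated:
        return "rejected: not authenticated"
    return engine_process(conn, tag, payload, peer)
```

Sending `("exec", "/bin/id")` on a fresh `Connection()` through `engine_process` runs with uid=0, while the checked variant rejects it until an `"auth"` message has been processed.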
CVE-2021-38645 - Local Privilege Escalation
As mentioned earlier, OMI has a frontend-backend architecture, meaning that the omiengine receives the authentication request from the client, omicli, issues a new authentication request to the omiserver, saves the authentication result information, such as the user’s uid and gid and forwards the response back to the user.
Look at the authentication logic inside the _ProcessReceivedMessage function:
Let’s review the logic, (1) first the omiengine saves the client’s socket in a connection hash map, using the connection number as the key. (2) Then the omiengine establishes a new connection with the omiserver, (3) and saves it in the same tracker hash map. (4) Then the authentication request is sent to the server for validation.
Now let’s look at how the same function handles the server response:
Before we dive into this code snippet, there is something that needs to be emphasized. The _ProcessReceivedMessage function processes an incoming request from the client and the server the same way, without validating that a response really came from the server. (1.1) The client’s socket id is fetched from the response and (1.2) the matching socket is looked up in the hash-map; if the socket is not found inside the hash-map, the authentication process fails. (2) Then the authentication response is parsed, and the authentication info is set accordingly. From now on, every command coming out of this client socket is executed with those binMsg->uid and binMsg->gid. Then (3) the server socket is fetched from the hash-map; if it does not exist, the authentication process fails.
Now let’s consider the following scenario, where malserver is a malicious client impersonating a server that returns the authentication response before omiserver returns its own. There are a few challenges for the malserver to successfully authenticate the user as root. First, it needs to know the user’s socket id (1.2), but from our experience, it is usually <10 and can be guessed easily. If successfully guessed, the client’s authInfo->uid and authInfo->gid can both be set to 0. Next, we need to bypass the (3) check, where the omiengine checks if our malserver socket is in its tracker hash-map, which it is not. We can bypass it by issuing an authentication request from the malserver to the omiengine, which will add its socket id to the hash-map, and then immediately sending an authentication success response for the omicli socket id with uid=0, gid=0.
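The trust problem in this scenario can be illustrated with a deterministic toy model of the tracker. This is a sketch under several assumptions, not OMI code: the class and method names are ours, the race with the real server is left out entirely, and checks (1.2) and (3) are collapsed into a single test before the identity is written. What it keeps is the essential gap: any socket present in the tracker can deliver a response on behalf of another.

```python
class EngineModel:
    """Toy model of the engine's connection tracker (not OMI source)."""

    def __init__(self):
        self.conns = {}     # socket id -> per-connection auth state
        self.tracker = {}   # sockets with an authentication in flight
        self.next_id = 1

    def connect(self):
        sock_id = self.next_id
        self.next_id += 1
        self.conns[sock_id] = {"authenticated": False, "uid": 0, "gid": 0}
        return sock_id

    def auth_request(self, sock_id):
        # Request path: the sender's socket is recorded in the tracker.
        self.tracker[sock_id] = self.conns[sock_id]

    def auth_response(self, sender_id, target_id, uid, gid):
        # Response path. Nothing verifies that sender_id is the real
        # server connection; being in the tracker is enough.
        target = self.tracker.get(target_id)                  # (1.2)
        if target is None or sender_id not in self.tracker:   # (3)
            return False
        target.update(authenticated=True, uid=uid, gid=gid)   # (2)
        return True
```

In this model a response from an untracked socket is refused, but once that socket has issued its own `auth_request`, its response for another socket id is accepted and the target connection is marked as uid=0, gid=0.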
The exploitation is quite complex and statistical due to a different bug (a use-after-free error that occurs in this code path) that keeps crashing the omiengine (which we’ve also reported to Microsoft), so instead of using the omicli, we created a Python script that sends the messages directly through the omiengine UNIX socket.
The exploitation flow is straightforward:
- Send an authentication request with bogus credentials
- Start another thread
- Send the id >> /tmp/win command
- Send an authentication request
- Send authentication success response with uid=0, gid=0 for the authentication request initiated in the main thread
After a certain number of iterations, the race condition will be successfully exploited and our code will execute as root.