Lateral movement risks in the cloud and how to prevent them – Part 2: from compromised container to cloud takeover

In this second blog post, we will discuss lateral movement risks from Kubernetes to the cloud. We will explain attacker TTPs, and outline best practices for security practitioners and cloud builders to help secure their cloud environments and mitigate risk.


In our first blog post in this series covering lateral movement in the cloud, we introduced lateral movement as it pertains to the cloud’s network layer, the virtual private cloud (VPC).

In this second blog post, we will analyze potential attack vectors for lateral movement between the Kubernetes and cloud domains and examine how they differ between the major CSPs depending on their default cluster configurations and their integrations with IAM/AAD identities. Finally, we will suggest best practices that organizations can adopt to significantly reduce or prevent critical lateral movement risks.

An overlooked risk

Despite documented cases of threat groups like TeamTNT escaping pods and retrieving access tokens from the Instance Metadata Service endpoint, many security practitioners are unfamiliar with common Kubernetes-to-cloud lateral movement techniques. Traditional approaches operate in silos, addressing either the Kubernetes domain or the cloud domain but never the relationship between them.

For example, our research team at Wiz investigated the number of cloud environments that utilize managed Kubernetes clusters and found that approximately 40% of environments have at least one pod with a cleartext long-term cloud key that is stored in its container image and associated with an IAM/AAD cloud identity. As with lateral movement risks in the VPC, these numbers underscore the exploitability of many organizations’ cloud environments.

Kubernetes-to-cloud attacker TTPs

Adversaries in the cloud leverage several techniques and functionalities to conduct lateral movement attacks from managed Kubernetes clusters to the cloud. These include the Instance Metadata Service, IAM/AAD identities, long-term cloud keys, and pod escape.

1. Instance Metadata Service (IMDS)

Managed K8s services assign each worker node in a cluster a pre-defined role, service account, or identity with the permissions the kubelet daemon needs to call the CSP API and execute managerial tasks related to the cluster’s stability (e.g. autoscaling). As a result, the worker node can query the IMDS endpoint for instance metadata (typically found at the IPv4 link-local address 169.254.169.254) and assume the pre-defined identity.

Attackers who compromise publicly exposed containers within managed K8s clusters therefore often query the IMDS endpoint to retrieve the worker nodes’ IAM or AAD identity credentials and leverage them to access cloud resources such as buckets and databases outside of the clusters. However, the blast radius of such attacks depends on the role’s configuration in each CSP:

  • EKS

    AWS requires several built-in managed policies to be attached to the worker node’s role in the EKS cluster: AmazonEKSWorkerNodePolicy, AmazonEC2ContainerRegistryReadOnly, and either AmazonEKS_CNI_Policy or a custom IPv6 policy. The default permissions associated with each policy generate individual attack vectors.

    AmazonEKSWorkerNodePolicy: This allows worker nodes to describe EC2 resources required for the node to join the cluster. Adversaries could abuse the information to map the entire cloud network by listing all the EC2 instances in the account and retrieving sensitive data such as security groups, AMIs, IP addresses, VPCs, and any associated subnets or route tables.

# AmazonEKSWorkerNodePolicy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceTypes",
                "ec2:DescribeRouteTables",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVolumes",
                "ec2:DescribeVolumesModifications",
                "ec2:DescribeVpcs",
                "eks:DescribeCluster"
            ],
            "Resource": "*"
        }
    ]
}
    AmazonEC2ContainerRegistryReadOnly: This grants workloads full read access to container registries to facilitate image pulling. A malicious actor could exploit this to enumerate the container registries and their images, which may contain sensitive information like cloud keys and passwords. Moreover, they could call the ecr:DescribeImageScanFindings API to identify critical vulnerabilities in images running on insecure containers.

# AmazonEC2ContainerRegistryReadOnly
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:GetRepositoryPolicy",
                "ecr:DescribeRepositories",
                "ecr:ListImages",
                "ecr:DescribeImages",
                "ecr:BatchGetImage",
                "ecr:GetLifecyclePolicy",
                "ecr:GetLifecyclePolicyPreview",
                "ecr:ListTagsForResource",
                "ecr:DescribeImageScanFindings"
            ],
            "Resource": "*"
        }
    ]
}
    AmazonEKS_CNI_Policy: This grants the VPC CNI plugin (amazon-vpc-cni-k8s) the permissions it needs to modify the worker node’s IP address configuration. An attacker could utilize the policy to list active EC2 instances and detach or delete their network interfaces, laying the foundation for a DoS attack targeting the account’s clusters and instances.

# AmazonEKS_CNI_Policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:AssignPrivateIpAddresses",
                "ec2:AttachNetworkInterface",
                "ec2:CreateNetworkInterface",
                "ec2:DeleteNetworkInterface",
                "ec2:DescribeInstances",
                "ec2:DescribeTags",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribeInstanceTypes",
                "ec2:DetachNetworkInterface",
                "ec2:ModifyNetworkInterfaceAttribute",
                "ec2:UnassignPrivateIpAddresses"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateTags"
            ],
            "Resource": [
                "arn:aws:ec2:*:*:network-interface/*"
            ]
        }
    ]
}
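The breadth of these default grants is easy to check programmatically. The sketch below is a heuristic illustration, not an official AWS risk classification: it flags Allow statements that grant selected reconnaissance or destructive actions on every resource.

```python
import json

# Illustrative subset of actions worth flagging when granted on "*";
# this list is our own heuristic, not an AWS risk taxonomy.
RISKY_ACTIONS = {
    "ec2:DescribeInstances",          # network/asset enumeration
    "ec2:DeleteNetworkInterface",     # potential DoS primitive
    "ecr:DescribeImageScanFindings",  # vulnerability reconnaissance
}

def flag_broad_grants(policy: dict) -> list:
    """Return risky actions allowed on every resource ("Resource": "*")."""
    flagged = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        resources = stmt.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]
        if "*" not in resources:
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        flagged += [a for a in actions if a in RISKY_ACTIONS]
    return flagged

# Abbreviated policy document in the style of AmazonEKS_CNI_Policy
policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow",
     "Action": ["ec2:DescribeInstances", "ec2:DeleteNetworkInterface"],
     "Resource": "*"}
  ]
}
""")

print(flag_broad_grants(policy))
# ['ec2:DescribeInstances', 'ec2:DeleteNetworkInterface']
```

Running the same check against a role's full set of attached policies gives a quick picture of what an attacker would gain by stealing that node's IMDS credentials.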
  • GKE

    GCP automatically attaches the Compute Engine default service account, which carries the overly permissive IAM basic Editor role, to new worker nodes. An attacker can leverage this role to access secrets in the project, manipulate service accounts, and even delete the compute instances running clusters.

  • AKS

    When an AKS cluster is deployed, a provider-managed identity that administers cluster resources like ingress load balancers and file CSI drivers is automatically attached to the cluster’s control plane. This identity cannot be appropriated by a user or pod. The worker nodes’ Virtual Machine Scale Set (VMSS), on the other hand, can accept two identities: a system-assigned identity and a user-assigned identity named <AKS Cluster Name>-agentpool. Whereas the former is disabled by default, the latter is enabled but does not possess any roles. Therefore, even if an attacker could gain a foothold in a cluster, query the IMDS endpoint, and assume the node’s user-assigned managed identity, they wouldn’t have any permissions.

    These limitations, however, are contingent on the user’s chosen configurations. For example, enabling the system-assigned identity on the VMSS and granting it permissions would allow an attacker to assume the identity and use its privileges to access cloud resources. Alternatively, if a user assigns a role to the node’s user-assigned managed identity, an attacker with knowledge of the identity’s ID could assume it and infiltrate cloud resources.

2. IAM/AAD identities for pods

In order to mitigate the risks of lateral movement, the three major CSPs provide an alternative to IMDS endpoints in the form of IAM/AAD identities assigned to K8s pod service accounts. This feature reflects the Principle of Least Privilege as it restricts nonessential pod access to cloud resources; it is named IAM Roles for Service Accounts in AWS, Workload Identity in GCP, and Azure AD workload identity in Azure.
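In EKS, for example, this binding is expressed as an annotation on the K8s service account. A minimal sketch (the role ARN, account ID, and names are placeholders; the IAM role's trust policy must separately allow the cluster's OIDC provider to assume it):

```yaml
# Kubernetes service account bound to a scoped IAM role via IRSA.
# The role ARN is a placeholder for illustration only.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-reader
  namespace: example
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/s3-read-only
```

Pods that use this service account receive short-lived credentials scoped to that one role, rather than the node's broader IMDS identity.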

Although this is a secure way of integrating cluster resources with cloud ones, an attacker who compromises a pod with this integration could impersonate the service account’s IAM/AAD identity and abuse its permissions to access the relevant cloud resources. In fact, Wiz’s research team discovered that about 10% of cloud environments with managed K8s clusters feature at least one publicly exposed pod with exploitable critical or high-severity vulnerabilities and a K8s service account possessing a highly privileged IAM/AAD identity.

3. Long-term cloud keys in Kubernetes secrets/pods

Long-term cloud keys (e.g. Azure Service Principal credentials) are often stored in K8s secret objects so applications running in pods can access them locally or via service accounts to execute tasks.

These cloud keys are problematic in that they have an indefinite shelf life: until it is manually revoked, a stolen key lets an adversary repeatedly authenticate as the associated IAM user, service account, or AAD Service Principal. For example, a malicious actor could compromise a publicly exposed container with a service account granting full read access to a namespace’s secrets, list all the secrets, and call the K8s API to extract any sensitive data stored in them. Should one of the secrets hold powerful credentials, such as an AAD Service Principal with owner permissions, this could result in total subscription takeover.
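One way to audit for this risk is to scan secret data for long-term key patterns before it reaches a cluster. Below is a minimal sketch that relies on the fact that AWS long-term access key IDs begin with AKIA (temporary STS keys begin with ASIA; other providers would need their own patterns):

```python
import base64
import re

# AWS long-term access key IDs begin with "AKIA"; this single pattern
# is illustrative -- a real scanner would cover each provider's formats.
LONG_TERM_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

def find_long_term_keys(secret_data: dict) -> list:
    """Return the keys of a K8s Secret's data field whose base64-decoded
    values contain an AWS long-term access key ID."""
    hits = []
    for name, b64_value in secret_data.items():
        try:
            decoded = base64.b64decode(b64_value).decode("utf-8", "replace")
        except ValueError:
            continue  # not valid base64; skip
        if LONG_TERM_KEY_RE.search(decoded):
            hits.append(name)
    return hits

# Example: the "data" field of a K8s Secret manifest (values base64-encoded);
# AKIAIOSFODNN7EXAMPLE is AWS's documented example key ID.
secret = {
    "aws_key": base64.b64encode(b"AKIAIOSFODNN7EXAMPLE").decode(),
    "note": base64.b64encode(b"no credentials here").decode(),
}
print(find_long_term_keys(secret))
# ['aws_key']
```

A check like this fits naturally into an admission webhook or CI pipeline, so long-term keys are caught before they are ever committed to a cluster.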

4. Pod escape

An adversary that escapes a pod via critical misconfigurations or vulnerabilities and reaches the underlying host machine may be able to access other pods running on it. In the event the pods are linked to service accounts associated with IAM identities, the adversary could compromise and then impersonate those identities.

The impact of pod escape to the underlying host, however, can also be influenced by the RBAC permissions afforded to the kubelet. These permissions differ by cloud provider:

  • EKS kubelet RBAC permissions
  • GKE kubelet RBAC permissions
  • AKS kubelet RBAC permissions

Although the permissions vary, in all three CSPs the kubelet has full read access to all the cluster’s resources via Kubernetes REST APIs (Non-Resource URL /api/*) and can therefore exfiltrate any of the cluster’s secrets, including plaintext long-term cloud keys linked to IAM/AAD identities and K8s service account tokens with high privileges.

Additionally, the AKS kubelet RBAC permissions grant write privileges on critical Kubernetes objects. This allows the kubelet to update nodes and create or delete the pods running on them, with the scope of operations only narrowed by the enabled NodeRestriction admission controller.

With a compromised node and write privileges, a malicious actor could create a new pod, attach an existing K8s service account tied to an AAD user-assigned managed identity, request an AAD access token, and then assume the identity.
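The pod the attacker creates in this scenario only needs to reference the existing service account. A hypothetical manifest (all names are placeholders, and the azure.workload.identity/use label assumes Azure AD workload identity is enabled on the cluster):

```yaml
# Hypothetical pod an attacker could create from a compromised AKS node:
# it attaches an existing service account that is federated to an AAD
# user-assigned managed identity. Names are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: token-grabber
  namespace: example
  labels:
    azure.workload.identity/use: "true"  # opt in to token projection
spec:
  serviceAccountName: privileged-sa      # existing SA with an AAD identity
  containers:
  - name: shell
    image: mcr.microsoft.com/azure-cli
    command: ["sleep", "infinity"]
```

From inside this pod, the projected token can be exchanged for an AAD access token carrying whatever roles the managed identity holds, which is why the kubelet's write privileges matter.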

Recommended best practices

Here are 6 key K8s-to-cloud best practices that any organization should implement in its environment to mitigate the risk of a lateral movement attack:

1. Block the IMDS endpoint

Prevent an attacker from querying the IMDS endpoint for access tokens by adhering to the Principle of Least Privilege.

  • EKS

    The first step in restricting pods from retrieving nodes’ role access tokens is to enable IAM Roles for Service Accounts (IRSA) and to confer only the necessary permissions on each role associated with a K8s service account. However, IRSA does not curb pods’ network access to the node’s IMDS endpoint. To prevent the pods from reaching the endpoint, they need to run in a separate network namespace from the node instance. This can be achieved by enforcing IMDSv2, setting the response hop limit (TTL) to 1 on each worker node, and disabling the hostNetwork pod attribute, which allows a shared network namespace. You can do so with the following CLI command on the relevant worker node:

aws ec2 modify-instance-metadata-options --instance-id <value> --http-tokens required --http-put-response-hop-limit 1
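If you manage node groups with eksctl, the same hardening can be declared in the cluster config. A sketch, assuming a recent eksctl version that supports the disablePodIMDS option (cluster and nodegroup names are placeholders):

```yaml
# eksctl cluster config that blocks pod access to the node's IMDS
# endpoint entirely. Names and region are placeholders.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-cluster
  region: us-east-1
managedNodeGroups:
  - name: example-nodes
    disablePodIMDS: true  # pods cannot reach 169.254.169.254 at all
```

Declaring the setting in config rather than patching instances ensures newly scaled nodes inherit the restriction automatically.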
  • GKE

    If you’re using the Autopilot managed cluster deployment, blocking the IMDS endpoint is unnecessary since Autopilot utilizes Workload Identity (WI). If, instead, you’re deploying Standard GKE clusters, you should manually enable WI and assign only the required permissions to each IAM service account that is linked to a K8s service account. Attaching a service account with minimal privileges, such as the Kubernetes Engine Node Service Account role, is crucial given that GCP automatically assigns worker nodes the Compute Engine default service account with the overly permissive IAM basic Editor role.
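Once WI is enabled, the link between a K8s service account and an IAM service account is a single annotation. A sketch (project and account names are placeholders; the IAM service account also needs a roles/iam.workloadIdentityUser binding granted to the K8s service account):

```yaml
# K8s service account mapped to a least-privilege IAM service account
# via GKE Workload Identity. Project and account names are placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
  namespace: example
  annotations:
    iam.gke.io/gcp-service-account: app-sa@example-project.iam.gserviceaccount.com
```

Pods using this service account then obtain tokens for the mapped IAM identity instead of the node's default service account.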

  • AKS

    In the event your clusters require access to Azure cloud workloads, you can limit pod retrieval of node managed-identity access tokens by first enabling Azure AD workload identity. Then, grant only minimal permissions to each user-assigned managed identity linked to a K8s service account. Since the workload identity does not restrict network access to the node’s IMDS endpoint, you should also apply a Kubernetes Network Policy to block metadata access for pods running in a specific namespace:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-metadata-access
  namespace: example
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 169.254.169.254/32

2. Adopt the Pod Security Admission (PSA)

In Kubernetes 1.25, the Pod Security Admission (PSA) controller officially replaced Pod Security Policies. PSA is a built-in admission controller that implements the security requirements of the Pod Security Standards (PSS). PSS defines three policies (privileged, baseline, and restricted) that range from most to least permissive, and the PSA applies them to specific namespaces via three modes of operation: enforce, audit, and warn.

It is highly recommended to enforce at least the baseline policy for namespaces containing sensitive workloads in critical environments like production. This will prevent the most common privilege escalations and block pod-escape techniques that rely on misconfigured, highly privileged pods. You can apply a PSS policy by running the following kubectl command:

kubectl label --overwrite ns <namespace> pod-security.kubernetes.io/<mode>=<policy-type>
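The labels can also be set declaratively in the namespace manifest. A sketch that enforces baseline while auditing and warning against the stricter restricted policy (the namespace name is a placeholder):

```yaml
# Namespace enforcing the baseline PSS policy; violations of the
# stricter restricted policy are logged (audit) and surfaced (warn).
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```

Pairing enforce with audit/warn at the next stricter level gives teams visibility into what a future tightening would break before it is enforced.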

3. Implement strict K8s RBAC rules

Avoid granting the ‘reading secrets,’ ‘workload creation,’ and ‘exec into pod’ permissions to non-admin K8s subjects like service accounts. With ‘reading secrets’ privileges, an attacker that has compromised a K8s subject could list the secrets in the namespace or cluster and exfiltrate sensitive data. In the case of a K8s subject with permissions to either create new workloads or exec into pods, an attacker might be able to generate new workloads or get shells into workloads with K8s service accounts associated with IAM/AAD roles, service accounts, or managed identities. They could then obtain the temporary credentials, allowing them to assume the IAM/AAD identities and execute APIs in their name.
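As a concrete contrast, a least-privilege Role might grant a workload read access to configmaps only, withholding the secret-reading, workload-creation, and exec verbs discussed above (names are placeholders):

```yaml
# Least-privilege Role: read configmaps only. No access to secrets,
# no pod creation, and no pods/exec subresource.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: configmap-reader
  namespace: example
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
```

A RoleBinding then ties this Role to the workload's service account, so a compromise of that pod yields configmap reads and nothing more.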

4. Avoid storing long-term cloud keys in K8s secrets/pods

If your Kubernetes workloads require access to cloud services, consider imposing a highly secure integration standard for managed clusters such as IAM Roles for Service Accounts (EKS), Workload Identity (GKE), or Azure AD workload identity (AKS).

5. Remediate critical vulnerabilities on publicly exposed containers 

Publicly exposed containerized applications with critical vulnerabilities may pose a significant security risk to your organization by providing adversaries an entry point to your managed K8s cluster. Make sure to continuously scan your container images and remediate any critical vulnerabilities on publicly exposed containers.

6. Curb network access

In Kubernetes, pods can freely communicate with one another by default. Breaching a pod that serves a web application can therefore enable an adversary to direct traffic to other pods in your cluster. Network policies are designed to give you control over this communication by specifying at the pod level whether ingress/egress traffic is allowed based on the second pod’s identity, namespace, and IP address range. This helps you isolate pods and consequently limit the radius of compromise.
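A sensible starting point is a default-deny policy that isolates every pod in a namespace, after which the specific flows each application needs are allowed explicitly:

```yaml
# Default-deny: selects all pods in the namespace and permits no
# ingress or egress until more specific policies open required flows.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: example
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
```

Note that NetworkPolicy objects are only enforced if the cluster's CNI plugin supports them, so verify enforcement rather than assuming it.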

Summary

In this second blog post, we outlined several lateral movement techniques from managed Kubernetes clusters to the cloud, including pod escape and Instance Metadata Service abuse. We also shared our research findings and ultimately suggested 6 best practices to reduce your clusters’ attack surfaces, such as implementing strict K8s RBAC rules and curbing network access.

In the next post in this series, we will examine lateral movement in the other direction, from the cloud to managed Kubernetes clusters. We will explain some prevalent attacker TTPs and list additional best practices to strengthen organizations’ environments and minimize the blast radius of potential breaches.

This blog post was written by Wiz Research, as part of our ongoing mission to analyze threats to the cloud, build mechanisms that prevent and detect them, and fortify cloud security strategies.
