Understanding Kubernetes security incidents
Kubernetes security incidents differ fundamentally from traditional IT breaches. Containers and pods are ephemeral: some live for only seconds or minutes before being destroyed or rescheduled, which makes attacks far harder to track than on static servers.
Common Kubernetes security incidents include:
Container escapes: Attackers break out of isolated containers to access the host system
Exposed API servers: Misconfigured authentication or overly permissive RBAC enables unauthorized access and potentially cluster-wide control
Compromised service accounts: Used to move laterally and access sensitive resources
Supply chain attacks: Malicious code hidden in seemingly legitimate container images gets deployed across your infrastructure
Incident detection and initial assessment
Effective detection starts with proper logging and Kubernetes monitoring. This is critical since median detection time exceeds 40 minutes for production incidents. Enable audit logging on your API server and configure an audit policy defining which requests to record (stages, users, resources, verbs).
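As a starting point, a minimal audit policy might log secret access and pod exec requests in detail while recording everything else at the Metadata level. The file path, omitted stage, and rule choices below are illustrative assumptions, not a prescribed configuration:

```bash
# Hedged sketch: write a minimal audit policy for a self-managed API server
# (the path must match the --audit-policy-file flag on kube-apiserver)
cat <<'EOF' > /etc/kubernetes/audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
omitStages:
  - "RequestReceived"   # skip the noisy pre-processing stage
rules:
  # Log who read or changed secrets, but not the secret values themselves
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
  # Capture exec/attach requests in full: these are common attacker actions
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["pods/exec", "pods/attach"]
  # Everything else at Metadata level
  - level: Metadata
EOF
```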
Real-time threat detection requires tools that analyze data as it arrives. Falco monitors system calls and alerts on unusual activity. Set up log aggregation to collect data from multiple sources and correlate events across your environment.
Graph-based context for faster triage: Modern detection systems correlate audit logs, runtime signals, identity permissions, and network exposure into a unified security graph. This connects related entities—linking a suspicious pod to its service account, RBAC role, accessible secrets, and external IPs. Graph-based correlation can significantly reduce false positives—often by as much as 70–80%—by distinguishing isolated anomalies from genuine attack paths.
Multi-Cloud Log Source Mapping
| Component | AWS | Azure | GCP |
|---|---|---|---|
| Cloud API calls | CloudTrail | Activity Logs | Cloud Audit Logs |
| Managed K8s control plane | EKS control plane logs | AKS diagnostics logs | GKE audit logs |
| Network flows | VPC Flow Logs | NSG Flow Logs | VPC Flow Logs |
| Identity/IAM | CloudTrail IAM events | Azure AD logs | Cloud IAM audit logs |
| Load balancer | ALB/NLB access logs | Application Gateway logs | Cloud Load Balancing logs |
| DNS queries | Route 53 query logs | Azure DNS analytics | Cloud DNS logs |
Node and workload logs (provider-agnostic):
Kubelet logs: /var/log/kubelet.log or journalctl -u kubelet
Container runtime: /var/log/containerd.log or crictl logs
Application logs: kubectl logs or centralized via Fluentd/Fluent Bit
Kubernetes audit logs: /var/log/kube-apiserver-audit.log (self-managed) or managed service audit logs
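On systemd-based nodes, the node-level sources above can be preserved quickly; the output paths and one-hour window here are illustrative:

```bash
# Hedged sketch: capture recent kubelet and container runtime logs from a node
mkdir -p /ir/logs
journalctl -u kubelet --since "1 hour ago"    > /ir/logs/kubelet.log
journalctl -u containerd --since "1 hour ago" > /ir/logs/containerd.log
```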
Kubernetes IR First-60-Minutes Checklist:
Immediate Actions (0-15 min):
Cordon affected nodes (kubectl cordon)
Apply deny-all NetworkPolicy to compromised namespace
Capture node and volume snapshots
Collect container logs and events
Document initial indicators and timeline
Evidence Collection (15-30 min):
Export audit logs for affected timeframe
Dump process memory from running containers
Copy container writable layers
Capture network connections
Preserve pod specifications
Containment (30-45 min):
Rotate compromised service account tokens
Revoke suspicious RBAC bindings
Drain affected nodes after evidence capture
Block malicious IPs at cloud firewall level
Communication (45-60 min):
Notify incident commander and stakeholders
Update incident ticket with findings
Coordinate with cloud provider if needed
Document blast radius and affected services
RACI Matrix:
Responsible: On-call security engineer
Accountable: Security team lead
Consulted: Platform team, affected service owners
Informed: CISO, compliance team
Rapid Triage Commands:
Cluster-wide assessment:
```bash
# Recent events sorted by time
kubectl get events --all-namespaces --sort-by=.lastTimestamp

# All pods with node placement
kubectl get pods -A -o wide

# Current RBAC permissions audit
kubectl auth can-i --list --as=system:serviceaccount:default:suspicious-sa
```
Container runtime inspection:
```bash
# List running containers
crictl ps

# Inspect container details
crictl inspect <container-id>

# View container logs
crictl logs <container-id>
```
Node-level forensics:
```bash
# Active network connections
ss -tunap | grep <suspicious-ip-or-port>

# Process tree
ps auxf | grep <suspicious-process>

# Recent file modifications
find /var/lib/containerd -type f -mmin -60
```
Cloud provider snapshots:
```bash
# AWS EBS snapshot
aws ec2 create-snapshot --volume-id <volume-id> \
  --description "IR-evidence-$(date +%Y%m%d-%H%M)"

# Azure disk snapshot
az snapshot create --resource-group <resource-group> \
  --source <disk-name-or-id> --name ir-snapshot-$(date +%s)

# GCP persistent disk snapshot
gcloud compute disks snapshot <disk-name> \
  --snapshot-names=ir-snapshot-$(date +%s)
```
Rapid containment and isolation strategies
Agentless visibility for rapid blast radius assessment: Before applying containment, identify all affected workloads. Agentless inventory tools can scan your cluster without requiring agents in every pod, quickly finding all workloads sharing the compromised image, namespace, or node. This complete view enables comprehensive NetworkPolicy application and cordons, preventing attacker pivots to overlooked workloads.
When you detect an incident, stop the attack from spreading. NetworkPolicies provide rapid containment when your CNI plugin supports them (Calico, Cilium, Weave Net). Apply a deny-all policy to compromised pods to isolate them—note that pods using hostNetwork bypass NetworkPolicy controls and require node-level firewall rules.
Immediately apply a deny-all NetworkPolicy to affected namespaces or pods to cut off attacker communication, and use kubectl cordon to mark affected nodes as unschedulable so no new workloads land on potentially compromised infrastructure.
Preserve forensic evidence before moving workloads: capture node snapshots, collect logs, dump memory, and copy container layers. Only after evidence collection should you use kubectl drain to evict workloads to clean nodes.
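Putting those steps in sequence, a containment sketch might look like the following; the node name and namespace are placeholders, and the drain flags reflect current kubectl versions:

```bash
# Stop new scheduling on the suspect node
kubectl cordon <node-name>

# Isolate the compromised namespace with a deny-all policy
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ir-deny-all
  namespace: <compromised-namespace>
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
EOF

# Only after snapshots, logs, and memory are captured:
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
```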
Forensic investigation in dynamic environments
Container forensics requires specialized tools for dynamic environments. For CRI-based runtimes (containerd, CRI-O), use crictl to inspect containers and read-only filesystem mounts to analyze layers without altering evidence. Tools like kube-forensics orchestrate collection across nodes, while container-diff identifies malicious modifications.
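A minimal collection sketch with crictl, assuming a containerd runtime with the default overlayfs snapshotter; the container ID and output directory are placeholders:

```bash
CID=<container-id>
mkdir -p /ir/evidence

crictl inspect "$CID" > /ir/evidence/inspect.json   # config, mounts, runtime spec
crictl logs "$CID"    > /ir/evidence/stdout.log     # captured stdout/stderr
crictl stats "$CID"   > /ir/evidence/stats.txt      # resource usage at capture time

# Archive writable-layer contents read-only; this path is the containerd
# default and may differ on your nodes
tar --one-file-system -czf /ir/evidence/overlay-snapshots.tgz \
  /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots
```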
eBPF runtime telemetry for low-overhead forensics: eBPF sensors run in the kernel and capture process execution, file access, and network activity with <1% CPU overhead. Unlike traditional agents, eBPF observes system calls in real-time without modifying code. This is critical for Kubernetes forensics because containers live for seconds—eBPF captures process trees, command-line arguments, and connections before container termination.
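As a taste of kernel-level capture, this bpftrace one-liner (assuming bpftrace is installed on the node) prints every process execution as it happens, so the record survives even if the container exits moments later:

```bash
# Log each execve: parent command -> executed binary
bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("%s -> %s\n", comm, str(args->filename)); }'
```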
Critical forensic steps:
Volume and node snapshots: Capture cloud volume snapshots (AWS EBS, Azure Managed Disks, GCP Persistent Disks), node root disk snapshots, and container writable layers before pod termination
Memory dumps: Preserve running processes and network connections
Log collection: Gather all relevant log files
Network analysis: Examine traffic patterns and connection attempts
Common Kubernetes Security Scenarios:
Cryptomining Detection and Response:
Indicators: High CPU usage (>80% sustained), outbound connections to mining pools, suspicious processes (xmrig, minerd)
Immediate containment:
```yaml
# Block mining pool egress via NetworkPolicy (allows only in-cluster traffic)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: block-mining-egress
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector: {}
      ports:
        - protocol: TCP
          port: 443
```
Evidence collection: Process list, network connections, container image SHA, deployment source
Root cause remediation: Scan image for vulnerabilities, review RBAC, enforce resource limits, require image signing
Exposed Kubernetes Dashboard:
Indicators: Unauthenticated access, suspicious token creation, unexpected cluster-admin bindings
Immediate containment: Delete dashboard service, revoke tokens, audit RBAC changes
Evidence collection: Dashboard access logs, API audit logs, source IPs from flow logs
Root cause remediation: Redeploy with authentication, restrict to internal network, implement SSO
Compromised Service Account:
Indicators: Service account used from unexpected IPs, unusual API calls, privilege escalation attempts
Immediate containment: Delete token secrets, remove RBAC bindings, cordon nodes where the SA was used (see the sketch after this scenario)
Evidence collection: Audit logs filtered by SA name, pod specifications, network flow logs
Root cause remediation: Implement least-privilege RBAC, enable workload identity, rotate tokens
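A containment sketch for this scenario, assuming legacy token secrets are in use; the namespace, service account, and binding names are placeholders:

```bash
NS=<namespace>
SA=<compromised-service-account>

# Find and delete legacy token secrets bound to the SA
kubectl get secrets -n "$NS" --field-selector type=kubernetes.io/service-account-token
kubectl delete secret <token-secret-name> -n "$NS"

# Locate and remove RBAC bindings granting the SA access
kubectl get rolebindings,clusterrolebindings -A -o wide | grep "$SA"
kubectl delete rolebinding <suspicious-binding> -n "$NS"

# Verify what the SA can still do
kubectl auth can-i --list --as="system:serviceaccount:${NS}:${SA}"
```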
Advanced threat hunting and analysis
Proactive threat hunting means actively searching for signs of compromise that automated detection missed. Regularly analyze Kubernetes audit logs for suspicious API calls, particularly from service accounts behaving unusually.
Look for service accounts suddenly creating resources they don't normally need, like new roles or secrets. Anonymous access attempts signal potential reconnaissance or exploitation. Disable anonymous authentication (--anonymous-auth=false) unless required for health checks, enforce least-privilege RBAC for system:anonymous, and investigate source IPs and timing in audit logs. Unusual authentication patterns—like logins from unexpected locations or odd times—often indicate compromised credentials.
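Assuming audit logs exported as JSON lines (field names follow the Kubernetes audit event schema; the file name is illustrative), jq queries like these surface both patterns:

```bash
# Anonymous access attempts: timestamp, source IP, verb, and URI
jq -r 'select(.user.username == "system:anonymous")
       | [.requestReceivedTimestamp, (.sourceIPs[0] // "n/a"), .verb, .requestURI]
       | @tsv' audit.log

# Service accounts creating roles or secrets they normally would not
jq -r 'select(.user.username | startswith("system:serviceaccount:"))
       | select(.verb == "create" and (.objectRef.resource == "roles" or .objectRef.resource == "secrets"))
       | [.user.username, .objectRef.resource, .objectRef.namespace]
       | @tsv' audit.log
```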
Behavioral analysis helps establish baselines for normal activity and spot attack indicators. Monitor resource usage patterns, network flows, and user access behaviors across your cluster.
Cross-cluster incident coordination
Multiple Kubernetes clusters create unique coordination challenges. Without centralized visibility, security teams waste time switching between tools. Establish unified logging by assigning unique cluster IDs, using consistent labels (environment, team, service), and centralizing logs into a SIEM or SOAR platform. This enables cross-cluster correlation—tracking compromised service accounts across dev and prod clusters.
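One way to implement the cluster ID convention, assuming logs already flow through Fluent Bit: the record_modifier filter stamps every record before it leaves the cluster (the config path and values are placeholders):

```bash
cat <<'EOF' >> /fluent-bit/etc/fluent-bit.conf
[FILTER]
    Name    record_modifier
    Match   kube.*
    Record  cluster_id prod-us-east-1
    Record  environment production
EOF
```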
Consistent security policies across all environments are essential. Development, staging, and production clusters should use the same security controls and response procedures, making automation easier and reducing configuration errors during incidents.
Establish clear escalation paths and ensure all team members can access centralized incident management tools.
Automated response and orchestration
Manual incident response is too slow for cloud-native environments where attacks spread in seconds. Admission controllers act as API server gatekeepers, validating workloads before deployment. Pod Security Admission (PSA) enforces three security profiles (privileged, baseline, restricted) at the namespace level. Third-party controllers like OPA Gatekeeper and Kyverno add custom policy enforcement.
Policy-as-Code tools like OPA and Kyverno integrate with admission controllers to enforce custom security rules. They automatically block containers with excessive privileges or prevent unapproved image deployment. GitOps practices enable automated remediation workflows that revert malicious changes to their last secure state.
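As one illustration of blocking excessive privileges, a Kyverno policy along these lines denies privileged containers cluster-wide; the =() anchors make the check apply only when those fields are present:

```bash
kubectl apply -f - <<'EOF'
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged
spec:
  validationFailureAction: enforce
  rules:
    - name: no-privileged-containers
      match:
        resources:
          kinds:
            - Pod
      validate:
        message: "Privileged containers are not allowed"
        pattern:
          spec:
            containers:
              - =(securityContext):
                  =(privileged): "false"
EOF
```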
Code-to-cloud traceability for root cause remediation: Automated response systems should trace runtime incidents to their origin—the container image, IaC template, Git repository, and CI/CD pipeline that deployed the vulnerable workload. When you detect a container with excessive privileges, the system identifies the Helm chart or Terraform module that created it, the Git commit introducing the misconfiguration, and the owning team. This enables remediation tickets with full context, source template fixes, and prevention of future recurrence.
Key automation capabilities:
Automatic policy enforcement: Block risky deployments before production
Incident escalation: Route alerts to appropriate teams based on severity
Evidence collection: Automatically gather logs and snapshots
Rollback procedures: Quickly revert to known-good configurations (see the sketch below)
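A rollback sketch covering both the kubectl and GitOps paths mentioned above, with illustrative deployment and commit names:

```bash
# Revert a deployment to its previous ReplicaSet
kubectl rollout undo deployment/<deployment-name> -n <namespace>
kubectl rollout history deployment/<deployment-name> -n <namespace>

# With GitOps, revert the offending commit so the controller reconverges
git revert <bad-commit-sha>
git push
```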
Recovery and post-incident activities
Recovery begins after containing the threat and eliminating attacker access. Root cause analysis is essential for understanding how the breach occurred and preventing similar incidents. Examine configuration drift between your intended infrastructure state and what was actually running.
Deployment history analysis identifies when vulnerabilities were introduced and how they went undetected. This information improves security controls and detection capabilities. Post-incident reviews should involve all relevant teams to document and share lessons learned.
The recovery process includes updating security policies, improving detection rules, and strengthening failed controls. Conduct tabletop exercises to test updated procedures and ensure team members understand their roles.
Building a proactive Kubernetes security program
Proactive security prevents incidents rather than just responding to them. Continuous vulnerability scanning and configuration assessments, combined with least-privilege enforcement (dropping unnecessary Linux capabilities), image signing verification (cosign, Notary v2), and SBOM attestation help prevent exploitation—critical since only 21% of organizations disable insecure Linux capabilities. This shift-left approach catches problems early, when they're easier and cheaper to fix.
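A sketch of the signing and scanning steps, using cosign's key-based flow and Trivy as one scanner option; the registry, image, and tag are placeholders:

```bash
# Generate a signing key pair (cosign.key / cosign.pub)
cosign generate-key-pair

# Sign the image and verify the signature
cosign sign --key cosign.key registry.example.com/app:1.4.2
cosign verify --key cosign.pub registry.example.com/app:1.4.2

# Scan the image and fail the pipeline on critical findings
trivy image --severity CRITICAL --exit-code 1 registry.example.com/app:1.4.2
```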
Baseline Security Policies for Incident Prevention:
Pod Security Admission (namespace-level):
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```
The "restricted" profile blocks privileged containers, host namespaces, and insecure capabilities—preventing 80% of common container escapes.
Default-deny NetworkPolicy:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```
Apply to all namespaces, then explicitly allow required traffic. This limits lateral movement during incidents.
Image signature verification (Kyverno):
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-signed-images
spec:
  validationFailureAction: enforce
  rules:
    - name: verify-signature
      match:
        resources:
          kinds:
            - Pod
      verifyImages:
        - imageReferences:
            - "*"
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      -----END PUBLIC KEY-----
```
Blocks unsigned images, preventing supply chain attacks.
Required labels for ownership:
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-labels
spec:
  validationFailureAction: enforce
  rules:
    - name: check-labels
      match:
        resources:
          kinds:
            - Deployment
            - StatefulSet
      validate:
        message: "Deployments must have team and owner labels"
        pattern:
          metadata:
            labels:
              team: "?*"
              owner: "?*"
```
Ensures clear ownership for incident escalation and paging.
Security champions within development teams embed security practices throughout your organization. They advocate for secure coding, help colleagues understand security risks, and provide feedback about practical developer challenges.
Regular security assessments and penetration testing validate your security controls. These exercises identify defense gaps and ensure incident response procedures work under realistic conditions.
Compliance Considerations for Kubernetes IR:
SOC 2 Type II requirements:
CC6.1 (Logical Access): Document RBAC policies and service account usage
CC7.2 (System Monitoring): Implement audit logging and alerting
CC7.3 (Incident Response): Maintain documented IR procedures and evidence retention
C1.1 (Confidentiality): Encrypt sensitive data in etcd and persistent volumes
ISO 27001 Annex A controls:
A.12.4.1 (Event Logging): Enable comprehensive audit logging across control plane and nodes
A.16.1.4 (Incident Assessment): Document incident classification and escalation procedures
A.16.1.5 (Incident Response): Maintain IR playbooks and conduct regular tabletop exercises
A.16.1.7 (Evidence Collection): Preserve forensic evidence per legal and regulatory requirements
PCI DSS (for payment processing workloads):
Requirement 10: Log all access to cardholder data environments
Requirement 10.6: Review logs daily for anomalies
Requirement 12.10: Implement and test incident response plan quarterly
HIPAA (for healthcare workloads):
§164.308(a)(6): Implement security incident procedures
§164.312(b): Maintain audit controls and logs
§164.308(a)(1)(ii)(D): Conduct regular risk assessments
Practical implementation: Map IR procedures to required controls, document evidence collection and retention policies, and conduct annual compliance audits of your Kubernetes security posture.
How Wiz transforms Kubernetes incident response
Wiz Defend provides real-time Kubernetes detection and response with high-fidelity detections curated by Wiz Research, reducing blind spots in dynamic container environments. The platform prioritizes precision over volume—detections correlate multiple signals (process execution, network connections, file access, API calls) to identify genuine threats while filtering out benign anomalies that generate false positives.
The Wiz Security Graph automatically correlates runtime threats with cloud context, showing complete attack paths from compromised containers to critical assets like admin accounts or sensitive data stores. This contextual approach enables faster incident scoping and more accurate risk assessment during active investigations.
Wiz's lightweight eBPF Runtime Sensor captures forensic evidence from ephemeral containers without performance impact. The Investigation Graph visualizes complete attack timelines and blast radius automatically, reducing mean time to investigate from hours to minutes.
The platform's code-to-cloud correlation traces runtime incidents back to vulnerable source code and Infrastructure as Code templates, enabling true root cause remediation. This capability helps development teams fix underlying issues at their source, preventing the same vulnerabilities from being reintroduced in future deployments.
Request a demo to see Kubernetes incident detection, the Investigation Graph for attack path visualization, and runtime forensics with eBPF in action.