Threat hunting framework: A cloud security best practice guide

Wiz Experts Team
Key takeaways
  • Threat hunting frameworks provide structured, repeatable methodologies for proactively searching for hidden threats that have bypassed traditional security defenses in cloud environments.

  • These frameworks transform security from a reactive to a proactive discipline, which is essential for the dynamic and ephemeral nature of modern cloud architectures.

  • By providing a consistent process, frameworks improve the quality of threat detection, increase team efficiency, and help mature an organization's overall security posture against sophisticated attackers.

Why cloud environments need specialized threat hunting frameworks

A threat hunting framework is a structured methodology that guides security teams to proactively search for hidden threats. It provides four core components: hypothesis formation (what to hunt for), data source requirements (which logs to collect), investigation procedures (how to execute hunts), and success metrics (how to measure effectiveness). This transforms hunting from reactive alert-chasing into proactive threat discovery—you systematically search for signs that attackers have bypassed your defenses.

Traditional threat hunting was built for static data centers. Cloud environments work differently. Resources like containers and serverless functions appear and disappear in minutes, providing limited telemetry that makes it harder to reconstruct activity after they terminate.

The cloud's interconnected nature creates unique attack surfaces. Attackers exploit misconfigured services, overly permissive accounts, or vulnerable APIs to move through your environment. You need specialized frameworks that understand how AWS, Azure, and GCP work.

Cloud-specific challenges include:

  • Ephemeral resources: Containers and functions that exist for minutes, not months

  • API-driven attacks: Threats targeting cloud management interfaces instead of network protocols

  • Identity sprawl: Hundreds of service accounts and roles that can be compromised

  • Serverless blind spots: Functions with limited telemetry; hunters rely on cloud-native logs (AWS CloudWatch, Azure Monitor) and distributed traces (AWS X-Ray, Azure Application Insights)

Essential threat hunting frameworks for cloud security teams

Several proven frameworks can structure your threat hunting efforts. Each provides a different way to organize and execute your hunts based on real-world security operations.

MITRE ATT&CK

MITRE ATT&CK is a knowledge base that maps out how attackers actually operate in the real world. This means you can use it to understand what techniques adversaries use and build hunts to find those specific behaviors.

MITRE ATT&CK for Cloud is particularly valuable for cloud security teams. Its Cloud matrix covers the subset of the 14 Enterprise ATT&CK tactics that apply to cloud platforms (including Initial Access, Persistence, Privilege Escalation, Defense Evasion, Credential Access, Discovery, Lateral Movement, Collection, Exfiltration, and Impact) with over 100 cloud-specific techniques. For example, technique T1078.004 (Valid Accounts: Cloud Accounts) helps hunters search for compromised IAM credentials, while T1530 (Data from Cloud Storage) guides hunts for unauthorized S3 or Blob Storage access.

PEAK Framework

PEAK stands for Prepare, Execute, and Act with Knowledge. This framework gives you three simple phases to organize your hunting program.

The Prepare phase involves defining what you're hunting for and gathering the right data sources. Execute means actually running your hunt and testing your theories. Act with Knowledge means taking what you learned and using it to improve your defenses.

Open Threat Hunting Framework

OTHF is a community-driven project that helps teams document and share their hunting methods. This means you can learn from other organizations' experiences and contribute your own findings back to the community.

The framework provides templates for documenting hunts in a standardized way. This makes it easier for teams to collaborate and build on each other's work.

TaHiTI Framework

TaHiTI focuses on using threat intelligence to guide your hunts. Instead of generic searches, you target specific threats that are relevant to your organization and industry.

This framework helps you move beyond random hunting to focused investigations based on actual threat actor behavior. You use intelligence feeds and research to form specific hypotheses about what might be happening in your environment.

Implementing hypothesis-driven threat hunting in cloud environments

Hypothesis-driven hunting means starting with a specific theory about what threats might exist in your environment. This approach makes your hunting more focused and effective than random searching.

A good hypothesis is specific, testable, and mapped to known attack techniques. Here are five example hypotheses with ATT&CK mappings and starter queries:

Hypothesis 1: Credential Access via Unusual AssumeRole

  • ATT&CK Technique: T1550.001 (Use Alternate Authentication Material: Application Access Token)

  • Theory: An attacker is using stolen credentials to assume IAM roles from unusual locations

  • Starter query: Search CloudTrail for sts:AssumeRole calls from new ASNs or countries not in your baseline
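
As a rough illustration of this starter query, the sketch below filters CloudTrail-style records for AssumeRole calls that originate outside a known set of countries. It assumes the records were already exported as dictionaries and enriched with a `sourceCountry` field; that field is hypothetical, since CloudTrail itself records only `sourceIPAddress`, and the baseline set is illustrative.

```python
from typing import Iterable


def unusual_assume_role(events: Iterable[dict], baseline_countries: set[str]) -> list[dict]:
    """Return AssumeRole events whose enriched source country is outside the baseline."""
    hits = []
    for event in events:
        if event.get("eventSource") != "sts.amazonaws.com":
            continue
        if event.get("eventName") != "AssumeRole":
            continue
        # "sourceCountry" is a hypothetical enrichment field (e.g. from a GeoIP lookup),
        # not a native CloudTrail field.
        if event.get("sourceCountry") not in baseline_countries:
            hits.append(event)
    return hits


sample_events = [
    {"eventSource": "sts.amazonaws.com", "eventName": "AssumeRole",
     "sourceIPAddress": "198.51.100.7", "sourceCountry": "US"},
    {"eventSource": "sts.amazonaws.com", "eventName": "AssumeRole",
     "sourceIPAddress": "203.0.113.9", "sourceCountry": "ZZ"},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "sourceIPAddress": "203.0.113.9", "sourceCountry": "ZZ"},
]
suspicious = unusual_assume_role(sample_events, baseline_countries={"US", "DE"})
```

In practice the baseline would be derived from several weeks of historical logs rather than hard-coded.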

Hypothesis 2: Persistence via IAM Policy Manipulation

  • ATT&CK Technique: T1098.001 (Account Manipulation: Additional Cloud Credentials)

  • Theory: An attacker is attaching overly permissive policies to maintain access

  • Starter query: Find iam:AttachUserPolicy or iam:PutUserPolicy events with AdministratorAccess or PowerUserAccess policies
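
One way to sketch this starter query is shown below. It assumes CloudTrail records are available as dictionaries, that `requestParameters` carries `policyArn` for attach events, and that inline policy documents arrive as JSON strings; verify those shapes against your own logs before relying on them.

```python
import json

# AWS-managed policies named in the hypothesis above.
HIGH_RISK_ARNS = {
    "arn:aws:iam::aws:policy/AdministratorAccess",
    "arn:aws:iam::aws:policy/PowerUserAccess",
}


def allows_everything(policy_document: str) -> bool:
    """True if any Allow statement in the JSON policy grants the "*" action."""
    statements = json.loads(policy_document).get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for stmt in statements:
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if stmt.get("Effect") == "Allow" and "*" in actions:
            return True
    return False


def risky_policy_changes(events: list[dict]) -> list[dict]:
    """Return attach/put events that grant very broad permissions to a user."""
    hits = []
    for event in events:
        params = event.get("requestParameters") or {}
        name = event.get("eventName")
        if name == "AttachUserPolicy" and params.get("policyArn") in HIGH_RISK_ARNS:
            hits.append(event)
        elif name == "PutUserPolicy" and allows_everything(params.get("policyDocument", "{}")):
            hits.append(event)
    return hits


iam_events = [
    {"eventName": "AttachUserPolicy",
     "requestParameters": {"userName": "svc-build",
                           "policyArn": "arn:aws:iam::aws:policy/AdministratorAccess"}},
    {"eventName": "AttachUserPolicy",
     "requestParameters": {"userName": "analyst",
                           "policyArn": "arn:aws:iam::aws:policy/ReadOnlyAccess"}},
    {"eventName": "PutUserPolicy",
     "requestParameters": {"userName": "temp-user", "policyName": "inline",
                           "policyDocument": json.dumps({"Statement": [
                               {"Effect": "Allow", "Action": "*", "Resource": "*"}]})}},
]
risky = risky_policy_changes(iam_events)
```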

Hypothesis 3: Exfiltration from Public Storage

  • ATT&CK Technique: T1530 (Data from Cloud Storage)

  • Theory: An attacker discovered a public S3 bucket and is downloading sensitive data

  • Starter query: Analyze S3 access logs for GetObject requests from external IPs with high request volumes
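
The following sketch shows one possible shape for this analysis. It assumes S3 server access logs have already been parsed into dictionaries with `remote_ip` and `operation` keys (hypothetical names for the parsed fields), and that `10.0.0.0/8` stands in for your internal address space.

```python
from collections import Counter
from ipaddress import ip_address, ip_network

# Assumption: the organization's internal ranges; replace with your own.
INTERNAL_NETWORKS = [ip_network("10.0.0.0/8")]


def heavy_external_readers(records: list[dict], threshold: int) -> dict[str, int]:
    """Count object reads per external IP and keep those at or above the threshold."""
    counts: Counter = Counter()
    for rec in records:
        if rec["operation"] != "REST.GET.OBJECT":
            continue
        addr = ip_address(rec["remote_ip"])
        if any(addr in net for net in INTERNAL_NETWORKS):
            continue  # internal traffic is out of scope for this hunt
        counts[rec["remote_ip"]] += 1
    return {ip: n for ip, n in counts.items() if n >= threshold}


access_records = (
    [{"remote_ip": "203.0.113.50", "operation": "REST.GET.OBJECT"}] * 5
    + [{"remote_ip": "198.51.100.1", "operation": "REST.GET.OBJECT"}]
    + [{"remote_ip": "10.1.2.3", "operation": "REST.GET.OBJECT"}] * 10
)
flagged_ips = heavy_external_readers(access_records, threshold=5)
```

A fixed threshold is the simplest choice; comparing against each bucket's historical request volume would likely produce fewer false positives.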

Hypothesis 4: Lateral Movement via Kubernetes API

  • ATT&CK Technique: T1078.004 (Valid Accounts: Cloud Accounts)

  • Theory: An attacker compromised a service account and is using it to access the Kubernetes API

  • Starter query: Search EKS audit logs for Kubernetes API requests from unexpected source IPs or unusual service accounts
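
A minimal sketch of this hunt over Kubernetes audit events follows. It uses the `user.username`, `sourceIPs`, and `verb` fields of the audit event format; the cluster CIDR and the sample events are illustrative assumptions.

```python
from ipaddress import ip_address, ip_network


def external_service_account_calls(audit_events: list[dict],
                                   cluster_cidrs: list[str]) -> list[tuple]:
    """Return (username, source_ip, verb) for service-account calls from outside the cluster."""
    networks = [ip_network(cidr) for cidr in cluster_cidrs]
    hits = []
    for event in audit_events:
        username = event.get("user", {}).get("username", "")
        if not username.startswith("system:serviceaccount:"):
            continue
        for ip in event.get("sourceIPs", []):
            if not any(ip_address(ip) in net for net in networks):
                hits.append((username, ip, event.get("verb")))
    return hits


audit_sample = [
    {"user": {"username": "system:serviceaccount:default:builder"},
     "sourceIPs": ["10.0.4.12"], "verb": "list"},
    {"user": {"username": "system:serviceaccount:default:builder"},
     "sourceIPs": ["203.0.113.77"], "verb": "create"},
    {"user": {"username": "alice@example.com"},
     "sourceIPs": ["203.0.113.80"], "verb": "get"},
]
k8s_hits = external_service_account_calls(audit_sample, cluster_cidrs=["10.0.0.0/16"])
```

Service-account tokens are normally used from inside the cluster, so a call from an outside address is a reasonable lead, though not proof of compromise.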

Hypothesis 5: Defense Evasion via Lambda Backdoor

  • ATT&CK Technique: T1562.001 (Impair Defenses: Disable or Modify Tools)

  • Theory: An attacker deployed a malicious Lambda function to disable security monitoring

  • Starter query: Find lambda:CreateFunction or lambda:UpdateFunctionCode events followed by cloudtrail:StopLogging or guardduty:DeleteDetector calls
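
The "followed by" part of this query is a sequence correlation, which could be sketched as below. The code assumes CloudTrail records as dictionaries with ISO 8601 `eventTime` values and correlates purely by time window, which is a simplification; a real hunt would also tie the two events together by role or account. Lambda event names are matched by prefix because CloudTrail may append an API version suffix to them.

```python
from datetime import datetime, timedelta

LAMBDA_PREFIXES = ("CreateFunction", "UpdateFunctionCode")
TAMPER_EVENTS = {"StopLogging", "DeleteDetector"}


def parse_time(value: str) -> datetime:
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def lambda_then_tamper(events: list[dict],
                       window: timedelta = timedelta(hours=1)) -> list[tuple]:
    """Pair each Lambda change with monitoring-tamper events that follow it within the window."""
    ordered = sorted(events, key=lambda e: parse_time(e["eventTime"]))
    pairs = []
    for i, first in enumerate(ordered):
        if not first["eventName"].startswith(LAMBDA_PREFIXES):
            continue
        start = parse_time(first["eventTime"])
        for later in ordered[i + 1:]:
            if parse_time(later["eventTime"]) - start > window:
                break  # events are time-sorted, so nothing later can match
            if later["eventName"] in TAMPER_EVENTS:
                pairs.append((first["eventName"], later["eventName"]))
    return pairs


trail = [
    {"eventName": "CreateFunction20150331", "eventTime": "2024-01-01T10:00:00Z"},
    {"eventName": "StopLogging", "eventTime": "2024-01-01T10:20:00Z"},
    {"eventName": "DeleteDetector", "eventTime": "2024-01-01T13:00:00Z"},
]
pairs = lambda_then_tamper(trail)
```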

Organizations conducting hypothesis-driven hunts during the Log4j vulnerability crisis formed specific theories like 'vulnerable Log4j versions exist in our container images' or 'attackers are exploiting Log4j in our internet-facing applications.' They then collected evidence from container registries, application logs, and network traffic to validate or disprove each hypothesis within hours.

The process starts with forming your hypothesis based on threat intelligence, known attack patterns, or anomalies you've observed. Next, you collect the data needed to test your theory. In cloud environments, this includes logs from your cloud provider, network traffic data, and information from your workloads.

Data collection in the cloud requires enabling specific telemetry sources per provider:

AWS telemetry checklist:

  • CloudTrail management events (API calls to AWS services) and data events (S3 object access, Lambda invocations)

  • VPC Flow Logs (network traffic metadata between resources)

  • GuardDuty findings (threat intelligence-based detections)

  • CloudWatch Logs from applications and Lambda functions

  • EKS control plane audit logs (Kubernetes API server activity)

  • Config snapshots (resource configuration history)

Azure telemetry checklist:

  • Activity Logs (control plane operations on Azure resources)

  • Entra ID Sign-in and Audit Logs (authentication and authorization events)

  • NSG Flow Logs (network security group traffic)

  • Azure Monitor Diagnostic Settings (resource-level logs and metrics)

  • AKS audit logs (Kubernetes API activity)

  • Microsoft Defender for Cloud alerts

GCP telemetry checklist:

  • Cloud Audit Logs: Admin Activity (configuration changes), Data Access (data reads/writes), System Events (automated actions)

  • VPC Flow Logs (network connections between resources)

  • Cloud Logging (application and system logs)

  • GKE audit logs (Kubernetes control plane activity)

  • Security Command Center findings (vulnerability and threat detections)

Finally, you analyze the collected data to prove or disprove your hypothesis. If you find evidence of the threat, you move into incident response mode. Document your findings with timestamps, affected resources, and initial scope assessment. Activate your incident response playbook to contain the threat, preserve forensic evidence, and begin remediation. If not, you document what you learned and use it to refine future hunts.

Leveraging behavioral analytics for cloud threat detection

Behavioral analytics focuses on identifying unusual patterns rather than looking for known bad signatures. This means you establish what normal looks like in your environment, then hunt for deviations that could indicate threats.

Start by defining baseline behavior for your cloud environment. This includes typical patterns for how users access resources, when workloads communicate with each other, and how APIs are normally used. For example, your development team might only access certain databases during business hours from specific locations.

Once you have a baseline, you can spot anomalies that warrant investigation. An alert might trigger if a user account suddenly accesses resources it has never touched before, or if a workload starts making unusual network connections to external sites.

Key behavioral indicators to monitor include:

  • User access patterns: Login times, locations, and resource usage

  • API call frequency: Unusual spikes in cloud service usage

  • Network traffic: New connections or data transfer volumes

  • Resource provisioning: Unexpected creation of compute or storage resources

Machine learning models can automate this process by analyzing large amounts of cloud data to learn normal behavior. These algorithms can automatically flag statistically significant deviations, allowing you to scale your hunting efforts beyond what manual analysis could achieve.
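
To make the idea of a statistically significant deviation concrete, here is a deliberately simple z-score check. It is a stand-in for the learned models described above, not a recommendation of a specific algorithm, and the daily API call counts are made-up sample data.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], observed: float, z_threshold: float = 3.0) -> bool:
    """Flag an observation more than z_threshold standard deviations from the baseline mean."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return observed != mu
    return abs(observed - mu) / sigma > z_threshold


# Hypothetical daily API call counts for one service account.
daily_api_calls = [100, 110, 95, 105, 98, 102, 107]
spike_flagged = is_anomalous(daily_api_calls, observed=400)
normal_flagged = is_anomalous(daily_api_calls, observed=104)
```

A z-score assumes roughly stable, symmetric behavior. Workloads with strong daily or weekly cycles usually need a baseline per time-of-day or per weekday before this kind of check is useful.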

Augment cloud provider logs with lightweight runtime telemetry to improve anomaly detection fidelity. For example, cloud logs show that a Lambda function was invoked, but runtime telemetry reveals the specific processes executed, network connections established, and files accessed during that invocation. This additional context reduces false positives—an unusual API call might be legitimate if runtime data shows it's part of a known application workflow. Deploy lightweight eBPF-based sensors on Linux hosts and Kubernetes nodes to capture process, network, and file integrity events without significant performance overhead.

Building threat intelligence into your cloud hunting strategy

Threat intelligence provides context about who might be attacking you and how they operate. This information transforms generic hunting into targeted investigations focused on real adversary behavior.

Integrate external threat intelligence feeds with your internal cloud data. These feeds provide indicators of compromise like malicious IP addresses, domain names, and file hashes. Map these known threats to your cloud-specific assets like container images or Lambda function code.

Beyond external feeds, create custom intelligence based on your organization's unique cloud setup. Analyze past incidents in your environment, monitor security research for new cloud attack techniques, and understand which assets are most valuable to protect. Security teams align their detections with MITRE ATT&CK for Cloud to systematically improve coverage. For example, mapping existing detection rules to ATT&CK techniques reveals gaps—if you have strong coverage for Initial Access (T1078, T1190) but weak coverage for Persistence (T1098, T1136), you know where to focus new detection development and threat hunting efforts.
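
The coverage-gap exercise described above can be sketched in a few lines. The rule names are hypothetical, and each technique is assigned to a single tactic to mirror the example in the text, even though ATT&CK lists some techniques (such as T1078) under several tactics.

```python
def coverage_by_tactic(technique_tactics: dict[str, str],
                       rule_techniques: dict[str, list[str]]) -> dict[str, float]:
    """Return the fraction of relevant techniques per tactic covered by at least one rule."""
    covered = {t for techniques in rule_techniques.values() for t in techniques}
    totals: dict[str, int] = {}
    hits: dict[str, int] = {}
    for technique, tactic in technique_tactics.items():
        totals[tactic] = totals.get(tactic, 0) + 1
        hits[tactic] = hits.get(tactic, 0) + (1 if technique in covered else 0)
    return {tactic: hits[tactic] / totals[tactic] for tactic in totals}


relevant_techniques = {
    "T1078": "Initial Access",
    "T1190": "Initial Access",
    "T1098": "Persistence",
    "T1136": "Persistence",
}
# Hypothetical detection rules and the techniques each one maps to.
detection_rules = {
    "rule-console-login-anomaly": ["T1078"],
    "rule-public-app-exploit": ["T1190"],
    "rule-iam-policy-change": ["T1098"],
}
coverage = coverage_by_tactic(relevant_techniques, detection_rules)
```

Here Persistence comes out at 50% because nothing maps to T1136, which is the kind of gap that should feed the next round of hunts.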

Custom intelligence sources include:

  • Internal incident data: Patterns from previous breaches or attempts

  • Industry-specific threats: Attacks targeting your sector or business model

  • Cloud configuration risks: Misconfigurations common in your environment

  • Third-party dependencies: Vulnerabilities in services you rely on

This internal intelligence helps you develop highly relevant hunting hypotheses. Instead of generic searches, you can focus on threats that are most likely to target your specific cloud architecture and business model.

Automating threat hunting across multi-cloud infrastructures

Manual threat hunting becomes significantly more difficult when operating across AWS, Azure, and GCP. Each cloud provider uses different logging formats (AWS CloudTrail JSON vs. Azure Activity Log schema), authentication models (AWS IAM vs. Microsoft Entra ID vs. GCP IAM), and API structures. Hunters must normalize this data into a unified format and correlate events across providers to detect multi-cloud attack patterns.
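
As a sketch of what normalization into a unified format might look like, the example below maps two provider-specific records onto one small schema. The `NormalizedEvent` type is hypothetical, and the Azure field names are a simplified assumption about an exported Activity Log record; check both mappings against your actual log exports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedEvent:
    provider: str
    timestamp: str
    actor: str
    action: str
    source_ip: str


def from_cloudtrail(record: dict) -> NormalizedEvent:
    return NormalizedEvent(
        provider="aws",
        timestamp=record["eventTime"],
        actor=record.get("userIdentity", {}).get("arn", "unknown"),
        action=record["eventName"],
        source_ip=record.get("sourceIPAddress", ""),
    )


def from_azure_activity(record: dict) -> NormalizedEvent:
    # Field names here are assumed for illustration.
    return NormalizedEvent(
        provider="azure",
        timestamp=record["time"],
        actor=record.get("caller", "unknown"),
        action=record["operationName"],
        source_ip=record.get("callerIpAddress", ""),
    )


aws_record = {"eventTime": "2024-01-01T10:00:00Z", "eventName": "CreateUser",
              "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"},
              "sourceIPAddress": "203.0.113.9"}
azure_record = {"time": "2024-01-01T10:05:00Z",
                "operationName": "Microsoft.Authorization/roleAssignments/write",
                "caller": "alice@example.com", "callerIpAddress": "203.0.113.9"}

unified = [from_cloudtrail(aws_record), from_azure_activity(azure_record)]
# With one schema, a cross-cloud question becomes a single expression.
same_ip_providers = {e.provider for e in unified if e.source_ip == "203.0.113.9"}
```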

Implement automated data collection to centralize information from all your cloud accounts. This creates a unified view where you can run searches across AWS, Azure, and GCP data without switching between different consoles. Unified visibility is essential for effective multi-cloud threat hunting.

Normalize resources and risks across clouds to run consistent hunts and policies without per-cloud rule drift. For example, define a single policy for 'internet-exposed compute with high-severity vulnerabilities' that automatically translates to AWS (EC2 with public IPs and CVEs), Azure (VMs with public endpoints and vulnerabilities), and GCP (Compute Engine instances with external IPs and security findings). This normalization prevents the common problem where security teams write separate rules for each cloud, leading to coverage gaps and inconsistent risk assessment.
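
A single cross-cloud policy of this kind might reduce to one predicate over a normalized resource record, as in the sketch below. The `ComputeResource` type, its fields, and the severity labels are hypothetical; the provider-specific collectors that would populate them are out of scope here.

```python
from dataclasses import dataclass

HIGH_SEVERITIES = {"HIGH", "CRITICAL"}


@dataclass
class ComputeResource:
    provider: str            # "aws", "azure", or "gcp"
    resource_id: str
    internet_exposed: bool   # public IP, public endpoint, or external IP
    max_severity: str        # worst vulnerability severity found on the resource


def exposed_and_vulnerable(resource: ComputeResource) -> bool:
    """One policy definition, evaluated identically for every provider."""
    return resource.internet_exposed and resource.max_severity in HIGH_SEVERITIES


inventory = [
    ComputeResource("aws", "i-0abc", internet_exposed=True, max_severity="CRITICAL"),
    ComputeResource("azure", "vm-web-01", internet_exposed=True, max_severity="LOW"),
    ComputeResource("gcp", "instance-7", internet_exposed=False, max_severity="HIGH"),
]
violations = [r.resource_id for r in inventory if exposed_and_vulnerable(r)]
```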

Use Infrastructure as Code scanning to automatically detect misconfigurations before they reach production. Integrate security checks into your CI/CD pipeline to scan Terraform or CloudFormation templates for insecure settings. This prevents entire classes of vulnerabilities from being deployed.
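
Dedicated scanners cover far more rules, but the core idea can be sketched as a CI step that walks a parsed CloudFormation template and flags a couple of insecure settings. The two checks and the sample template below are illustrative assumptions, not a complete policy set.

```python
PUBLIC_ACLS = {"PublicRead", "PublicReadWrite"}


def scan_template(template: dict) -> list[tuple[str, str]]:
    """Return (resource_name, finding) pairs for a parsed CloudFormation template."""
    findings = []
    for name, resource in template.get("Resources", {}).items():
        props = resource.get("Properties", {})
        rtype = resource.get("Type")
        if rtype == "AWS::S3::Bucket" and props.get("AccessControl") in PUBLIC_ACLS:
            findings.append((name, "public bucket ACL"))
        if rtype == "AWS::EC2::SecurityGroup":
            for rule in props.get("SecurityGroupIngress", []):
                open_to_world = rule.get("CidrIp") == "0.0.0.0/0"
                covers_ssh = rule.get("FromPort", 0) <= 22 <= rule.get("ToPort", 0)
                if open_to_world and covers_ssh:
                    findings.append((name, "SSH open to the internet"))
    return findings


sample_template = {
    "Resources": {
        "LogsBucket": {"Type": "AWS::S3::Bucket",
                       "Properties": {"AccessControl": "PublicRead"}},
        "PrivateBucket": {"Type": "AWS::S3::Bucket", "Properties": {}},
        "BastionSG": {"Type": "AWS::EC2::SecurityGroup",
                      "Properties": {"SecurityGroupIngress": [
                          {"CidrIp": "0.0.0.0/0", "FromPort": 22, "ToPort": 22}]}},
    }
}
template_findings = scan_template(sample_template)
```

A pipeline could fail the build whenever `template_findings` is non-empty.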

Automation opportunities include:

  • Log aggregation: Centralized collection from all cloud providers

  • Configuration scanning: Automated checks of infrastructure code

  • Playbook execution: Automated response to common threat scenarios

  • Alert correlation: Connecting related events across different clouds

Security orchestration platforms can build automated playbooks for common cloud threat scenarios. A playbook might automatically quarantine a compromised resource, gather forensic data, and notify your security team when a high-confidence alert triggers.

Hunting for threats in containerized and serverless environments

Containers and serverless functions create new security challenges that traditional hunting methods can't address. These environments require specialized techniques that account for their unique characteristics.

Container threat hunting focuses on image vulnerabilities, runtime anomalies, and potential container escapes. Since containers share the host operating system kernel, a successful container escape can compromise the entire node and other running workloads.

Monitor container environments for unusual process executions, unexpected network connections, and privilege escalation attempts. Since containers are designed to be immutable, any changes to the filesystem or running processes could indicate malicious activity.

Serverless functions present different challenges because the underlying infrastructure is abstracted. Focus hunts on function code and dependencies (scanning for malicious packages), IAM permissions (detecting overprivileged functions with admin rights), invocation patterns (unusual call frequencies or sources), and cloud-native telemetry. For AWS Lambda, analyze CloudWatch Logs for execution anomalies, X-Ray traces for suspicious external connections, CloudTrail management events for unauthorized function modifications, and CloudTrail data events for unexpected invocations.

Serverless hunting targets include:

  • Overprivileged functions: Lambda functions with excessive IAM permissions

  • Unusual invocation patterns: Functions being called at unexpected times or frequencies

  • Malicious dependencies: Compromised libraries or packages in function code

  • Data exfiltration: Functions accessing and transmitting sensitive data
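
For the first target in this list, a starting point could be a check for wildcard grants in the policies attached to each function's execution role. The sketch assumes you have already collected those policy documents as dictionaries keyed by function name; the function names are hypothetical.

```python
def _as_list(value) -> list:
    return [value] if isinstance(value, str) else list(value)


def wildcard_statements(policy: dict) -> list[dict]:
    """Return Allow statements granting '*' or 'service:*' actions on all resources."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    flagged = []
    for stmt in statements:
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        broad_action = any(a == "*" or a.endswith(":*") for a in actions)
        if stmt.get("Effect") == "Allow" and broad_action and "*" in resources:
            flagged.append(stmt)
    return flagged


def overprivileged_functions(function_policies: dict[str, dict]) -> list[str]:
    return sorted(name for name, policy in function_policies.items()
                  if wildcard_statements(policy))


function_policies = {
    "thumbnailer": {"Statement": [
        {"Effect": "Allow", "Action": ["s3:GetObject"],
         "Resource": "arn:aws:s3:::images-bucket/*"}]},
    "legacy-admin-task": {"Statement": [
        {"Effect": "Allow", "Action": "*", "Resource": "*"}]},
}
overprivileged = overprivileged_functions(function_policies)
```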

In Kubernetes environments, extend your hunting to the cluster control plane itself. Look for unauthorized access to the Kubernetes API server, malicious admission controllers, or compromised kubelet credentials.

Integrating threat hunting with cloud security posture management

Cloud Security Posture Management tools continuously scan for misconfigurations and compliance violations. These findings provide valuable intelligence for threat hunters by identifying potential attack vectors and entry points.

Use CSPM findings to form specific hunting hypotheses, prioritizing based on attack path analysis. Instead of investigating every misconfiguration individually, focus on toxic combinations where multiple risks create complete attack paths.

For example, use graph-based attack path analysis to identify scenarios like: publicly exposed S3 bucket (internet reachability) + contains PII data (sensitive data classification) + has a high-severity vulnerability in the application that writes to it (exploitable weakness) + the application runs with admin IAM permissions (privilege escalation potential). This complete attack path warrants immediate hunting, while an isolated public bucket without sensitive data or exploitable access methods is lower priority.

Query the security graph for these multi-hop relationships: 'Show me internet-exposed resources that can access databases containing PII through exploitable vulnerabilities or overprivileged identities.' This graph-based prioritization ensures you hunt for the threats that matter most rather than chasing every individual finding.
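
The multi-hop question above is essentially a path search. The toy example below runs a breadth-first search over a small adjacency map to find routes from internet-exposed resources to sensitive data stores. It illustrates the concept only and is not how any particular security graph product is queried; all node names are made up.

```python
from collections import deque


def attack_paths(edges: dict[str, list[str]],
                 sources: list[str], targets: set[str]) -> list[list[str]]:
    """Return every loop-free path from an exposed source to a sensitive target."""
    paths = []
    queue = deque([s] for s in sources)
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in targets:
            paths.append(path)
            continue
        for neighbor in edges.get(node, []):
            if neighbor not in path:  # avoid revisiting nodes within one path
                queue.append(path + [neighbor])
    return paths


# Edges mean "can reach or can assume", e.g. via a vulnerability or an IAM permission.
graph = {
    "web-vm": ["app-role"],
    "app-role": ["customer-db"],
    "batch-vm": ["logs-bucket"],
}
found_paths = attack_paths(graph, sources=["web-vm", "batch-vm"], targets={"customer-db"})
```

Only `web-vm` has a complete route to the PII store here, so it would be hunted first while `batch-vm` waits.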

Misconfiguration data helps map potential attack paths through your environment. Understanding how different security gaps connect shows you how an attacker might move from initial access to their ultimate target. Organizations combine CSPM findings with threat hunting to prioritize investigations. For example, if CSPM discovers an S3 bucket with public read access and sensitive data classification, hunters immediately investigate the bucket's access logs for evidence of unauthorized downloads. They search for unusual source IPs, high-volume GetObject API calls, or access patterns that don't match legitimate business users. This targeted approach finds active exploitation of misconfigurations rather than just documenting the vulnerability.

CSPM integration benefits include:

  • Hypothesis generation: Using misconfigurations as starting points for hunts

  • Attack path mapping: Understanding how vulnerabilities connect

  • Risk prioritization: Focusing on the most dangerous combinations of issues

  • Compliance-driven hunting: Using regulatory violations as threat indicators

Compliance violations can serve as indicators for deeper investigation. A resource that violates security standards might not be an immediate threat, but it represents a weakness that attackers could exploit.

Measuring and maturing your cloud threat hunting program

Measure your threat hunting program's effectiveness to demonstrate value and drive continuous improvement. Clear metrics help you track progress, justify investments, and systematically enhance your capabilities.

Track key metrics with specific targets and data sources to measure program effectiveness:

| Metric | Target | Data Source | Interpretation |
|---|---|---|---|
| Detections per hunt | 0.3–0.5 (30–50% success rate) | Hunt documentation system | Higher rates may indicate reactive hunting; lower rates suggest poor hypotheses |
| Mean time to detect (MTTD) | <24 hours for critical threats | SIEM timestamps from initial indicator to hunt confirmation | Measures how quickly hunts find threats after initial compromise |
| Mean time to investigate (MTTI) | <4 hours per hunt | Hunt tracking system | Efficiency metric; decreases as procedures mature |
| ATT&CK technique coverage | 60%+ of relevant cloud techniques | Detection rule mapping to ATT&CK | Shows breadth of hunting program |
| False positive rate | <10% of hunt hypotheses | Hunt outcome documentation | High rates indicate poor hypothesis formation |
| Automation percentage | Increase 10% quarterly | Playbook execution logs | Tracks maturity progression toward automated hunting |

Review these metrics monthly to identify improvement areas. For example, if MTTI is increasing, investigate whether hunters lack training, tools, or documented procedures.
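
Two of these metrics can be computed directly from hunt records, as the sketch below shows. The record fields (`confirmed_threat`, `investigation_hours`) are hypothetical names for whatever your hunt tracking system exports, and the numbers are sample data.

```python
def hunt_metrics(hunts: list[dict]) -> dict[str, float]:
    """Compute detections per hunt and mean time to investigate from hunt records."""
    total = len(hunts)
    if total == 0:
        return {"detections_per_hunt": 0.0, "mtti_hours": 0.0}
    detections = sum(1 for h in hunts if h["confirmed_threat"])
    total_hours = sum(h["investigation_hours"] for h in hunts)
    return {
        "detections_per_hunt": detections / total,
        "mtti_hours": total_hours / total,
    }


monthly_hunts = [
    {"hypothesis": "unusual AssumeRole", "confirmed_threat": True, "investigation_hours": 3},
    {"hypothesis": "public bucket reads", "confirmed_threat": False, "investigation_hours": 5},
    {"hypothesis": "lambda backdoor", "confirmed_threat": True, "investigation_hours": 2},
    {"hypothesis": "k8s token misuse", "confirmed_threat": False, "investigation_hours": 4},
]
metrics = hunt_metrics(monthly_hunts)
```

Comparing `metrics` month over month shows whether MTTI is trending toward the target of under 4 hours per hunt.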

Use a maturity model to chart your progression from basic to advanced capabilities. Most models include levels from ad-hoc manual hunts to fully automated, continuous programs. As you mature, you should see increased automation, greater use of data analytics, and tighter integration with incident response.

Maturity progression typically includes:

  • Initial: Ad-hoc, manual hunts based on alerts

  • Developing: Regular hunts with documented procedures

  • Defined: Structured program with clear metrics and goals

  • Managed: Data-driven hunts with automated components

  • Optimizing: Continuous improvement with full automation

Create feedback loops to capture knowledge from every hunt. Whether successful or not, each investigation generates valuable insights that should improve your automated detection rules, refine future hypotheses, and strengthen preventative controls.

How Wiz transforms cloud threat hunting effectiveness

Wiz transforms threat hunting from a manual, time-consuming process into an efficient, context-driven operation. The platform empowers security teams to proactively find and fix the most critical risks in their cloud environments.

Wiz Security Graph connects detections to identities, configurations, and data to surface real attack paths and the fastest fix.

Wiz Defend automates investigation workflows with the Investigation Graph, which correlates thousands of cloud events into visual attack timelines. This eliminates the manual work of piecing together disparate logs and helps analysts instantly understand how an attack unfolded.

The Wiz Security Graph enables complex queries across every layer of your cloud infrastructure. Hunt for hidden attack paths by examining relationships between identities, network configurations, vulnerabilities, and data exposure in a single, unified view.

The Wiz Runtime Sensor uses lightweight eBPF (extended Berkeley Packet Filter) technology to provide deep workload visibility with minimal performance impact. It captures process execution (including command-line arguments), network connections (source, destination, ports, protocols), and file integrity events (modifications, deletions, permission changes). This telemetry gives hunters the detailed runtime data needed to investigate suspicious activity on Linux hosts and Kubernetes nodes.

Wiz's cloud-to-code correlation traces runtime threats back to specific code and developers. This enables true root cause analysis and prevents similar threats from recurring by fixing the underlying issues.

The agentless approach provides complete visibility across your entire cloud environment in minutes. This eliminates blind spots that attackers exploit and ensures your threat hunting program operates with accurate, comprehensive data.

FAQs about threat hunting frameworks