AWS Threat Hunting Best Practices for Cloud Security Teams

Essential AWS data sources for effective threat hunting

AWS threat hunting is the practice of proactively searching for security threats in your cloud environment before they cause damage. This means you're not waiting for alerts—you're actively looking for signs of attackers who may have bypassed your automated defenses.

The foundation of effective threat hunting starts with CloudTrail, your primary audit log for all AWS API calls. CloudTrail captures who did what, when they did it, from which IP address, and what resources were affected. You need to understand both management events (changes to your resources) and data events (operations on resources like S3 objects).

VPC Flow Logs give you visibility into network traffic patterns across your Virtual Private Cloud. These logs show IP traffic to and from network interfaces, helping you spot unusual connections or data transfers. AWS Config tracks resource configuration changes over time. Pair AWS Config with CloudTrail event logs, Config Rules, or Security Hub policies to detect and alert on unauthorized changes to security groups or IAM policies.

Amazon GuardDuty provides managed threat detection across your AWS accounts and Regions. Use GuardDuty findings as starting points for deeper investigations—for example, when it detects unusual API calls, cryptocurrency mining activity, or compromised credentials. Security Hub aggregates findings from multiple security services across all your accounts, giving you a single place to start your hunting activities.

Correlate signals with unified security context

These AWS data sources provide comprehensive coverage, but analyzing them in isolation creates blind spots. An API call in CloudTrail gains meaning when you see the network path in VPC Flow Logs, the resource configuration in AWS Config, and the effective permissions from IAM. Unified, agentless security platforms build a security graph that automatically correlates identity, network, and workload signals across these data sources. This graph-based approach reduces the time spent manually stitching together evidence from multiple tools—you see the complete attack path from initial access to target resource in a single view.

Advanced CloudTrail analysis techniques for threat detection

Hunting in CloudTrail means looking beyond successful API calls to find patterns that reveal attacker behavior. Start by searching for reconnaissance activities—attackers often use Describe* or List* API calls to map out your environment before launching an attack.
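As a rough illustration of the reconnaissance hunt described above, the sketch below counts distinct Describe* and List* calls per principal in a batch of parsed CloudTrail records. The function name, the threshold of 20, and the choice to key on userIdentity.arn are assumptions made for this example, not a recommended standard; tune them to your own baseline.

```python
from collections import defaultdict

# Event-name prefixes treated as reconnaissance (taken from the text above).
RECON_PREFIXES = ("Describe", "List")


def find_recon_principals(records, min_distinct_calls=20):
    """Return {principal ARN: sorted distinct recon calls} for noisy principals.

    `records` is an iterable of CloudTrail records already parsed into dicts.
    The threshold is a hypothetical starting point.
    """
    calls_by_principal = defaultdict(set)
    for record in records:
        event_name = record.get("eventName", "")
        if event_name.startswith(RECON_PREFIXES):
            principal = record.get("userIdentity", {}).get("arn", "unknown")
            source = record.get("eventSource", "")
            calls_by_principal[principal].add(f"{source}:{event_name}")
    return {
        principal: sorted(calls)
        for principal, calls in calls_by_principal.items()
        if len(calls) >= min_distinct_calls
    }
```

Counting distinct calls rather than raw volume helps separate broad enumeration from a script that polls one API repeatedly.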

Failed API calls are goldmines for threat hunters. When you see repeated Access Denied errors for a specific user trying to access sensitive resources, you're likely watching an attacker attempt privilege escalation. Look for patterns like:

Multiple failed attempts to assume high-privilege roles
Repeated access denials to S3 buckets containing sensitive data
Failed attempts to modify IAM policies or create new access keys

Copy-and-run CloudTrail hunt queries

Use these Athena queries against your CloudTrail logs stored in S3:

Detect repeated privilege escalation attempts:

SELECT useridentity.principalid,
       COUNT(*) AS failed_attempts,
       ARRAY_AGG(DISTINCT eventname) AS attempted_actions
FROM cloudtrail_logs
WHERE errorcode = 'AccessDenied'
  -- eventname holds the bare API name, without the iam: or sts: prefix
  AND eventname IN ('AttachUserPolicy', 'PutUserPolicy', 'CreateAccessKey', 'AssumeRole')
  -- eventtime is an ISO 8601 string in the standard Athena CloudTrail table
  AND from_iso8601_timestamp(eventtime) > CURRENT_TIMESTAMP - INTERVAL '24' HOUR
GROUP BY useridentity.principalid
HAVING COUNT(*) > 10
ORDER BY failed_attempts DESC

Find public security group modifications:

SELECT useridentity.principalid, eventtime, requestparameters, sourceipaddress
FROM cloudtrail_logs
WHERE eventname = 'AuthorizeSecurityGroupIngress'
  AND requestparameters LIKE '%0.0.0.0/0%'
  -- coarse string match on the port; expect some false positives
  AND (requestparameters LIKE '%:22%' OR requestparameters LIKE '%:3389%')
  AND from_iso8601_timestamp(eventtime) > CURRENT_TIMESTAMP - INTERVAL '7' DAY
ORDER BY eventtime DESC

Identify off-hours console logins without MFA:

SELECT useridentity.principalid, eventtime, sourceipaddress, additionaleventdata
FROM cloudtrail_logs
WHERE eventname = 'ConsoleLogin'
  AND JSON_EXTRACT_SCALAR(additionaleventdata, '$.MFAUsed') = 'No'
  -- hours are evaluated in UTC; shift the window to match your local business hours
  AND (HOUR(from_iso8601_timestamp(eventtime)) < 6 OR HOUR(from_iso8601_timestamp(eventtime)) > 20)
  AND from_iso8601_timestamp(eventtime) > CURRENT_TIMESTAMP - INTERVAL '30' DAY
ORDER BY eventtime DESC

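These queries target an Athena table defined over CloudTrail logs. CloudTrail Lake has its own SQL editor in the CloudTrail console, but it queries an event data store by ID and uses different field names (for example, eventName and userIdentity.principalId), so adapt the queries before running them there.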

Time-based analysis reveals suspicious activity that happens outside normal business hours. An API call at 3 AM from a developer account that usually only works 9-5 deserves investigation. Geographic anomalies matter too—if a user account suddenly makes API calls from a different country, you need to verify whether that's legitimate travel or a compromised credential.
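The time-based check can be sketched in a few lines of Python. This example assumes CloudTrail's eventTime format (ISO 8601 in UTC, such as 2024-05-01T03:12:45Z), a 9-to-5 weekday schedule, and a fixed UTC offset per user; the function and parameter names are hypothetical, and a real baseline would come from each account's observed history rather than a hard-coded window.

```python
from datetime import datetime, timedelta, timezone


def is_off_hours(event_time, start_hour=9, end_hour=17, utc_offset_hours=0):
    """Return True if a CloudTrail eventTime falls outside working hours.

    event_time: ISO 8601 UTC string as recorded by CloudTrail.
    utc_offset_hours: assumed fixed offset for the user's usual location.
    """
    utc_time = datetime.strptime(event_time, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    local_time = utc_time + timedelta(hours=utc_offset_hours)
    if local_time.weekday() >= 5:  # Saturday or Sunday
        return True
    return not (start_hour <= local_time.hour < end_hour)
```

A fixed offset ignores daylight saving time and travel, so treat a hit as a prompt for review rather than proof of compromise.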

Correlate CloudTrail events with IAM policy changes to detect persistence mechanisms. Attackers often create new IAM users, generate access keys, or modify role trust relationships to maintain access even after you discover their initial entry point. Graph-aware security context helps you pivot from suspicious API activity to the affected resources, reachable data stores, and effective permissions in a single view. For example, when you spot a suspicious CreateAccessKey event, immediately see which resources that key can access, what data those resources contain, and whether any network paths expose those resources to the internet—without running separate queries across multiple tools.

Identity-based threat hunting in AWS environments

Identity is your new security perimeter in AWS. Attackers target IAM users, roles, and credentials because compromising identity gives them legitimate-looking access to your resources.

Monitor for suspicious IAM activities that deviate from normal patterns. New access key creation for privileged users is a red flag, especially if it happens outside your standard provisioning process. Changes to IAM role trust relationships can allow attackers to grant access to external accounts they control.
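To make the trust-relationship check concrete, here is a minimal sketch that inspects a role trust policy document and lists any AWS account principals outside a set of accounts you own. The function name and the own_account_ids parameter are assumptions for illustration. The sketch deliberately ignores Service and Federated principals and does not evaluate Condition blocks, so it is a first-pass filter only.

```python
import json


def external_trusted_accounts(trust_policy, own_account_ids):
    """Return sorted account IDs (or '*') trusted by a role outside your accounts.

    trust_policy: the role's trust policy as a dict or JSON string.
    own_account_ids: set of 12-digit account IDs you control.
    """
    if isinstance(trust_policy, str):
        trust_policy = json.loads(trust_policy)
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]

    external = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal", {})
        if principal == "*":
            external.add("*")
            continue
        aws_principals = principal.get("AWS", [])
        if isinstance(aws_principals, str):
            aws_principals = [aws_principals]
        for value in aws_principals:
            if value == "*":
                external.add("*")
                continue
            # Principals are either full ARNs or bare 12-digit account IDs.
            account_id = value.split(":")[4] if value.startswith("arn:") else value
            if account_id not in own_account_ids:
                external.add(account_id)
    return sorted(external)
```

Running this against the policy document in each UpdateAssumeRolePolicy event gives a quick list of roles that now trust an unfamiliar account.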

Service accounts are prime targets because they often have broad permissions and their usage patterns are predictable. Track these patterns to spot anomalies:

Unusual role assumptions: A role being assumed by an unexpected user or from an unfamiliar IP address
Excessive permissions: Service accounts suddenly accessing resources they've never touched before
Credential exposure: Access keys appearing in public repositories or being used from multiple geographic locations simultaneously

Map identity behaviors to MITRE ATT&CK for Cloud

Connect your identity hunting findings to MITRE ATT&CK tactics to understand attacker objectives:

Discovery (TA0007) – Repeated List* and Describe* API calls map to Cloud Service Discovery (T1526) and Cloud Infrastructure Discovery (T1580). Attackers enumerate resources before selecting targets.
Persistence (TA0003) – CreateAccessKey, CreateUser, and AttachUserPolicy events indicate Account Manipulation (T1098) and Create Account (T1136). Attackers establish backup access methods.
Privilege Escalation (TA0004) – AttachRolePolicy, PutUserPolicy, and UpdateAssumeRolePolicy events map to Account Manipulation: Additional Cloud Roles (T1098.003) and often follow Valid Accounts (T1078) abuse. Attackers elevate permissions to access sensitive resources.
Lateral Movement (TA0008) – Cross-account AssumeRole calls, especially to roles in different accounts or with elevated permissions, indicate Use Alternate Authentication Material (T1550). Attackers pivot between accounts to reach their objectives.

Document your findings using ATT&CK technique IDs to create a common language between security operations, threat intelligence, and executive stakeholders.

Build a baseline of normal identity behavior for each user and role in your environment. When you see deviations—like a service account that only reads from S3 suddenly trying to launch EC2 instances—you've found a potential compromise worth investigating. Code-to-cloud ownership context shortens mean time to resolution by automatically routing identity risks to the responsible service teams. Instead of security teams manually determining which application uses a suspicious service account, ownership metadata connects the IAM role to the specific microservice, repository, and engineering team. This enables the owning team to investigate whether the behavior is legitimate or requires immediate credential rotation.
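The baseline idea described above can be sketched as a set of previously seen actions per principal, with anything new reported as a deviation. The helper names below are hypothetical, and keying the baseline on userIdentity.arn plus eventSource:eventName is one assumption among several reasonable choices (you might also include Region or source network).

```python
from collections import defaultdict


def _principal(record):
    return record.get("userIdentity", {}).get("arn", "unknown")


def _action(record):
    return f'{record.get("eventSource", "")}:{record.get("eventName", "")}'


def build_baseline(records):
    """Map each principal to the set of actions seen in historical records."""
    baseline = defaultdict(set)
    for record in records:
        baseline[_principal(record)].add(_action(record))
    return dict(baseline)


def find_deviations(baseline, new_records):
    """Return sorted (principal, action) pairs absent from the baseline."""
    deviations = set()
    for record in new_records:
        principal, action = _principal(record), _action(record)
        if action not in baseline.get(principal, set()):
            deviations.add((principal, action))
    return sorted(deviations)
```

A principal with no history at all has every action reported, which is usually the behavior you want for newly created users and roles.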

Network-level threat hunting with VPC flow logs and security groups

VPC Flow Logs show you how data moves through your AWS network, revealing activity that application logs might miss. You're looking for unusual traffic patterns that indicate compromise or data exfiltration.

Hunt for connections to suspicious IP addresses, especially those on threat intelligence feeds. Port scanning activity from internal or external sources suggests reconnaissance. Large data transfers to unknown external IPs could mean an attacker is stealing your data.

Security group modifications are critical to monitor. Attackers who gain access often modify security groups to open ports for backdoor access:

New inbound rules allowing SSH or RDP from 0.0.0.0/0
Outbound rules permitting traffic to unusual ports or destinations
Changes to security groups protecting sensitive resources like databases
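As an illustration of the first item in that list, the sketch below scans a security group's inbound rules for SSH or RDP open to the whole internet. It assumes the rule shape returned by the EC2 DescribeSecurityGroups API (IpPermissions entries with IpProtocol, FromPort, ToPort, IpRanges, and Ipv6Ranges); the function name and the port table are choices made for this example.

```python
SENSITIVE_PORTS = {22: "SSH", 3389: "RDP"}
OPEN_CIDRS = ("0.0.0.0/0", "::/0")


def public_admin_rules(ip_permissions):
    """Return the names of admin services exposed to the internet.

    ip_permissions: a list shaped like the IpPermissions field of a
    security group as returned by EC2 DescribeSecurityGroups.
    """
    findings = []
    for permission in ip_permissions:
        cidrs = [r.get("CidrIp") for r in permission.get("IpRanges", [])]
        cidrs += [r.get("CidrIpv6") for r in permission.get("Ipv6Ranges", [])]
        if not any(cidr in OPEN_CIDRS for cidr in cidrs):
            continue
        protocol = permission.get("IpProtocol")
        if protocol == "-1":  # all protocols and all ports
            exposed = sorted(SENSITIVE_PORTS)
        elif protocol == "tcp":
            low, high = permission.get("FromPort"), permission.get("ToPort")
            if low is None or high is None:
                continue
            exposed = [p for p in sorted(SENSITIVE_PORTS) if low <= p <= high]
        else:
            continue
        findings.extend(SENSITIVE_PORTS[port] for port in exposed)
    return findings
```

Checking port ranges rather than exact matches catches rules such as 0-65535 that a string search for port 22 would miss.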

VPC Flow Logs hunt query for data exfiltration

Use this Athena query to detect large outbound data transfers to unfamiliar destinations:

SELECT srcaddr, dstaddr, dstport,
       SUM(bytes) AS total_bytes,
       COUNT(*) AS connection_count,
       MIN(start) AS first_seen,
       MAX("end") AS last_seen
FROM vpc_flow_logs
WHERE action = 'ACCEPT'
  AND dstaddr NOT LIKE '10.%'
  -- 172.16.0.0/12 spans 172.16.x.x through 172.31.x.x
  AND NOT regexp_like(dstaddr, '^172\.(1[6-9]|2[0-9]|3[01])\.')
  AND dstaddr NOT LIKE '192.168.%'
  -- start and end are Unix epoch seconds in flow log records
  AND start > to_unixtime(CURRENT_TIMESTAMP - INTERVAL '24' HOUR)
GROUP BY srcaddr, dstaddr, dstport
HAVING SUM(bytes) > 10737418240 -- 10 GiB threshold
ORDER BY total_bytes DESC
LIMIT 100

This query identifies source IPs that sent more than 10 GiB to a single external destination and port in the past 24 hours. Investigate high-volume transfers by:

Correlating the source IP with EC2 instance IDs using VPC network interface mappings
Checking CloudTrail for API activity from the instance's IAM role around the same timeframe
Reviewing the destination IP against threat intelligence feeds
Examining Security Hub findings for the affected instance

Adjust the byte threshold based on your environment's normal data transfer patterns.
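If you post-process flow log records outside Athena, Python's standard ipaddress module can classify private ranges more precisely than string prefixes. The sketch below is one way to do that, assuming records already parsed into dicts with the flow log field names srcaddr, dstaddr, bytes, and action; the function names and the threshold argument are illustrative.

```python
import ipaddress
from collections import defaultdict

# RFC 1918 private ranges, matching the intent of the query above.
PRIVATE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_internal(address):
    """Return True if the address falls in an RFC 1918 private range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_NETWORKS)


def external_transfer_totals(flow_records, threshold_bytes):
    """Sum accepted bytes per (source, destination) for internal-to-external flows.

    Only pairs whose total exceeds threshold_bytes are returned.
    """
    totals = defaultdict(int)
    for record in flow_records:
        if record["action"] != "ACCEPT":
            continue
        if is_internal(record["srcaddr"]) and not is_internal(record["dstaddr"]):
            totals[(record["srcaddr"], record["dstaddr"])] += int(record["bytes"])
    return {pair: total for pair, total in totals.items() if total > threshold_bytes}
```

This version treats only RFC 1918 space as internal; add your own ranges if you route other address blocks privately.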

Correlate network activity with CloudTrail events to build the complete picture. If you see suspicious outbound traffic in flow logs, pivot to CloudTrail to identify which user or role initiated that connection and what other actions they performed around the same time. Use attack path analysis to prioritize network exposures that lead to sensitive data or privileged identities. For example, an EC2 instance with unusual outbound traffic deserves immediate investigation if it also has a critical vulnerability, runs with admin IAM permissions, and can reach S3 buckets containing customer PII. This toxic combination of risks creates a high-probability attack path. Conversely, similar outbound traffic from an isolated development instance with read-only permissions and no access to production data represents lower risk and can be investigated later.

Container and serverless workload threat hunting strategies

Containers and serverless functions are ephemeral, making traditional security approaches less effective. You need to focus on configurations and runtime behaviors specific to ECS, EKS, and Lambda.

Start by examining ECS task definitions, EKS cluster configurations, and Lambda function settings. For EKS, enable control plane logging to CloudWatch Logs and activate GuardDuty EKS Runtime Monitoring for pod-level threat detection. Look for overly permissive IAM roles that give workloads more access than they need—for example, ECS task roles with admin permissions or Lambda functions with broad S3 access when they only need specific bucket access. Check environment variables for exposed secrets or credentials. Review container images for known vulnerabilities before they run in production.

Runtime monitoring reveals suspicious behaviors inside your workloads:

Container escape attempts: Processes trying to break out of container isolation to access the host
Privilege escalation: Containers attempting to gain elevated permissions within EKS clusters
Unexpected network connections: Lambda functions or containers communicating with external IPs they shouldn't access

Hunt for threats in Amazon EKS clusters

EKS threat hunting requires visibility into both the Kubernetes control plane and pod runtime:

Enable EKS control plane logging – Activate all five log types (API server, audit, authenticator, controller manager, scheduler) in your EKS cluster configuration. Send these logs to CloudWatch Logs for analysis. The audit log captures all API requests to the Kubernetes API server, including kubectl commands and service account activity.

Hunt for suspicious Kubernetes API activity:

Unexpected kubectl exec commands into production pods, especially from unfamiliar IP addresses or user accounts
Service account tokens being used from outside the cluster (check the source IP in audit logs)
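Attempts to create privileged pods or weaken Pod Security admission settings (or PodSecurityPolicy objects on clusters older than Kubernetes 1.25, where that API was removed)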
Access to Kubernetes secrets containing database credentials or API keys
Creation of new ClusterRoleBindings that grant cluster-admin permissions
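Several of these hunts reduce to simple checks on Kubernetes audit events. The sketch below labels an audit event using the standard verb and objectRef fields (resource and subresource); the label strings and function name are made up for this example, and the rule set is intentionally small.

```python
def classify_audit_event(event):
    """Return a short label for a suspicious Kubernetes audit event, else None.

    event: one audit log entry parsed into a dict.
    """
    object_ref = event.get("objectRef") or {}
    verb = event.get("verb", "")
    resource = object_ref.get("resource", "")

    # kubectl exec appears as a request to the pods/exec subresource.
    if resource == "pods" and object_ref.get("subresource") == "exec":
        return "pod-exec"
    if resource == "secrets" and verb in ("get", "list", "watch"):
        return "secret-read"
    if resource == "clusterrolebindings" and verb in ("create", "update", "patch"):
        return "clusterrolebinding-change"
    return None
```

Pair each label with the event's user.username and sourceIPs fields to decide whether the caller and origin are expected for that cluster.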

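Activate GuardDuty EKS Runtime Monitoring – Enable this GuardDuty feature to detect threats inside running pods. Runtime Monitoring relies on a GuardDuty security agent installed as an EKS add-on, which GuardDuty can deploy and manage for you automatically. GuardDuty monitors pod-level activity for: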

Processes attempting to escape container isolation
Outbound connections to cryptocurrency mining pools or known malicious IPs
Suspicious process execution (reverse shells, credential dumping tools)
File access patterns indicating data exfiltration

Correlate EKS audit logs with VPC Flow Logs to identify pods making unusual external connections, then trace back to the deployment, service account, and IAM role for full context.

CloudWatch Logs from your applications provide application-level indicators of compromise. Hunt for unexpected processes being spawned, attempts to access sensitive files, or error messages that suggest exploitation attempts. Shift-left guardrails in CI/CD pipelines prevent risky container images and misconfigurations from ever reaching production. Scan container images for vulnerabilities, secrets, and overly permissive configurations during the build process. Block deployments that fail security policies—for example, containers running as root, images with critical CVEs, or task definitions with admin IAM roles. This prevention-first approach reduces the volume of threats you need to hunt at runtime while maintaining development velocity.

Multi-account threat hunting and cross-service correlation

Modern AWS environments span multiple accounts, and attackers know this. They move between accounts to evade detection and access more resources, so your hunting must cover your entire organization.

Enable an organization trail in AWS CloudTrail across all Regions and member accounts via AWS Organizations. This configuration sends all API events from every account to a central S3 bucket. Optionally, enable CloudTrail Lake for managed storage and SQL-based querying without setting up Athena infrastructure. Organization trails eliminate logging gaps and provide complete visibility across your AWS footprint. Use cross-account IAM roles to give your security team read-only access to logs and resources across all accounts without managing separate credentials.

Centralize and normalize with Amazon Security Lake

Amazon Security Lake automatically collects, normalizes, and stores security logs from multiple AWS services and accounts in a central data lake. Security Lake converts logs from CloudTrail, VPC Flow Logs, Route 53 Resolver DNS logs, and EKS audit logs into the Open Cybersecurity Schema Framework (OCSF) format. This normalization lets you write queries that work across different log types without learning each service's unique schema. Enable Security Lake in your AWS Organizations management account, designate a delegated administrator for security operations, and configure automatic log collection across all Regions and member accounts. Security Lake stores data in S3 with automatic partitioning and lifecycle management, and integrates with Athena for querying or third-party SIEM tools for analysis.

Organization-wide security service enablement checklist

Follow these steps to enable comprehensive threat hunting across all AWS accounts:

Designate a delegated administrator account – Use AWS Organizations to designate a security tooling account as the delegated admin for GuardDuty, Security Hub, Detective, and Macie. This separates security operations from the management account.
Enable services in all Regions – Activate GuardDuty, Security Hub, and Detective in every Region where you run workloads. Threats don't respect Region boundaries, and attackers often target less-monitored Regions.
Configure auto-enable for new accounts – Set up automatic enablement so new member accounts inherit security service configurations immediately. This prevents coverage gaps as your organization grows.
Centralize findings aggregation – Configure Security Hub to aggregate findings from all Regions into a single Region for your security team. Enable cross-Region aggregation in the Security Hub console.
Define automated response playbooks – Create EventBridge rules that trigger Lambda functions or Systems Manager automation documents when high-severity findings appear. For example, automatically isolate EC2 instances when GuardDuty detects cryptocurrency mining.
Establish cross-account access – Create a read-only IAM role in each member account that your security team can assume for investigation. Use AWS Organizations SCPs to prevent member accounts from deleting this role.

Cross-service correlation is essential for detecting sophisticated attacks. A GuardDuty finding in one account might connect to suspicious network traffic in another and an IAM role modification in a third. You need to piece these events together to see the full attack chain.

Build queries that span multiple accounts and services. For example, track a compromised credential from its initial use in Account A, through its assumption of a cross-account role to access Account B, to its final use launching resources in Account C.
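One way to follow a credential across accounts is to chain AssumeRole events: the temporary access key returned in one event's responseElements becomes the caller key (userIdentity.accessKeyId) in later events. The sketch below walks that chain over parsed CloudTrail records collected from all accounts. The function name and the hop dictionary layout are assumptions for illustration.

```python
from collections import defaultdict, deque


def trace_role_chain(records, starting_access_key):
    """Follow successful AssumeRole calls starting from one access key.

    Returns a list of hops, each describing the caller key, the role
    assumed, the temporary key issued, and the event time.
    """
    assume_events = defaultdict(list)
    for record in records:
        # Failed calls have no responseElements, so they are skipped.
        if record.get("eventName") == "AssumeRole" and record.get("responseElements"):
            caller_key = record.get("userIdentity", {}).get("accessKeyId")
            assume_events[caller_key].append(record)

    hops = []
    seen = {starting_access_key}
    queue = deque([starting_access_key])
    while queue:
        key = queue.popleft()
        for record in sorted(assume_events.get(key, []), key=lambda r: r["eventTime"]):
            issued_key = record["responseElements"]["credentials"]["accessKeyId"]
            hops.append({
                "from_key": key,
                "role_arn": record["requestParameters"]["roleArn"],
                "issued_key": issued_key,
                "event_time": record["eventTime"],
            })
            if issued_key not in seen:
                seen.add(issued_key)
                queue.append(issued_key)
    return hops
```

Once you have the issued keys, filter the remaining records by those keys to see what each assumed session actually did in its target account.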

Automated threat hunting with machine learning and behavioral analytics

Manual threat hunting doesn't scale as your AWS environment grows. Automation and machine learning help you hunt faster and more effectively across large environments. These capabilities reduce manual effort and let you focus on high-fidelity threats rather than sifting through thousands of low-priority alerts.

Amazon Detective automatically processes your logs to build a graph model of your resources and their interactions. This visualization makes it easier to investigate findings and understand relationships between events. CloudWatch Anomaly Detection establishes baselines of normal activity and alerts you when metrics deviate significantly.

Amazon Macie uses machine learning to discover and classify sensitive data in S3 buckets. This helps you hunt for potential data exposure risks without manually reviewing every bucket. You can focus on buckets containing sensitive data that also have overly permissive access policies.

Build custom automation using Amazon EventBridge (formerly CloudWatch Events) and Lambda functions. EventBridge rules match specific security events and invoke Lambda functions that run automated investigation and response workflows. For example:

Automatically investigate when a new IAM user is created outside business hours
Trigger analysis when a security group is modified to allow public access
Launch forensic data collection when GuardDuty detects cryptocurrency mining

This automation scales your hunting efforts and helps you respond to threats in real time rather than discovering them days later.
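As a sketch of the GuardDuty case, the example below pairs an EventBridge event pattern with a Lambda-style handler. The pattern uses the aws.guardduty source, the GuardDuty Finding detail type, and prefix matching on the finding type. The handler only extracts the affected instance and returns a plan; the action name collect_forensics and the return shape are hypothetical, and the actual collection step is left out.

```python
import json

# Event pattern for an EventBridge rule; attach a Lambda function as target.
CRYPTO_MINING_PATTERN = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
    "detail": {"type": [{"prefix": "CryptoCurrency:EC2/"}]},
}


def handler(event, context):
    """Extract the fields a forensic workflow would need from a finding event."""
    detail = event.get("detail", {})
    instance = detail.get("resource", {}).get("instanceDetails", {})
    instance_id = instance.get("instanceId")
    if not instance_id:
        return {"action": "none", "reason": "finding has no EC2 instance"}
    return {
        "action": "collect_forensics",  # hypothetical downstream step
        "instance_id": instance_id,
        "finding_type": detail.get("type"),
        "account_id": detail.get("accountId"),
        "region": detail.get("region"),
    }


if __name__ == "__main__":
    print(json.dumps(CRYPTO_MINING_PATTERN, indent=2))
```

Returning a plan instead of acting directly keeps the handler easy to test and lets you add an approval step before anything touches production.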

Performance optimization for large-scale AWS threat hunting

Large AWS environments generate massive log volumes that can make hunting slow and expensive. You need to optimize how you store and query this data.

Amazon Athena lets you run SQL queries directly on logs stored in S3 without setting up databases. Partition your logs by date, account ID, or region to dramatically reduce the amount of data scanned per query. This makes queries faster and cheaper.
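Date partitioning works because CloudTrail already writes objects under date-based prefixes. The sketch below builds the S3 prefixes for a date range, assuming the single-account layout AWSLogs/account-id/CloudTrail/region/YYYY/MM/DD/ (organization trails add an organization ID segment after AWSLogs). The function name is hypothetical.

```python
from datetime import date, timedelta


def cloudtrail_prefixes(account_id, region, start, end):
    """Return the daily S3 key prefixes CloudTrail uses between two dates.

    start and end are datetime.date values, both inclusive.
    """
    prefixes = []
    day = start
    while day <= end:
        prefixes.append(
            f"AWSLogs/{account_id}/CloudTrail/{region}/{day:%Y/%m/%d}/"
        )
        day += timedelta(days=1)
    return prefixes
```

Mapping these prefixes to table partitions means a 24-hour hunt reads one or two days of objects instead of the whole bucket.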

Use columnar formats like Apache Parquet instead of raw JSON logs. Parquet files compress better and allow Athena to read only the columns you need for each query. This columnar storage typically improves query performance significantly—especially for queries that select a few columns from wide tables—while reducing data scanned and query costs.

AWS Glue performs ETL processes to clean and structure your logs before analysis. You can enrich logs with additional context, remove unnecessary fields, and convert formats to optimize for your most common queries.

Implement cost-effective retention policies using S3 Lifecycle rules. Transition older logs to cheaper storage classes like S3 Glacier after 90 days while keeping recent logs in standard S3 for fast access. This balances compliance requirements with storage costs.
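For reference, a lifecycle rule like the one described can be expressed as the configuration document that the S3 API accepts. The sketch below builds that document in Python; the rule ID and function name are placeholders, and the 90-day value simply mirrors the example in the text. It does not call AWS, so applying it (for instance through the S3 PutBucketLifecycleConfiguration operation) is left to your own tooling.

```python
def log_lifecycle_configuration(prefix, glacier_after_days=90):
    """Build an S3 lifecycle configuration that archives logs under a prefix.

    prefix: key prefix holding the logs, for example "AWSLogs/".
    glacier_after_days: object age in days before the transition.
    """
    return {
        "Rules": [
            {
                "ID": "archive-security-logs",  # placeholder rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }
```

Keep the transition age longer than your typical hunting lookback, since queries against archived objects require a restore first.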

Integration of threat hunting with incident response workflows

Threat hunting discoveries are only valuable if you can quickly escalate and act on them. Establish clear escalation paths that define what happens when you find a potential threat.

Document your findings in a standard format that includes:

Timeline of suspicious events
Affected resources and accounts
Evidence collected during hunting
Recommended next steps for investigation

This standardization ensures smooth handoffs to your incident response team. They can start investigating immediately without spending time gathering basic information.

Create automated response actions for common threat scenarios:

Isolate compromised instances – Use AWS Systems Manager to automatically modify security groups and isolate compromised EC2 instances from your network.

Disable suspicious credentials – When you detect suspicious credential usage, disable or rotate IAM access keys immediately. Temporary credentials issued by AWS STS cannot be deleted, so revoke active role sessions by attaching a deny policy conditioned on aws:TokenIssueTime, and reduce the role's maximum session duration to limit future exposure windows. If using AWS IAM Identity Center (formerly SSO), revoke active sessions through the Identity Center console.

Review credential activity – Rotate all affected credentials and review CloudTrail logs to identify what actions the compromised credentials performed.

These automated responses contain threats while your team investigates.
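For role sessions, a common containment step is an inline deny policy that rejects any request made with credentials issued before a cutoff time. The sketch below generates such a policy document using the aws:TokenIssueTime condition key. The function name is hypothetical, and attaching the policy to the affected role is left to your own tooling.

```python
from datetime import datetime, timezone


def revoke_sessions_policy(issued_before=None):
    """Build a deny-all policy for sessions issued before the cutoff.

    issued_before: timezone-aware datetime; defaults to the current UTC time.
    """
    cutoff = issued_before or datetime.now(timezone.utc)
    cutoff_text = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {
                    "DateLessThan": {"aws:TokenIssueTime": cutoff_text}
                },
            }
        ],
    }
```

Sessions created after the cutoff keep working, so legitimate workloads recover as soon as they request fresh credentials.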

Accelerate remediation with ownership context

Automated routing with ownership metadata and policy-driven guardrails accelerates both containment and permanent remediation. When you detect a compromised container, automatically create a ticket assigned to the owning team with full context:

The vulnerable image
The deployment that launched it
The repository containing the Dockerfile
The specific code change that introduced the risk

This end-to-end workflow enables teams to fix the root cause in code rather than repeatedly patching runtime symptoms.

Integrate your security tools with ticketing systems like Jira or ServiceNow. When you identify a threat during hunting, automatically create a ticket with all relevant details. This ensures findings are tracked, assigned to the right team, and resolved in a coordinated way.

How Wiz enhances AWS threat hunting capabilities

Wiz provides complete visibility across your AWS environment through the Wiz Security Graph. This graph maps all your resources, configurations, and permissions, letting you instantly understand complex relationships and identify attack paths that manual log analysis would miss.

Wiz Defend delivers real-time threat detection with automated investigation workflows, correlating cloud control plane signals and runtime telemetry to uncover active threats as they happen. The Wiz Runtime Sensor uses eBPF (extended Berkeley Packet Filter) technology to provide deep telemetry from inside containers and VMs. This lightweight, kernel-level approach minimizes CPU and memory overhead compared to traditional userspace agents while capturing process execution, network connections, and file access events.

Attack path analysis automatically identifies toxic combinations of risk. For example, Wiz highlights a public-facing VM with a critical vulnerability and high-privilege credentials—exactly the kind of exposure that attackers exploit. This prioritization helps you focus on the most critical threats first.

Wiz's AI capabilities let you query your security graph using natural language. Instead of writing complex queries, you can ask "Show me all EC2 instances with public IPs that have admin access to S3 buckets containing PII." Cloud-to-code traceability connects threats observed at runtime to the vulnerable code, configuration, or infrastructure-as-code template that introduced the risk. This linkage identifies the owning team and repository, enabling true root cause remediation through code fixes rather than temporary runtime patches.
