What Are Data Breaches? Definition, Causes and Prevention

What is a data breach?

A data breach is the unauthorized access, acquisition, or exposure of sensitive information by external attackers, malicious insiders, or through accidental means such as misconfiguration. This matters because breaches trigger regulatory penalties under frameworks like GDPR and HIPAA, erode customer trust, and disrupt business operations for months or years after the initial incident.

In cloud environments, many incidents stem from misconfigurations and identity issues (such as public storage buckets, overprivileged service accounts, or exposed API endpoints) rather than classic perimeter-only intrusion patterns. However, application vulnerabilities (like Log4j or MOVEit), supply chain compromise, and credential theft through phishing remain significant attack vectors that organizations must address alongside configuration and identity controls. A developer who accidentally leaves a storage bucket public, or who grants a service account excessive permissions, creates exactly the kind of opening attackers actively hunt for. The dynamic nature of cloud infrastructure means new breach vectors emerge constantly as teams spin up resources, change configurations, and deploy code multiple times per day.

The distinction between a breach and a leak matters for understanding risk and notification requirements. A breach typically involves confirmed unauthorized access (and often exfiltration) by an external attacker or malicious insider. A leak often refers to accidental exposure where data becomes accessible due to misconfiguration or human error, even if no attacker access is confirmed during investigation. In practice, a leak can become a breach if forensic analysis later reveals unauthorized access occurred.

How do data breaches happen?

Breaches follow a lifecycle from initial access to data exfiltration. Understanding this progression helps organizations identify where to interrupt attack chains before sensitive data leaves the environment. Attackers rarely walk through a single open door. They typically combine multiple weaknesses to reach their target. In cloud environments, that chain often spans identity permissions, network reachability, workload vulnerabilities, and data store access. Stopping breaches requires seeing how these elements connect to form exploitable paths, not just addressing individual alerts in isolation.

Common entry points and attack vectors

Cloud-specific entry points differ from traditional data center attacks. Misconfigurations top the list: public storage buckets, exposed databases with default credentials, and overly permissive security groups that allow traffic from anywhere on the internet. Exact percentages vary by dataset and methodology, but industry reports from sources like the Cloud Security Alliance and the Verizon DBIR regularly identify misconfigured storage, databases, and security groups as primary factors enabling unauthorized data access. These mistakes happen easily when teams move fast and infrastructure changes constantly.
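Some of these misconfigurations can be caught with a simple rule check. The sketch below is a minimal illustration, assuming a hypothetical, simplified `IngressRule` record (not any cloud provider's actual API) and an illustrative list of sensitive ports; it flags rules that open those ports to every source address (a /0 CIDR).

```python
from dataclasses import dataclass
from ipaddress import ip_network

# Hypothetical, simplified ingress rule; real provider APIs carry more fields.
@dataclass(frozen=True)
class IngressRule:
    cidr: str
    from_port: int
    to_port: int

# Illustrative set: SSH, RDP, MySQL, PostgreSQL, Redis, MongoDB.
SENSITIVE_PORTS = {22, 3389, 3306, 5432, 6379, 27017}

def open_to_internet(cidr: str) -> bool:
    """A /0 prefix (0.0.0.0/0 or ::/0) matches every source address."""
    return ip_network(cidr).prefixlen == 0

def risky_rules(rules: list[IngressRule]) -> list[IngressRule]:
    """Return rules that expose any sensitive port to the whole internet."""
    return [
        r for r in rules
        if open_to_internet(r.cidr)
        and any(r.from_port <= p <= r.to_port for p in SENSITIVE_PORTS)
    ]

rules = [
    IngressRule("0.0.0.0/0", 443, 443),     # public HTTPS: expected
    IngressRule("0.0.0.0/0", 5432, 5432),   # database open to the world
    IngressRule("10.0.0.0/8", 22, 22),      # SSH from a private range only
]
print(risky_rules(rules))  # flags only the PostgreSQL rule
```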

Compromised credentials provide another common path. Phished users hand over passwords that grant immediate access. Leaked API keys in public code repositories give attackers authenticated entry without triggering security alerts. Unpatched vulnerabilities in internet-facing services let attackers exploit known weaknesses that defenders failed to remediate in time.

Insider threats come in two forms. Malicious insiders with legitimate access can exfiltrate data without triggering external intrusion detection. Negligent employees accidentally expose data through misconfiguration, improper sharing settings, or sending sensitive information to the wrong recipients. Insider incidents often go undetected longer than external attacks because the activity looks like normal authorized behavior.

Supply chain attacks exploit the trust relationships built into cloud services. When attackers compromise a third-party tool or service, they gain access to every customer environment that trusts that service. The interconnected nature of cloud services creates dependencies that extend your attack surface far beyond resources you directly control.

The anatomy of a cloud data breach

Breaches progress through predictable stages. Attackers gain initial access, escalate their privileges, move laterally toward valuable targets, discover where sensitive data lives, and finally exfiltrate it. Each stage presents an opportunity for defenders to detect and stop the attack.

Breach Stage	What Happens	Examples
Initial Access	Attacker gains a foothold	Exposed API, misconfigured service, phished credentials
Privilege Escalation	Attacker gains higher permissions	Overprivileged service account, IAM misconfiguration
Lateral Movement	Attacker moves toward target	Cross-account access, trust relationships
Data Discovery	Attacker locates sensitive data	List buckets, query metadata catalogs, crawl shares
Exfiltration	Data leaves the environment	Direct download, staging to external services

The combination of weaknesses matters more than individual findings. Consider a common scenario: a critical vulnerability exists on an internal application server with no network path from the internet and no access to sensitive data. That vulnerability poses less actual risk than a medium-severity issue on an internet-exposed system that has database credentials stored in its environment variables. Context determines priority.
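One way to make "context determines priority" concrete is to discount a finding's base severity when it lacks internet exposure or a route to sensitive data. This is a hypothetical scoring sketch: the `Finding` fields and the 0.2 discount weights are assumptions chosen for illustration, not a published formula.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    name: str
    cvss: float                    # base severity, 0.0-10.0
    internet_exposed: bool         # reachable from the internet?
    reaches_sensitive_data: bool   # e.g., holds database credentials

def contextual_priority(f: Finding) -> float:
    """Scale base severity down when exposure or data access is missing.

    The 0.2 discount factors are arbitrary illustrative weights."""
    exposure = 1.0 if f.internet_exposed else 0.2
    data_access = 1.0 if f.reaches_sensitive_data else 0.2
    return f.cvss * exposure * data_access

findings = [
    Finding("critical CVE on isolated internal server", 9.8, False, False),
    Finding("medium CVE on exposed host with DB credentials", 5.5, True, True),
]
for f in sorted(findings, key=contextual_priority, reverse=True):
    print(f"{contextual_priority(f):.2f}  {f.name}")
```

With these weights the exposed medium-severity finding outranks the isolated critical one, matching the scenario above. Real prioritization would derive exposure and data reachability from actual network and identity analysis rather than hand-set booleans.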

Why identity and permissions matter

Most breaches involve some form of identity compromise or misuse. According to the Verizon 2024 Data Breach Investigations Report, stolen credentials remain the most common initial access vector across all breach types. In cloud environments specifically, this includes phished user credentials, leaked API keys in code repositories, and compromised service account tokens that grant attackers authenticated access without triggering external intrusion alerts. Overprivileged service accounts, stale credentials that should have been revoked months ago, and excessive permissions create the pathways attackers use to reach sensitive data. An attacker who compromises a single overprivileged identity can often reach far more than that identity's job ever required.

Identity sprawl in cloud environments compounds this problem. Service accounts multiply as teams automate processes. Machine identities authenticate workloads to each other. Federated access connects identity providers across organizational boundaries. Cross-account roles enable resource sharing between different cloud accounts. This creates complex webs of permissions that become nearly impossible to audit manually.

Effective permissions often differ from configured permissions. A user might have minimal direct permissions but inherit broad access through group membership. Resource policies attached to a bucket, key, or queue might grant access independently of the identity's own policies. Cross-account trust relationships might grant access that does not appear in the identity's own policy documents. Understanding what an identity can actually do requires analyzing the full permission chain, not just reading individual policy statements.
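The permission-chain idea can be sketched as a union of grants from three sources, with explicit denies removed at the end. This is a deliberately simplified model, assuming hypothetical "action:resource" permission strings; it is loosely based on the common rule that an explicit deny overrides any allow, and it ignores features such as permission boundaries, organization-level policies, and wildcards that real cloud evaluation logic handles.

```python
from dataclasses import dataclass, field

@dataclass
class Identity:
    name: str
    allows: set[str] = field(default_factory=set)   # "action:resource" strings
    denies: set[str] = field(default_factory=set)
    groups: set[str] = field(default_factory=set)

def effective_permissions(
    identity: Identity,
    group_policies: dict[str, set[str]],
    resource_policies: dict[str, dict[str, set[str]]],
) -> set[str]:
    """Union of direct, group-inherited, and resource-policy grants, minus explicit denies."""
    granted = set(identity.allows)
    for group in identity.groups:                            # inherited via membership
        granted |= group_policies.get(group, set())
    for resource, principals in resource_policies.items():   # granted by the resource itself
        for action in principals.get(identity.name, set()):
            granted.add(f"{action}:{resource}")
    return granted - identity.denies                         # explicit deny wins in this model

svc = Identity("svc-report", allows={"read:app-logs"}, groups={"analysts"})
groups = {"analysts": {"read:customer-db"}}
resources = {"billing-bucket": {"svc-report": {"read", "write"}}}
print(sorted(effective_permissions(svc, groups, resources)))
# ['read:app-logs', 'read:billing-bucket', 'read:customer-db', 'write:billing-bucket']
```

Here the identity's own policy lists a single permission, yet it can effectively read a customer database and write to a billing bucket.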

Types of data breaches

Breaches can be categorized by how they occur and what data is compromised. Different types require different prevention and response strategies, so understanding these categories helps teams focus their defenses appropriately.

By attack method

External attacks: These involve exploitation of vulnerabilities, credential theft through phishing or credential stuffing, and social engineering targeting employees with access to sensitive systems. Attackers scan the internet constantly for exposed services and misconfigurations they can exploit.
Insider threats: Malicious insiders abuse legitimate access to steal data for personal gain or competitive advantage. Negligent employees mishandle data without malicious intent. Compromised accounts occur when external attackers steal insider credentials and operate as if they were authorized users.
Accidental exposure: Misconfigurations make data publicly accessible without any attack occurring. Unintended sharing settings expose documents to broader audiences than intended. Sensitive data appears in logs or error messages that get stored without protection. Forgotten test environments contain copies of production data that lack production security controls.

By data type compromised

Different data categories carry specific implications for notification, regulatory compliance, and potential harm:

PII (Personally Identifiable Information): Names, addresses, Social Security numbers, and other identifying data trigger notification requirements under most breach laws and create identity theft risk for affected individuals.
PHI (Protected Health Information): Medical records fall under HIPAA with specific breach notification rules, potential penalties, and requirements for notifying the Department of Health and Human Services.
Financial data: Payment card numbers fall under PCI DSS requirements, and both card data and bank account information create immediate fraud exposure that financial institutions must address.
Credentials: Passwords, API keys, and authentication tokens enable further attacks and lateral movement. A credential breach often precedes a larger data breach.
Intellectual property: Trade secrets and source code cause competitive and operational damage that may be difficult to quantify but can threaten business viability.

The type of data compromised determines your regulatory obligations, notification requirements, and potential business impact. A breach affecting only internal documentation differs vastly from one exposing customer financial records.

What are the consequences of a data breach?

Breach consequences extend far beyond immediate incident response costs. Organizations face financial, regulatory, operational, and reputational impacts that can persist for years after the initial incident is contained.

Financial and regulatory impact

Direct costs add up quickly: forensic investigation to determine what happened, legal counsel to navigate notification requirements, notification expenses to reach affected individuals, credit monitoring services for those individuals, and system remediation to close the vulnerability that enabled the breach. The global average cost per breach now stands at $4.44 million, and costs scale with breach size and complexity: a breach affecting millions of records costs far more than one affecting thousands.

Regulatory penalties create additional financial exposure. GDPR fines for the most serious violations can reach €20 million or 4% of global annual turnover, whichever is higher. HIPAA civil penalties are tiered by culpability, ranging from violations the organization did not know about and could not reasonably have known about to willful neglect. State privacy laws such as the CCPA add further liability. Penalties tend to be higher for organizations that failed to implement reasonable security measures that could have prevented the breach.
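The 4% figure is one half of a two-part ceiling: for the most serious infringements, GDPR Article 83(5) caps fines at EUR 20 million or 4% of total worldwide annual turnover of the preceding financial year, whichever is higher. The small function below (a hypothetical helper, using integer euros to avoid rounding) computes only that ceiling; actual fines are set case by case and can fall well below it.

```python
def gdpr_max_fine_eur(worldwide_annual_turnover_eur: int) -> int:
    """Upper bound for the most serious GDPR infringements (Art. 83(5)):
    EUR 20 million or 4% of total worldwide annual turnover, whichever is higher."""
    return max(20_000_000, worldwide_annual_turnover_eur * 4 // 100)

print(gdpr_max_fine_eur(100_000_000))    # 20000000: the EUR 20 million figure is higher
print(gdpr_max_fine_eur(2_000_000_000))  # 80000000: 4% of turnover is higher
```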

Class action exposure and litigation costs can exceed regulatory fines. Breaches involving large numbers of individuals often trigger multi-year legal proceedings. Settlement costs, attorney fees, and the management distraction of ongoing litigation compound the financial impact long after the initial incident response concludes.

Operational and reputational damage

Operational disruption during incident response affects the entire organization. Systems taken offline for investigation cannot serve customers. Business processes get interrupted while security teams determine the breach scope. IT resources get diverted from planned projects to handle the emergency response.

Long-term reputational damage proves difficult to measure but often exceeds direct financial costs. Customers who lose trust take their business elsewhere. Acquiring new customers becomes harder when breach news appears in search results. Partners question whether to maintain relationships that could expose them to third-party risk.

Executive accountability has increased significantly. CISOs and other security leaders increasingly face personal consequences including termination following major breaches. In some cases, legal liability for breaches attributed to negligence has extended to individual executives who failed to ensure adequate security measures.

How to prevent data breaches

Effective prevention requires treating breach risk as an exposure management discipline, continuously validating where sensitive data lives, which identities can reach it, and which network paths make that access realistic. Organizations that cannot answer these three questions are guessing about their risk posture:

Where is our sensitive data? Including shadow copies, snapshots, and forgotten test environments
Who can access it? Considering effective permissions, not just configured policies
How could an attacker reach it? Mapping the paths from internet exposure through vulnerabilities and identities to data stores

Know where your sensitive data lives

Automated data discovery and classification across all cloud environments forms the foundation of breach prevention. Manual inventories become outdated immediately in dynamic cloud environments where developers can provision new storage with a single command.
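Automated classification typically combines pattern matching with validation to cut false positives. The sketch below is illustrative only, assuming two simple detectors (a US Social Security number pattern, and card-number candidates checked with the Luhn checksum); production classifiers use many more patterns and surrounding context.

```python
import re

# Illustrative detectors only; production classifiers use many more patterns
# plus surrounding context to reduce false positives.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")

def luhn_valid(candidate: str) -> bool:
    """Luhn checksum, used to weed out random digit runs that are not card numbers."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def classify(text: str) -> set[str]:
    """Return sensitivity labels detected in a text sample."""
    labels = set()
    if SSN_PATTERN.search(text):
        labels.add("PII:SSN")
    if any(luhn_valid(m.group()) for m in CARD_CANDIDATE.finditer(text)):
        labels.add("PCI:card_number")
    return labels

print(sorted(classify("ssn 123-45-6789, card 4111 1111 1111 1111")))
# ['PCI:card_number', 'PII:SSN']
```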

Shadow data deserves particular attention. Snapshots created for backup or testing contain copies of production data. Database backups stored in different regions or accounts may lack the encryption and access controls of the primary database. Development and test copies of production data often have relaxed security. Data in logs captures information that was never intended to be persisted. Unmanaged storage created for temporary purposes gets forgotten but remains accessible.
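A basic shadow-data check compares each copy against the primary store it was derived from and flags copies with weaker controls. The `DataStore` record and its fields are hypothetical; in practice, lineage (which snapshot came from which database) has to be discovered rather than declared.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DataStore:
    name: str
    kind: str                      # "database", "snapshot", "backup", ...
    encrypted: bool
    public: bool
    source: Optional[str] = None   # primary store this copy came from, if known

def shadow_copy_gaps(stores: list[DataStore]) -> list[tuple[str, str]]:
    """Flag copies that are weaker than the primary they were derived from."""
    by_name = {s.name: s for s in stores}
    gaps = []
    for copy in stores:
        primary = by_name.get(copy.source) if copy.source else None
        if primary is None:
            continue               # not a known copy of another store
        if primary.encrypted and not copy.encrypted:
            gaps.append((copy.name, "unencrypted copy of an encrypted store"))
        if copy.public and not primary.public:
            gaps.append((copy.name, "public copy of a private store"))
    return gaps

inventory = [
    DataStore("prod-db", "database", encrypted=True, public=False),
    DataStore("prod-db-snap-old", "snapshot", encrypted=False, public=False, source="prod-db"),
    DataStore("qa-copy", "database", encrypted=True, public=True, source="prod-db"),
]
for name, issue in shadow_copy_gaps(inventory):
    print(name, "->", issue)
```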

Discovery must be continuous, not point-in-time. A quarterly data inventory cannot keep pace with environments where new data stores appear daily as developers provision resources. By the time a manual inventory is complete, it no longer reflects reality.
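Continuous discovery boils down to comparing successive inventories and acting on the differences. This minimal sketch assumes each discovery run yields a hypothetical mapping of store name to access setting; it reports stores that appeared, disappeared, or changed between runs.

```python
def inventory_drift(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two discovery runs (store name -> access setting)."""
    return {
        "new": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "changed": sorted(
            name for name in current.keys() & previous.keys()
            if current[name] != previous[name]
        ),
    }

yesterday = {"prod-db": "private", "app-logs": "private"}
today = {"prod-db": "private", "app-logs": "public", "tmp-export": "public"}
print(inventory_drift(yesterday, today))
# {'new': ['tmp-export'], 'removed': [], 'changed': ['app-logs']}
```

Run on a schedule or triggered by change events, the "new" and "changed" lists become the queue for classification and review.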

Understand who can access it

Effective permissions analysis reveals what identities can actually do, not just what policies appear to allow. This requires analyzing inheritance from parent resources, group membership that may not be obvious, resource policies that can grant access independently of identity policies, and cross-account access that spans organizational boundaries.

Least privilege enforcement ensures identities have only the permissions necessary for their function. Overprivileged accounts create unnecessary risk by giving attackers more options when they compromise a credential. The challenge is that granting broad permissions is easy during initial deployment, while restricting them to least privilege requires understanding what access is actually needed.
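One common way to approach least privilege is to compare what an identity is granted with what it has actually used over an observation window. The sketch below assumes a hypothetical list of observed actions, for example extracted from access logs; the AWS-style action names serve only as labels here.

```python
def right_size(granted: set[str], observed_actions: list[str]) -> dict[str, list[str]]:
    """Split granted actions into those seen in use and removal candidates."""
    used = set(observed_actions)
    return {
        "keep": sorted(granted & used),
        "remove_candidates": sorted(granted - used),
    }

granted = {"s3:GetObject", "s3:PutObject", "s3:DeleteObject", "iam:PassRole"}
observed = ["s3:GetObject", "s3:GetObject", "s3:PutObject"]  # e.g., from access logs
print(right_size(granted, observed))
# {'keep': ['s3:GetObject', 's3:PutObject'], 'remove_candidates': ['iam:PassRole', 's3:DeleteObject']}
```

The observation window must be long enough to capture rare but legitimate tasks, such as quarterly jobs, before any removal candidate is actually revoked.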

Identity hygiene practices reduce the attack surface over time. Remove stale credentials that should have been revoked when employees left or projects ended. Rotate keys regularly so that leaked credentials expire before attackers can use them. Eliminate unused service accounts that provide access without serving any current purpose. Monitor for credential exposure in code repositories or logs where developers might accidentally commit secrets.
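The stale-credential and rotation checks can be expressed as two age thresholds. This is a sketch under stated assumptions: the `AccessKey` record is hypothetical, and the 90-day idle and maximum-age limits are illustrative defaults, not a standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass
class AccessKey:
    key_id: str
    created: datetime
    last_used: Optional[datetime]   # None if never used

def hygiene_findings(
    keys: list[AccessKey],
    now: datetime,
    max_idle: timedelta = timedelta(days=90),
    max_age: timedelta = timedelta(days=90),
) -> list[tuple[str, str]]:
    findings = []
    for k in keys:
        # Never-used keys are measured from creation, so brand-new keys are not flagged.
        last_activity = k.last_used or k.created
        if now - last_activity > max_idle:
            findings.append((k.key_id, "revoke: no activity within idle window"))
        elif now - k.created > max_age:
            findings.append((k.key_id, "rotate: older than maximum age"))
    return findings

utc = timezone.utc
now = datetime(2024, 6, 1, tzinfo=utc)
keys = [
    AccessKey("key-fresh", datetime(2024, 5, 1, tzinfo=utc), datetime(2024, 5, 30, tzinfo=utc)),
    AccessKey("key-stale", datetime(2023, 1, 1, tzinfo=utc), datetime(2023, 2, 1, tzinfo=utc)),
    AccessKey("key-old", datetime(2023, 12, 1, tzinfo=utc), datetime(2024, 5, 31, tzinfo=utc)),
    AccessKey("key-never-used", datetime(2024, 1, 1, tzinfo=utc), None),
]
for key_id, action in hygiene_findings(keys, now):
    print(key_id, action)
```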

See how attackers could reach it

Attack path analysis connects exposure, vulnerabilities, permissions, and data location to identify real risk rather than theoretical issues. The combination matters more than individual findings because attackers chain together weaknesses to reach their targets.

A critical vulnerability on an isolated system with no sensitive data poses less risk than a medium vulnerability on an internet-exposed system with database access. Context determines priority. Without that context, security teams cannot distinguish between findings that require immediate attention and findings that can wait.

Attack path analysis helps teams focus remediation on the issues that would actually lead to a breach rather than chasing every alert. When you can see that a specific misconfiguration creates a path from the internet to your customer database, that finding jumps to the top of the priority list regardless of its severity rating in isolation.
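Attack path analysis can be approximated as a graph search: nodes are internet entry points, workloads, identities, and data stores, and edges mean "can reach or act as." The sketch below uses breadth-first search over a small, hypothetical graph to find the shortest route from the internet to a sensitive store.

```python
from collections import deque
from typing import Optional

def shortest_attack_path(
    graph: dict[str, list[str]], start: str, targets: set[str]
) -> Optional[list[str]]:
    """Breadth-first search for the shortest chain of edges from start to any target."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] in targets:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Edges mean "can reach or act as" (hypothetical environment).
graph = {
    "internet": ["web-vm"],              # port 443 exposed
    "web-vm": ["role:web-app"],          # instance runs with this role
    "role:web-app": ["customer-db"],     # role can read the database
    "internal-vm": ["role:batch"],       # critical CVE, but not internet-reachable
    "role:batch": ["metrics-bucket"],
}
print(shortest_attack_path(graph, "internet", {"customer-db"}))
# ['internet', 'web-vm', 'role:web-app', 'customer-db']
```

The internal server with a critical vulnerability never appears on the path, while the chain through the exposed web VM does, which is the kind of reprioritization described above.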

Detect and respond to threats in real time

Runtime monitoring watches for anomalous behavior that indicates an attack underway. The goal is reducing the time between initial access and detection so response can begin while attackers are still active. Industry research consistently shows that breaches often go undetected for months: IBM's 2024 report found the average time to identify a breach was 194 days. Organizations with strong detection capabilities, including runtime monitoring and automated alerting, can shorten this window significantly.

Unusual data access patterns might indicate an attacker exploring what they can reach. Unexpected privilege escalation suggests someone trying to gain additional access. Signs of lateral movement show progression through an attack path. Data exfiltration indicators reveal attempts to move data outside the environment.
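Detecting unusual data access often starts with a per-identity baseline. This minimal sketch, assuming daily read volumes are available for an identity, flags a day that sits more than three standard deviations above the historical mean; the metric and threshold are illustrative choices, and real detectors combine many signals.

```python
from statistics import mean, stdev

def is_unusual_volume(daily_mb_history: list[float], today_mb: float, threshold: float = 3.0) -> bool:
    """Flag a day whose read volume is more than `threshold` standard deviations above baseline."""
    if len(daily_mb_history) < 2:
        return False                  # not enough history to form a baseline
    mu = mean(daily_mb_history)
    sigma = stdev(daily_mb_history)
    if sigma == 0:
        return today_mb > mu          # flat baseline: any increase stands out
    return (today_mb - mu) / sigma > threshold

history = [100, 120, 90, 110, 105]    # MB read per day by one identity
print(is_unusual_volume(history, 115))   # False: within normal variation
print(is_unusual_volume(history, 2000))  # True: far above baseline
```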

Correlating runtime signals with posture data provides context that makes detection more accurate. An unusual API call from an identity that is already flagged as overprivileged and has access to sensitive data represents a different risk than the same call from a tightly scoped service account with no data access.
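The correlation described above can be reduced to adjusting an alert's severity with posture context. The level names and one-step escalation rule below are hypothetical, chosen only to show the same runtime event landing at different priorities.

```python
LEVELS = ["low", "medium", "high", "critical"]

def contextual_severity(base: str, overprivileged: bool, sensitive_data_access: bool) -> str:
    """Raise an alert one level per risky posture attribute, capped at critical."""
    index = LEVELS.index(base) + int(overprivileged) + int(sensitive_data_access)
    return LEVELS[min(index, len(LEVELS) - 1)]

# Same anomalous API call, two different identities:
print(contextual_severity("medium", overprivileged=True, sensitive_data_access=True))    # critical
print(contextual_severity("medium", overprivileged=False, sensitive_data_access=False))  # medium
```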

Shift security left without slowing development

Scanning infrastructure-as-code and container images before deployment prevents misconfigurations from reaching production. When a storage bucket would be created with public access, catching that in the code review is far easier than discovering it in production after data has been exposed.

The balance between security gates and developer velocity requires careful attention. Guardrails should catch genuine risks without blocking legitimate work. A policy that fails every build creates pressure to work around security rather than with it. A policy that catches only the most dangerous issues while allowing reasonable configurations maintains developer trust while preventing breaches.
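A pre-deployment check and a severity-based gate might look like the following sketch. It assumes bucket definitions have already been parsed from infrastructure-as-code into a hypothetical name-to-attributes mapping (`public_read` and `versioning` are made-up attribute names, not any tool's schema) and blocks the build only on findings rated high or critical, so low-severity issues are reported without stopping work.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

@dataclass
class Finding:
    resource: str
    rule: str
    severity: str

def scan_buckets(buckets: dict[str, dict[str, bool]]) -> list[Finding]:
    """Check hypothetical parsed bucket definitions (name -> attributes)."""
    findings = []
    for name, attrs in buckets.items():
        if attrs.get("public_read", False):
            findings.append(Finding(name, "bucket allows public read", "critical"))
        if not attrs.get("versioning", False):
            findings.append(Finding(name, "versioning disabled", "low"))
    return findings

def build_may_proceed(findings: list[Finding], fail_at: str = "high") -> bool:
    """Block only on findings at or above `fail_at`; lower ones are reported, not enforced."""
    limit = SEVERITY_RANK[fail_at]
    return all(SEVERITY_RANK[f.severity] < limit for f in findings)

planned = {
    "app-logs": {"public_read": False, "versioning": False},
    "customer-exports": {"public_read": True, "versioning": True},
}
findings = scan_buckets(planned)
for f in findings:
    print(f.severity, f.resource, f.rule)
print("proceed:", build_may_proceed(findings))  # proceed: False (public bucket)
```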

Issues caught in development are cheaper and faster to fix than issues discovered in production. The developer who wrote the code is still working on that feature. The context is fresh. The fix is a code change rather than an emergency remediation. Shifting security left is not just about risk reduction; it is about efficiency.

Wiz's approach to data breach prevention

Preventing breaches requires unified visibility into data, access, and exposure. Disconnected tools cannot show how risks combine to create breach paths. When your vulnerability scanner, identity management system, and data classification tools operate in silos, correlating their findings to understand actual risk becomes a manual effort that cannot keep pace with cloud environments.

Wiz approaches data security by connecting these pieces into a single, shared risk model, enabling teams to prioritize the few attack paths that actually lead to sensitive data rather than chasing thousands of disconnected alerts. Data Security Posture Management (DSPM) agentlessly discovers and classifies sensitive data across all cloud environments, including shadow data in snapshots, backups, and unmanaged stores. Organizations gain visibility into data they did not know existed: the forgotten test database with production data, the snapshot that was never deleted, the log bucket capturing PII.

Attack path analysis maps how misconfigurations, vulnerabilities, and overprivileged identities connect to that sensitive data. Rather than presenting thousands of isolated findings, Wiz shows the paths an attacker could actually take from initial access to data exfiltration. This shows real breach risk rather than theoretical issues, helping teams focus remediation on what actually matters.

The Wiz Security Graph correlates exposure, identity, workload risk, and data sensitivity to surface "toxic combinations" that represent viable attack paths. A public-facing server with a critical vulnerability that has access to an unencrypted database containing PII appears as a connected risk rather than three separate findings in three different tools. Teams see the full picture rather than disconnected alerts.

Identity and entitlement analysis (CIEM) shows effective permissions and identifies overprivileged identities that could be used to access sensitive data. Understanding what identities can actually do, not just what policies appear to allow, is essential for preventing identity-based breaches that remain the most common attack pattern in cloud environments.

Organizations using Wiz can answer the fundamental questions: where is our sensitive data, who can access it, and how could an attacker reach it. As Zendesk found, "With Wiz, we could see everything across our environment... Our confidence in our cloud inventory and coverage went from maybe 60% to 100%."

Book a demo to see how Wiz connects sensitive data discovery, identity analysis, and exposure mapping so your team can prioritize and remediate the attack paths that actually lead to breach risk before attackers find them.
