Secret scanning: How it works and best practices

What is secret scanning?

Secret scanning is the practice of running automated scans on code repositories, execution pipelines, configuration files, commits, and other data sources to prevent potential security threats posed by exposed secrets. 

Secret scanning is part of the broader scope of secret management, which encompasses the processes and tools involved in storing and guarding secrets from unauthorized entities. 

What are secrets?

Secrets are credentials used to authenticate against or get authorized access to perform sensitive actions in an enterprise’s IT systems. 

Software projects often rely on third-party components—containers and container orchestration platforms, DevOps and CI/CD tools, databases, repositories, etc. To connect to these third-party services and enable communication between various app components, your software environment needs a way to authenticate the service or app component; this usually happens in the form of a “secret,” i.e., a key, password, certificate, or token.

How do secrets differ from sensitive data? 

Unlike sensitive data (e.g. social security numbers and credit card info), which typically belong to end users, secrets belong to enterprises. Examples of secrets include:

  • LDAP passwords

  • PKI/TLS certificates

  • Encryption keys

  • Container credentials

  • SSH keys

  • API tokens

Developers use secrets to authenticate and establish communication between their systems and other cloud services, or to control human and machine access to sensitive systems.

Why is secret scanning important?

As digital authentication credentials, secrets—if exposed—can grant adversaries unauthorized access to a company’s code bases, databases, and other sensitive digital infrastructure. 

Unfortunately, securing secrets is not an easy task. While secrets must be encrypted and tightly controlled, they must also be accessible to engineering teams and applications across the entire environment. 

Consequently, at one point or another during the software development lifecycle (SDLC), secrets often find their way into potentially exposed spaces: hard-coded credentials in continuous integration and continuous delivery (CI/CD) pipelines, code repositories, version control systems (VCS), security software, containerization environments, or workplace communication channels (e.g., Slack, Teams). 

This happens because developers are focused on writing and shipping quality code at breakneck speed. So while software is still in the development and testing stages, they may find it convenient to store secrets on local machines to speed up development and enable faster feedback loops. 

Enter secret scanning. Let’s discuss the four primary reasons developers should implement it. 

Safeguarding sensitive data and secrets 

To safeguard sensitive data, we encrypt it, in transit and at rest, and store it in databases. Secrets are then used to gate-keep the databases, limiting access to authorized humans and machines only. 

For example, to confirm username/password pairs entered by end users, your login portal must establish an automatic connection to your database. This connection is authenticated with a secret, authorizing the portal’s access to the sensitive information in the database. If this secret gets leaked along the way or ends up in an unsecured space, unauthorized individuals can access, steal, expose, or encrypt the data for ransomware attacks.

The Microsoft 2023 data exposure incident discovered by the Wiz research team is a perfect demonstration of the significance of secret scanning. In an attempt to publish AI-based training data on GitHub, a Microsoft research team accidentally shared a link that exposed 38 TB of private data, including private keys, secrets, passwords, and over 30,000 internal Microsoft Teams messages, stored in a Microsoft Azure storage account. 

This incident could have been prevented if the team had scanned the storage account for secrets before publishing the link on GitHub.

Thwarting cyberattacks 

The Wiz research team also found forgotten secrets in multiple overlooked locations in CI/CD pipelines, especially container image base layers and Linux bash history files. Attackers can leverage such exposed secrets in a variety of attack scenarios. In software supply chain attacks, for example, forgotten secrets can facilitate lateral movement and remote code execution, empowering hackers to modify an enterprise’s source code, plant malicious code in production-ready artifacts, or tamper with image build processes. 

By finding forgotten or hard-coded secrets before they are exposed, secret scanning tools help nip many forms of cyberattacks in the bud. 

Pro tip

Agentless scanning solutions typically have quicker setup and deployment and require less maintenance. They can scan all workloads using cloud-native APIs and connect to customer environments with a single org-level connector. Agent-based deployments, by contrast, require ongoing effort to install, update, and maintain agents.


Improving compliance

Many companies are subject to regulatory standards designed to protect end users’ sensitive personal, financial, and health-related information. Because secrets guard this data, any accidental release of a secret can lead to a data breach and, in turn, hefty noncompliance fines. 

Secret scanning can help detect and prevent secrets from being compromised.

Protecting against reputational damage and financial loss

Breaches and cyberattacks cause significant reputational damage, reducing revenue and increasing costs such as fines, legal fees, and settlements. 

Proactively scanning for and safeguarding secrets will help avoid such steep consequences. 

How does secret scanning work?

Secret scanning entails a few steps, performed with specialized tools and methods. Here’s how it works. 

Step 1: Scanning

Once a secret scanner is installed and connected to all relevant parts of your IT stack, it conducts real-time or at-rest scans of your stack. 

Real-time scans are event-driven, triggered by pull requests in your version control system (VCS) or code changes in any of the following components of your stack:

  • Code: Code repositories, config files 

  • Containers: Container images and Kubernetes architecture

  • DevOps technology stack: Build systems, ticketing systems, communication channels, knowledge management systems, bug tracking software, support stack, etc.

  • Observability pipelines: Observability/logging software, and data stores

At-rest scans examine the same components, including their history, at scheduled intervals. 
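To make the event-driven case concrete, here is a minimal Python sketch of a real-time scan wired into Git as a pre-commit check. It assumes the script is called from a `.git/hooks/pre-commit` hook and checks only the lines added in the staged diff against two illustrative patterns (AWS access key IDs and PEM private-key headers). The function names are hypothetical, and production scanners ship far larger rule sets.

```python
import re
import subprocess
import sys

# Illustrative patterns only; real scanners ship hundreds of rules.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def added_lines(diff_text: str) -> list:
    """Return the lines a unified diff adds, skipping '+++' file headers."""
    return [
        line[1:]
        for line in diff_text.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]


def scan_diff(diff_text: str) -> list:
    """Return (rule_name, line) pairs for every added line that matches a rule."""
    findings = []
    for line in added_lines(diff_text):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((name, line))
    return findings


def main() -> int:
    # Ask Git for the staged changes only, i.e. what is about to be committed.
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    findings = scan_diff(diff)
    for name, _line in findings:
        # Report the rule, not the matched line, so the secret is not echoed.
        print(f"possible secret ({name}) in staged changes", file=sys.stderr)
    return 1 if findings else 0
```

To use it as a hook, call `sys.exit(main())` from the hook script; Git aborts the commit when a pre-commit hook exits with a non-zero status. An at-rest scan would instead run the same checks on a schedule over full files and history.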

Secret scanning techniques

As secrets are often embedded in code, logs, etc., identifying them can be tricky. The table below details four secret scanning techniques.

| Scanning technique | Description | Pros | Cons |
| --- | --- | --- | --- |
| Regular expression (regex) | Scans for secrets by matching a sequence of characters distinctively associated with a service type; e.g., a simplified regex for a Stripe live secret key, which begins with the prefix sk_live_, might look like this: sk_live_[0-9a-zA-Z]{24,} | Reduces false positives, since the scanner only flags strings that match well-defined patterns | Secrets without a recognizable pattern go undetected. In addition, regex scans can be computationally expensive and slow. |
| Entropy | Measures how random (unpredictable) the strings in target files are, from highly random, high-entropy strings (e.g., JapFXI/X7MBE/bPEXAMPLEKEY) to predictable, low-entropy strings (e.g., kkkkkk); results are ranked, with high-entropy strings considered most indicative of a potential secret | Great for detecting highly randomized, unknown, or unpatterned secrets | False positives are common: database IDs, file paths, URLs, and other strings containing random alphanumeric characters get flagged as high-entropy secrets |
| Dictionary | Finds secrets in target files by comparing their character strings to known secrets stored in a secret management tool such as HashiCorp Vault | Matches against known values, making findings easy to verify | Credentials that are not already in the dictionary are missed. In addition, dictionaries tend to be language specific. |
| Hybrid | Combines two or more scanning techniques; may also involve machine learning | Delivers fewer false positives and detects many more secrets and secret types | Not offered by most secret scanners |
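The techniques in the table can be combined in a few dozen lines. The following Python sketch is a simplified hybrid scanner: regex rules (the simplified Stripe and AWS patterns are illustrative), a Shannon-entropy check on long tokens, and a dictionary lookup against SHA-256 hashes of known secrets, so the dictionary never holds plaintext values. The token pattern, the 4.0 bits-per-character threshold, and all names are assumptions to tune for your own codebase, not values taken from any particular tool.

```python
import hashlib
import math
import re
from collections import Counter
from typing import NamedTuple

# Regex technique: simplified, illustrative rules (real tools ship many more).
RULES = {
    "stripe_live_secret_key": re.compile(r"\bsk_live_[0-9a-zA-Z]{24,}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}
# Candidate tokens (12+ key-like characters) for the entropy and dictionary checks.
TOKEN = re.compile(r"[A-Za-z0-9_/+=-]{12,}")


class Finding(NamedTuple):
    technique: str  # "regex", "dictionary", or "entropy"
    rule: str
    value: str


def shannon_entropy(s: str) -> float:
    """Shannon entropy in bits per character, computed from character frequencies."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def sha256(value: str) -> str:
    return hashlib.sha256(value.encode()).hexdigest()


def hybrid_scan(text, known_secret_hashes=frozenset(), entropy_threshold=4.0):
    """Combine regex, dictionary, and entropy checks; regex hits take priority."""
    findings = []
    matched = set()
    for name, pattern in RULES.items():
        for m in pattern.finditer(text):
            findings.append(Finding("regex", name, m.group()))
            matched.add(m.group())
    for token in TOKEN.findall(text):
        if token in matched:
            continue  # already reported by a more specific regex rule
        if sha256(token) in known_secret_hashes:
            findings.append(Finding("dictionary", "known_secret", token))
        elif shannon_entropy(token) >= entropy_threshold:
            findings.append(Finding("entropy", f"entropy>={entropy_threshold}", token))
    return findings
```

Running it over real code shows the entropy weakness from the table: any long, random-looking identifier is flagged too, which is why the regex and dictionary results are checked first.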

Step 2: Identifying and verifying secrets

If the scanner detects a potential secret, it either queries the service provider or extracts metadata from your stack to identify the service the secret belongs to; it then checks whether the secret is still valid. 
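A minimal Python sketch of this step, assuming identification works from the secret’s own prefix (one kind of metadata) and that validity checks are supplied as pluggable functions. The prefix mapping is illustrative, and `verify` and the validator callables are hypothetical names. A real validator would typically make a harmless authenticated request to the provider, such as calling AWS STS `GetCallerIdentity` with an AWS key pair.

```python
from typing import Callable, Dict, Optional, Tuple

# Illustrative prefix -> service mapping (these prefixes are used by each provider).
PREFIX_TO_SERVICE = {
    "AKIA": "aws",         # AWS access key IDs
    "sk_live_": "stripe",  # Stripe live secret keys
    "ghp_": "github",      # GitHub personal access tokens (classic)
}


def identify_service(secret: str) -> Optional[str]:
    """Use the secret's own metadata (here, its prefix) to guess the issuing service."""
    for prefix, service in PREFIX_TO_SERVICE.items():
        if secret.startswith(prefix):
            return service
    return None


def verify(
    secret: str, validators: Dict[str, Callable[[str], bool]]
) -> Tuple[Optional[str], Optional[bool]]:
    """Return (service, is_valid); is_valid is None when no validator is available."""
    service = identify_service(secret)
    check = validators.get(service) if service else None
    if check is None:
        return service, None
    return service, check(secret)
```

Returning `None` rather than `False` for unverifiable secrets keeps "confirmed revoked" distinct from "not checked", which matters when prioritizing alerts.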

Step 3: Reporting and alerting

If a match is confirmed, the scanner notifies you of the exposed secret. Depending on how comprehensive the tooling is, it may also provide recommendations for resolving the issue. Note: Make sure only authorized parties have access to this report, as it will contain sensitive data.
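Because the report itself is sensitive, a common precaution is to redact secrets before the report is written out. The Python sketch below masks all but the first few characters of each secret and attaches a remediation hint; the field names, the four-character cutoff, and the wording of the hints are assumptions.

```python
import json


def redact(secret: str, visible: int = 4) -> str:
    """Keep only the first `visible` characters so reviewers can recognize the key type."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return secret[:visible] + "*" * (len(secret) - visible)


def build_report(findings) -> str:
    """findings: iterable of dicts with 'file', 'line', 'rule', 'secret', 'valid' keys."""
    rows = []
    for f in findings:
        if f["valid"]:
            hint = "Revoke and rotate this secret, then remove it from the source"
        else:
            hint = "Confirm the secret is inactive, then remove it from the source"
        rows.append({
            "file": f["file"],
            "line": f["line"],
            "rule": f["rule"],
            "secret": redact(f["secret"]),  # never store the full value
            "valid": f["valid"],
            "recommendation": hint,
        })
    return json.dumps(rows, indent=2)
```

Redaction reduces, but does not remove, the need for access control: file paths and line numbers still tell an attacker where to look.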

Open-source secret scanning tools

Below, we explore five common open-source secret scanning tools.

Detect-secrets

Detect-secrets is an open-source project from Yelp that scans your project’s code for secrets using heuristics and regex.

| Pros | Cons |
| --- | --- |
| Fast scans of the project’s current state only, reducing noise from secrets leaked in the past | Does not scan the full Git history, so secrets leaked in earlier commits can be missed |
| Maintains a baseline of known findings and compares new commits against it to prevent repeated secret leaks | Does not run in-depth scans |
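The baseline idea can be sketched in Python: previously reviewed findings are stored as hashes, and a scan reports only findings that are missing from that baseline. This is a simplified illustration under stated assumptions (a JSON list of SHA-256 fingerprints of file name plus secret), not detect-secrets’ actual baseline format or API; all names are hypothetical.

```python
import hashlib
import json


def fingerprint(filename: str, secret: str) -> str:
    """Hash each finding so the baseline never stores secrets in plaintext."""
    return hashlib.sha256(f"{filename}:{secret}".encode()).hexdigest()


def make_baseline(findings) -> str:
    """findings: iterable of (filename, secret) pairs that have already been reviewed."""
    return json.dumps(sorted(fingerprint(f, s) for f, s in findings))


def new_findings(findings, baseline_json: str) -> list:
    """Return only the (filename, secret) pairs that are not in the baseline."""
    known = set(json.loads(baseline_json))
    return [(f, s) for f, s in findings if fingerprint(f, s) not in known]
```

Keying the fingerprint on the file name means a known secret copied into a new file is reported again, which is usually the desired behavior.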

Gitleaks

Gitleaks scans repos, directories, files, and entire Git histories to detect past and present exposed secrets. It can be installed using Docker, Go, or Homebrew.

| Pros | Cons |
| --- | --- |
| Compatible with Linux, Windows, and other platforms/OSes | Limited scalability; designed to run on one server only |
| Can be set up to scan code before each commit (pre-commit) to proactively prevent secret exposure | Has no user interface; good for detection only, not incident management |

Whispers

Whispers scans static structured text files such as configs, XML, JSON, and Python3 for hard-coded secrets. Unlike the others, it doesn't scan code but instead parses known data formats and extracts key-value pairs to detect secrets. 

| Pros | Cons |
| --- | --- |
| Allows for custom configuration options, enabling you to remove unwanted results and minimize false positives | Does not scan code or Git repos, only the config files uploaded to Git repos |
| Can detect secrets in pre-commits | Designed as a secondary tool |
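The key-value approach Whispers takes can be illustrated with a short Python sketch that walks a parsed JSON config and flags string values stored under credential-like key names, skipping obvious placeholders such as `${API_KEY}`. The key-name and placeholder patterns are assumptions for illustration; this is not Whispers’ rule set or implementation.

```python
import json
import re

# Assumed heuristics: key names that usually hold credentials, and placeholder values.
SENSITIVE_KEY = re.compile(r"(pass(word)?|secret|token|api[_-]?key|private[_-]?key)", re.I)
PLACEHOLDER = re.compile(r"^(\$\{.*\}|<.*>|changeme|)$", re.I)


def iter_pairs(node, path=""):
    """Yield (dotted_path, key, value) for every scalar key-value pair in parsed JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            if isinstance(value, (dict, list)):
                yield from iter_pairs(value, child)
            else:
                yield child, str(key), value
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from iter_pairs(item, f"{path}[{i}]")


def find_config_secrets(json_text: str) -> list:
    """Return the paths of string values that look like hard-coded secrets."""
    findings = []
    for path, key, value in iter_pairs(json.loads(json_text)):
        if isinstance(value, str) and SENSITIVE_KEY.search(key) and not PLACEHOLDER.match(value):
            findings.append(path)
    return findings
```

Because the check keys off structure rather than raw text, a value like `${API_KEY}` under `api_key` is correctly treated as a reference rather than a secret.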

Git-secrets

Git-secrets is an AWS command-line tool for scanning commits, commit messages, and “--no-ff” merges. 

| Pros | Cons |
| --- | --- |
| Extensible protection via a “secret providers” feature, which outputs additional prohibited regex patterns | Limited coverage; ideal for AWS resources only |
| Actively stops commits and merges containing secrets from finding their way into Git repos | Uses regex patterns only; high false-positive rate |

Git-all-secrets

Git-all-secrets is an aggregation of multiple secret scanners, including TruffleHog (a regular expression-based scanner) and repo-supervisor (a high entropy-based scanner). 

| Pros | Cons |
| --- | --- |
| Flexible; lets you choose which combination of scanners and techniques to use | Only detects secrets in commits; can’t stop secrets from getting into repos |
| Helps lower false positives via multiple techniques | Has a limited user interface and is no longer actively maintained |

What about proprietary tools?

Secret scanning can also be done using proprietary tools. Open-source tools come at little to no financial cost, but they may not offer as much coverage as proprietary tools. Conversely, proprietary tools require varying degrees of financial commitment but typically offer more features. 

Whichever you choose, check which scanning technique the tool uses; for example, a hybrid scanner will help reduce false positives and detect more secret types. Additionally, consider the provider’s reputation and the tool’s ability to conduct real-time monitoring/alerting, incident response, and risk prioritization.

6 best practices for secrets management 

On top of scanning secrets, it’s also important to implement the following secrets management best practices:

1. Store and encrypt secrets using a secrets manager

Avoid storing secrets in container images, config files, code, and other unprotected places to prevent secret sprawl. Instead, use dedicated secret management tools (e.g. HashiCorp Vault or AWS Secrets Manager) that encrypt secrets at rest and in transit. 

2. Adopt (regular) secrets rotation and dynamic secrets

Secrets rotation involves periodically changing secrets at preconfigured intervals or manually triggering a change. Using dynamic secrets is one way to implement secret rotation; as opposed to static secrets, these are short-lived, meaning they expire after a specific timeframe or after certain conditions are met. 

Regularly rotating secrets limits a hacker’s window of opportunity, reducing the possibility of compromised secrets being used to conduct cyberattacks. 
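Dynamic secrets can be sketched as credentials that carry their own expiry time and are reissued once they lapse. In the Python sketch below, `SecretIssuer` and `DynamicSecret` are hypothetical stand-ins for a secrets manager’s dynamic-secrets engine; the TTL is arbitrary, and `now` can be passed explicitly only to make the behavior easy to test.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class DynamicSecret:
    value: str
    expires_at: float  # Unix timestamp after which the secret is no longer valid

    def is_valid(self, now: Optional[float] = None) -> bool:
        return (time.time() if now is None else now) < self.expires_at


class SecretIssuer:
    """Hypothetical issuer that hands out short-lived credentials and rotates them."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.current: Optional[DynamicSecret] = None

    def issue(self, now: Optional[float] = None) -> DynamicSecret:
        now = time.time() if now is None else now
        # Cryptographically strong random value from the standard library.
        self.current = DynamicSecret(secrets.token_urlsafe(32), now + self.ttl)
        return self.current

    def get(self, now: Optional[float] = None) -> DynamicSecret:
        """Return the current secret, issuing a fresh one once it has expired."""
        now = time.time() if now is None else now
        if self.current is None or not self.current.is_valid(now):
            return self.issue(now)
        return self.current
```

A real secrets manager would also revoke the old credential at the target system when it expires; this sketch only stops handing it out.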

3. Restrict access to secrets

Create secret access policies that are consistent across your stack and automate their enforcement. This includes enforcing the principle of least privilege (PoLP), access control lists (ACLs), and role-based access control (RBAC); these will limit users’ and apps’ access to secrets, data, and infrastructure to a need-to-use basis only. 

If a credential is accidentally compromised, PoLP, ACLs, and RBAC can help shrink the attack surface, limiting a threat actor’s ability to move laterally in your environment.
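A deny-by-default, role-based check is the core of least-privilege access to secrets. The Python sketch below uses hypothetical roles, principals, and secret paths; in practice these policies would live in your secrets manager or identity provider rather than in application code.

```python
# Hypothetical policy data: which secret paths each role may read.
ROLE_GRANTS = {
    "payments-service": {"prod/stripe/api-key"},
    "ci-runner": {"ci/registry/token"},
    "sre-oncall": {"prod/stripe/api-key", "prod/db/password"},
}

# Hypothetical assignment of roles to human and machine principals.
PRINCIPAL_ROLES = {
    "alice": {"sre-oncall"},
    "build-bot": {"ci-runner"},
    "checkout-app": {"payments-service"},
}


def can_read(principal: str, secret_path: str) -> bool:
    """Deny by default; allow only if one of the principal's roles grants this exact path."""
    roles = PRINCIPAL_ROLES.get(principal, set())
    return any(secret_path in ROLE_GRANTS.get(role, set()) for role in roles)
```

If the `build-bot` credentials leak, this policy limits the attacker to the CI registry token rather than production secrets, which is the attack-surface reduction described above.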

4. Use placeholders

Avoid hardcoding secrets, as you may need to share code in public repos. The Microsoft incident discussed above is an example of this. Instead of hardcoded secrets, use environment variables to reference secrets in your code. 
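A minimal Python sketch of the placeholder approach: the code references only an environment variable name, and the value is injected at runtime (for example, by your CI/CD system or secrets manager). `get_secret`, `MissingSecretError`, and the `DB_PASSWORD` name are hypothetical.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret was not injected into the environment."""


def get_secret(name: str, env=os.environ) -> str:
    """Read a secret injected at runtime instead of hard-coding it in source."""
    value = env.get(name)
    if not value:
        # Name the variable, never the value, in error messages.
        raise MissingSecretError(f"environment variable {name} is not set")
    return value


# Instead of DB_PASSWORD = "hunter2" (hard-coded, ends up in Git),
# the code only references the variable name:
# db_password = get_secret("DB_PASSWORD")
```

Failing fast when the variable is missing avoids silently falling back to a hard-coded default.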

5. Track secrets lifecycle

Keep track of secrets currently in use, revoke compromised secrets, and record access events (who’s accessing what and when) in a comprehensive audit log.
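Lifecycle tracking can be sketched as an inventory that records every registration, access, and revocation in an append-only audit log. The class and field names are hypothetical; a production system would persist the log to tamper-resistant storage rather than an in-memory list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SecretRecord:
    name: str
    revoked: bool = False


@dataclass
class SecretInventory:
    """Hypothetical inventory: which secrets exist, which are revoked, and who did what."""
    records: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def _log(self, who: str, action: str, name: str) -> None:
        # Record who did what, to which secret, and when (UTC).
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "who": who,
            "action": action,
            "secret": name,
        })

    def register(self, who: str, name: str) -> None:
        self.records[name] = SecretRecord(name)
        self._log(who, "register", name)

    def access(self, who: str, name: str) -> bool:
        """Grant access only to known, unrevoked secrets; log the attempt either way."""
        record = self.records.get(name)
        allowed = record is not None and not record.revoked
        self._log(who, "access" if allowed else "access-denied", name)
        return allowed

    def revoke(self, who: str, name: str) -> None:
        self.records[name].revoked = True
        self._log(who, "revoke", name)

    def in_use(self) -> list:
        """Names of secrets that are registered and not revoked."""
        return sorted(n for n, r in self.records.items() if not r.revoked)
```

Logging denied attempts as well as successful ones is what makes the log useful after a compromise: repeated denials against a revoked secret show someone is still trying to use it.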

6. Implement attack path analysis

Choose a secret scanning tool with advanced attack path analysis; this will detect secrets, correlate them with relevant systems, and give you a clear map of resources and systems on the attack path. 
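At its core, attack path analysis is graph traversal: start at an exposed secret and follow “grants access to” edges to see which sensitive resources become reachable. The Python sketch below runs a breadth-first search over a small hypothetical graph; it illustrates the idea only and is not how Wiz or any specific product builds its security graph.

```python
from collections import deque

# Hypothetical environment graph: an edge A -> B means "A grants access to B".
GRAPH = {
    "leaked:ci-token": ["repo:payments"],
    "repo:payments": ["pipeline:deploy"],
    "pipeline:deploy": ["role:prod-deployer"],
    "role:prod-deployer": ["db:customers", "bucket:backups"],
    "bucket:backups": [],
    "db:customers": [],
}


def attack_paths(graph, start, targets):
    """Breadth-first search; return the shortest path from start to each reachable target."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    paths = {}
    for target in targets:
        if target in parent:
            # Walk parent links back to the start, then reverse.
            path, node = [], target
            while node is not None:
                path.append(node)
                node = parent[node]
            paths[target] = path[::-1]
    return paths
```

Targets missing from the result are unreachable from the leaked secret, which helps rank findings: a secret with a path to `db:customers` deserves attention before one that reaches nothing sensitive.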

Scanning for secrets with Wiz

Wiz Code enhances your secret scanning efforts by detecting exposed credentials, API keys, and sensitive information across your codebase, ensuring they are caught before reaching production environments.

As part of Wiz's comprehensive cloud security platform, Wiz Code scans your entire workflow for threats and vulnerabilities using 35+ supported compliance frameworks across Terraform, CloudFormation, Ansible, Google Deployment Manager, ARM, Kubernetes, Helm, and Docker. 

Wiz automatically integrates with code repos to: 

  • Analyze system volumes and detect exposed secrets such as cloud platform access keys, domain certificates, and SSH keys

  • Scan for known data related to secrets and extract metadata to provide context

  • Extract algorithm and bit length information to link SSH private keys to their authorized keys configuration 

  • Pull details like subjects, expiration dates, and important attributes to link a certificate to the resource it is used for

  • Provide security graphs for tracing potential attack paths 

  • Alert stakeholders when secrets are detected 

Wiz deploys cloud-native scanners to ensure speed, efficiency, and comprehensive scanning—a rare combination. 

Request a demo today to see how Wiz can help keep your secrets safe.