CloudQuery: Open‑Source Cloud ELT Framework for Modern DevSecOps

Wiz Experts Team

TL;DR: What is CloudQuery?

CloudQuery is an open-source, high-performance data movement framework built on Apache Arrow for fast columnar processing.

DevSecOps teams often struggle with fragmented visibility across multi-cloud environments, which makes it hard to govern the full infrastructure stack. CloudQuery addresses this by running an infrastructure data pipeline that extracts data from cloud providers like AWS and Azure, optionally transforms it, and loads it into a supported SQL database. The result is a unified multi-cloud asset inventory that your teams can audit with familiar SQL queries, without building custom integrations for each provider.

As an open-source project, CloudQuery provides a flexible, community-driven foundation for achieving the comprehensive infrastructure visibility and control you need.

At-a-Glance

  • URL: https://www.cloudquery.io

  • License: MPL-2.0

  • Primary Language: Go

  • Stars: 6.2k

  • Last Release: August 2025

  • Topics/Tags: go, kubernetes, github-api, bigquery, aws, data, google, sql, etl, azure, gcp, data-engineering, data-analysis, data-integration, data-collection, elt, etl-framework, cspm, airbyte, attack-surface-management

Common use cases

  1. Cloud Security Posture Management (CSPM): Security teams can execute SQL queries against ingested cloud data to continuously identify and flag misconfigurations, such as public S3 buckets or overly permissive firewall rules, enabling proactive remediation and automated security monitoring.

  2. Compliance Auditing and Reporting: Teams can automate evidence collection for audits like SOC 2 or HIPAA by running scheduled SQL queries that verify security controls (e.g., encryption, logging). This approach generates auditable reports and streamlines the compliance process.

  3. Cloud Cost Optimization: FinOps teams can analyze billing and resource utilization data to identify idle resources, enforce tagging policies, and understand cost drivers. This analysis allows for data-driven decisions that reduce cloud spend and improve financial governance.

  4. Incident Response and Forensics: During a security incident, responders can query historical infrastructure state data to quickly understand changes, identify compromised resources, and trace an attacker's path, significantly accelerating investigation and containment efforts.

  5. Centralized Asset Inventory: Operations teams can create a unified, multi-cloud asset inventory by consolidating resource data from all providers. This single source of truth can be used for dependency mapping, ownership tracking, and impact analysis for planned changes.
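The centralized inventory and tagging use cases above can be sketched in a few lines. This is a minimal illustration, not CloudQuery's actual schema: an in-memory SQLite database stands in for the destination, and the table names (`demo_aws_instances`, `demo_azure_vms`), the `owner_tag` column, and the `asset_inventory` view are all hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for a real destination database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified per-provider tables (real synced tables are wider).
    CREATE TABLE demo_aws_instances (instance_id TEXT, region TEXT, owner_tag TEXT);
    CREATE TABLE demo_azure_vms (vm_id TEXT, location TEXT, owner_tag TEXT);

    INSERT INTO demo_aws_instances VALUES
        ('i-0abc', 'us-east-1', 'payments'),
        ('i-0def', 'eu-west-1', NULL);
    INSERT INTO demo_azure_vms VALUES
        ('vm-web', 'westeurope', 'web');

    -- One view that normalizes both providers into a single inventory shape.
    CREATE VIEW asset_inventory AS
        SELECT 'aws' AS cloud, instance_id AS resource_id, region, owner_tag
        FROM demo_aws_instances
        UNION ALL
        SELECT 'azure', vm_id, location, owner_tag
        FROM demo_azure_vms;
""")

# Asset counts per cloud provider.
per_cloud = conn.execute(
    "SELECT cloud, COUNT(*) FROM asset_inventory GROUP BY cloud ORDER BY cloud"
).fetchall()

# Resources that violate a simple "every resource has an owner" tagging policy.
missing_owner = [
    row[0]
    for row in conn.execute(
        "SELECT resource_id FROM asset_inventory WHERE owner_tag IS NULL"
    )
]

print(per_cloud)
print(missing_owner)
```

Against a real destination, the same pattern would likely apply to the synced provider tables, with column names adjusted to the actual schema.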

How does CloudQuery work?

CloudQuery uses a pluggable architecture to extract, load, and transform (ELT) your cloud and SaaS data into any supported SQL database. Its command-line interface (CLI) orchestrates the process: a sync starts when you run the CLI with a configuration file that defines your data sources and destinations. The CloudQuery engine then initializes the necessary plugins, which connect to provider APIs, fetch the raw data, and stream it to your destination. Run on a schedule, this workflow keeps a regularly refreshed, queryable view of your infrastructure.

  • Sources: Source plugins connect to various cloud and SaaS APIs (like AWS, GCP, GitHub) to extract raw data. They map complex API responses into structured relational tables (e.g., aws_ec2_instances).

  • Transformers (Optional): During the ELT process, transformer plugins can modify data in-flight. Common transformations include renaming columns or flattening nested JSON fields to make queries simpler.

  • Destinations: Destination plugins load the extracted and transformed data into your target SQL database—such as PostgreSQL, Snowflake, or BigQuery—and support different write modes to ensure data consistency.
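The source, transformer, and destination roles described above can be modeled conceptually as three small functions. This is a toy sketch of the data flow only: it does not use CloudQuery's plugin SDK, and the function names, the sample records, and the `demo_instances` table are hypothetical. SQLite stands in for the destination, and the upsert is just one plausible write mode.

```python
import sqlite3
from typing import Iterable, Iterator


def source() -> Iterator[dict]:
    """Stand-in for a source plugin: yields raw, nested API-style records."""
    yield {"InstanceId": "i-0abc", "State": {"Name": "running"},
           "Placement": {"AvailabilityZone": "us-east-1a"}}
    yield {"InstanceId": "i-0def", "State": {"Name": "stopped"},
           "Placement": {"AvailabilityZone": "us-east-1b"}}


def transform(records: Iterable[dict]) -> Iterator[dict]:
    """Stand-in for an optional transformer: flatten nested fields, rename columns."""
    for record in records:
        yield {
            "instance_id": record["InstanceId"],
            "state": record["State"]["Name"],
            "availability_zone": record["Placement"]["AvailabilityZone"],
        }


def load(conn: sqlite3.Connection, rows: Iterable[dict]) -> int:
    """Stand-in for a destination plugin: upsert rows into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS demo_instances ("
        "instance_id TEXT PRIMARY KEY, state TEXT, availability_zone TEXT)"
    )
    count = 0
    for row in rows:
        # INSERT OR REPLACE overwrites a row with the same primary key.
        conn.execute(
            "INSERT OR REPLACE INTO demo_instances "
            "VALUES (:instance_id, :state, :availability_zone)",
            row,
        )
        count += 1
    conn.commit()
    return count


conn = sqlite3.connect(":memory:")
loaded = load(conn, transform(source()))

# Once loaded, the data is queryable with plain SQL.
running = [
    row[0]
    for row in conn.execute(
        "SELECT instance_id FROM demo_instances WHERE state = 'running'"
    )
]
print(loaded, running)
```

Because each stage is a generator, records stream through the pipeline one at a time instead of being collected in memory first, which loosely mirrors the streaming behavior described above.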

Core Capabilities

  1. Universal Cloud & SaaS Integrations: CloudQuery centralizes data by connecting to a wide array of sources using a rich plugin ecosystem, including the CloudQuery AWS plugin and CloudQuery Azure integration. It supports major cloud providers, Kubernetes, and SaaS applications like GitHub and Okta. This ability to centralize data breaks down silos, enabling organizations to build a multi-cloud asset inventory and gain a holistic, queryable view of their entire tech environment for unified security, compliance, and operational analysis.

  2. SQL-Native Data Transformation: At its core, CloudQuery excels at cloud data transformation by converting complex and often unstructured API responses into a normalized, relational SQL schema. This conversion allows security, DevOps, and FinOps teams to leverage their existing SQL knowledge for advanced cloud querying. It eliminates the need to learn proprietary languages, democratizing data access and enabling powerful, flexible analysis of infrastructure configurations and operational metrics directly within a standard database.

  3. Policy-as-Code & Compliance Automation: By representing infrastructure state as SQL tables, CloudQuery enables a policy-as-code workflow. Your organization can define security and compliance rules (e.g., “all S3 buckets must have encryption enabled”) as simple SQL queries. You can execute these policies automatically as part of an infrastructure data pipeline to provide continuous auditing, flag misconfigurations shortly after each sync, and generate auditable reports that streamline compliance with standards like SOC 2 and HIPAA.

  4. High-Performance & Scalable Data Extraction: Built for large and complex enterprise environments, CloudQuery is designed for high-performance data synchronization. It uses efficient parallel processing and optimized data pipelines to extract and load massive volumes of data from numerous sources with minimal latency. Its architecture scales horizontally, ensuring it can handle the demands of rapidly growing multi-cloud and hybrid infrastructures without performance degradation and making it a robust foundation for data-driven operations.

  5. Extensible and Open-Source Framework: CloudQuery's functionality is driven by its extensible plugin architecture, allowing the community and organizations to develop new integrations for any API-driven source. Being open-source gives you transparency and flexibility while helping you avoid vendor lock-in. This open approach is a key differentiator when comparing solutions like CloudQuery vs. Steampipe, as it allows organizations to customize and extend the tool to meet their specific data aggregation and analysis needs.
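To make the policy-as-code idea concrete, here is a minimal sketch in which each policy is a named SQL query that returns the violating resources. The `demo_s3_buckets` table, its columns, and the policy names are hypothetical and much simpler than a real synced schema; SQLite again stands in for the destination database.

```python
import sqlite3

# Each policy is a SQL query that returns the names of violating resources.
# Table and column names are assumptions made for this example.
POLICIES = {
    "s3-encryption-enabled":
        "SELECT name FROM demo_s3_buckets WHERE encryption_enabled = 0 ORDER BY name",
    "s3-no-public-access":
        "SELECT name FROM demo_s3_buckets WHERE is_public = 1 ORDER BY name",
}


def evaluate(conn: sqlite3.Connection, policies: dict) -> dict:
    """Run every policy and map its name to the list of violating resources."""
    return {
        name: [row[0] for row in conn.execute(sql)]
        for name, sql in policies.items()
    }


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE demo_s3_buckets (name TEXT, encryption_enabled INTEGER, is_public INTEGER);
    INSERT INTO demo_s3_buckets VALUES
        ('logs', 1, 0),
        ('backups', 0, 0),
        ('assets', 1, 1);
""")

violations = evaluate(conn, POLICIES)
failed = sorted(name for name, rows in violations.items() if rows)
print(violations)
```

Keeping policies as plain data (a name mapped to a query) means they can live in version control and run in a scheduled job or CI step, with a non-empty result treated as a failed check.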

Limitations

  1. Requires SQL Proficiency: CloudQuery's primary interface for analysis is SQL. While powerful, this reliance on SQL presents a steep learning curve for teams that lack strong expertise in it. Organizations without sufficient SQL skills may struggle to write complex security policies or derive deep insights, potentially underutilizing the tool's full capabilities without dedicated training or specialized personnel.

  2. Data Freshness Latency: As an ETL/ELT tool, CloudQuery exposes a snapshot from the last synchronization, not a real-time stream. This inherent latency means the tool isn't suitable for use cases requiring immediate, up-to-the-second data, such as real-time intrusion detection or responding to active threats that require instantaneous state information.

  3. Operational and Hosting Overhead: Unlike fully managed SaaS solutions, CloudQuery is typically self-hosted, requiring you to manage its deployment, maintenance, and scaling. The self-hosting responsibilities include setting up the destination database and managing the execution environment, often via a CloudQuery Docker setup. This operational burden can be significant for smaller teams or organizations seeking a zero-maintenance solution.

  4. Coverage Dependent on Plugin Ecosystem: CloudQuery's visibility is strictly limited by the availability and maturity of its plugins. If a specific cloud service, resource type, or SaaS application isn't supported by a plugin, that data cannot be ingested. This lack of a plugin can create critical visibility gaps in an organization's security posture or asset inventory until a new one is developed.

  5. Needs High-Privilege Credentials: To comprehensively scan and extract configuration data from sources like AWS or Azure, CloudQuery plugins require broad, read-only access permissions. Managing and securing these high-privilege credentials can pose a security risk and a compliance challenge, as they can become a high-value target for attackers if not properly secured and monitored.
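One way to work with the freshness limitation above is to check how old the latest snapshot is before trusting query results. The sketch below assumes each synced row carries a sync timestamp in a column named `_cq_sync_time` stored as an ISO 8601 string; treat that column name and format, the `demo_assets` table, and the `is_stale` helper as assumptions to verify against your own destination.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def is_stale(conn: sqlite3.Connection, table: str,
             now: datetime, max_age: timedelta) -> bool:
    """Return True if the newest sync timestamp in `table` is older than max_age.

    `table` is interpolated into the query, so it must come from trusted config.
    """
    row = conn.execute(f'SELECT MAX(_cq_sync_time) FROM "{table}"').fetchone()
    if row[0] is None:
        return True  # an empty table has never been synced
    last_sync = datetime.fromisoformat(row[0])
    return now - last_sync > max_age


# Fixed "current time" so the example is deterministic.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_assets (resource_id TEXT, _cq_sync_time TEXT)")
conn.execute(
    "INSERT INTO demo_assets VALUES (?, ?)",
    ("i-0abc", (now - timedelta(hours=3)).isoformat()),
)

stale_1h = is_stale(conn, "demo_assets", now, timedelta(hours=1))
stale_6h = is_stale(conn, "demo_assets", now, timedelta(hours=6))
print(stale_1h, stale_6h)
```

A check like this can gate dashboards or policy runs so that results based on an outdated snapshot are flagged instead of silently trusted.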

Getting Started

Step 1:

To install CloudQuery on macOS, run:

brew install cloudquery/tap/cloudquery

Step 2:

After installation, verify it with:

cloudquery --version

For Linux and Windows, see the CloudQuery quickstart guide for platform-specific instructions. Once installed, you're ready to run your first sync using CloudQuery plugins and a configuration file. Explore the quickstart documentation for detailed steps on setting up a data source and destination to extract and load your data efficiently.

Alternatives

| Tool | Core Functionality/Approach | Data Sources/Integrations | Query Language/Mechanism | Primary Use Cases |
| --- | --- | --- | --- | --- |
| CloudQuery | High-performance ELT (Extract, Load, Transform) framework for cloud APIs; syncs data to a variety of destinations | Wide array of cloud providers (AWS, Azure, GCP), Kubernetes, and SaaS applications (GitHub, Salesforce, Okta) | SQL-native transformation of API data into normalized SQL tables | Security posture management, asset inventory, cost optimization, compliance automation, policy as code |
| Steampipe | Instantly query cloud APIs and other services using SQL in real time without data replication; queries live APIs directly | Cloud APIs (AWS, Azure, GCP), GitHub, Kubernetes, and many other services via plugins | SQL (queries live APIs) | Ad-hoc querying, compliance checks, security audits |
| Airbyte | Open-source data integration platform focused on ELT pipelines; provides a large number of pre-built connectors | Broad range of sources: applications, APIs, databases; destinations include data warehouses and data lakes | Primarily UI-driven for pipeline management; not a direct query language for data | General-purpose data integration, building ELT pipelines |
| Fix Inventory (formerly Resoto) | Open-source tool for "housekeeping for clouds"; inventories cloud resources, metadata, and dependencies as a graph | Cloud resources (AWS, Azure, GCP, Kubernetes), their metadata and relationships | Dedicated graph-based search syntax to traverse resource relationships | Finding leaky resources, managing quotas, detecting configuration drift, understanding complex resource relationships |

FAQs