BigQuery Table Partitioning and Clustering

Welcome — in this lesson you’ll learn how table partitioning and clustering work in BigQuery, why they matter for cost and performance, and when to use each. If you’ve used Hive or other data warehouses, these concepts will feel familiar. This article maps those ideas to BigQuery with a concise example to show the practical differences and benefits.

Example scenario

Imagine a simple orders table with these columns:

order_id
country
status
order_date

Start with the table unpartitioned and unclustered, then examine how queries behave as you apply clustering and partitioning.

Unoptimized table behavior

When the table has no partitioning or clustering, BigQuery cannot prune by date or specific column values. For example:

SELECT * FROM dataset.orders;

This query scans the entire table. Even a filtered query such as:

SELECT * FROM dataset.orders WHERE country = 'US';

may still scan much of the table because the storage layout does not group rows by country or order_date. On very large tables this increases query latency and cost—especially with billions of rows.

Clustering by a column (example: country)

Clustering organizes data so rows with similar values for the clustered columns are stored physically close together. If you cluster by country, values like Japan, UK, and US will tend to reside in nearby blocks on disk. When you run:

SELECT * FROM dataset.orders WHERE country = 'US';

BigQuery can use cluster metadata to skip blocks that are unlikely to contain US rows, reducing the amount of data scanned. Clustering is most effective when your queries frequently filter on the clustered columns (equality predicates, GROUP BY, or JOIN keys) and when the clustered columns have moderate cardinality.

Partitioning (example: order_date)

Partitioning splits a table into segments, often by DATE or TIMESTAMP. A typical pattern is partitioning by order_date, so each day (or range) lives in a separate partition. If your query filters on the partition column, e.g.:

SELECT * FROM dataset.orders WHERE order_date = '2022-08-05';

BigQuery performs partition pruning first and scans only the relevant partition(s). This reduces the scanned data dramatically for time-bounded queries.

Combining partitioning and clustering

The best efficiency often comes from combining partitioning and clustering:

Partitioning prunes large swaths of data (for example, specific date ranges).
Clustering then prunes within those partitions to find rows for specific values (for example, country).

Example:

SELECT * FROM dataset.orders
WHERE order_date = '2022-08-05' AND country = 'US';

Execution flow:

Partition pruning: BigQuery scans only the 2022-08-05 partition.
Cluster pruning: Within that partition, BigQuery reads only the blocks that likely contain country = 'US'.

This two-stage pruning minimizes scanned bytes, lowers query latency, and reduces cost.

Sample DDL for a partitioned and clustered table

Use this DDL to create a table partitioned by order_date and clustered by country:

CREATE TABLE dataset.orders (
  order_id STRING,
  country STRING,
  status STRING,
  order_date DATE
)
PARTITION BY order_date
CLUSTER BY country;

Best practices and guidance

Topic	Recommendation	Why it matters
Partition column	Choose a column used commonly in date or range filters (e.g., `order_date`).	Partition pruning avoids scanning irrelevant time ranges.
Clustering columns	Cluster on columns used in equality filters, GROUP BY, or JOIN predicates. Prefer moderate-cardinality columns (not unique IDs).	Enables block-level pruning and speeds selective queries.
Number of clustering columns	Keep it small (typically 1–4 columns).	More columns dilute clustering effectiveness and increase maintenance overhead.
When to define partitions	Define partitioning at table creation where possible. To partition an existing table, create a new partitioned table and copy data.	Initial partitioning produces the cleanest storage layout and best early performance.
Adding clustering	You can add clustering to existing tables; subsequent writes will benefit.	Enables incremental improvement without full rewrite, though initial data may not be optimally clustered.

Quick example queries

SELECT * FROM dataset.orders;
SELECT * FROM dataset.orders WHERE country = 'US';
SELECT * FROM dataset.orders WHERE order_date = '2022-08-05' AND country = 'US';

Partition pruning (by the partition column) runs first; clustering then prunes within the selected partitions. Together, they significantly reduce scanned data and query cost.

If your queries rarely filter on the partition column or clustered columns, BigQuery will still scan large portions of the table. Design partitions and clusters based on real query patterns to avoid unexpected costs.

Watch Video

Demo Materialized Views

BigQuery Storage Types and Cost Management

Introduction

GCP Networking

Identity and Access Management (IAM) in GCP

Cloud Observability

Development & CI/CD

Data Security & Encryption

Data Ingestion Options

Data Storage Options

Database (SQL, NoSQL and memory)

Data Orchestration Options

Data Processing

Data Integration & Transformation Tools

Data Warehouse & Analytics Options

Machine Learning Options

Multi-Cloud & Lakehouse Solutions

Data Management and Governance

GCP Data Engineering Architecture and Landscape

GCP Core Fundamentals & Understanding

BigQuery Table Partitioning and Clustering

Example scenario

Unoptimized table behavior

Clustering by a column (example: country)

Partitioning (example: order_date)

Combining partitioning and clustering

Sample DDL for a partitioned and clustered table

Best practices and guidance

Quick example queries

Further reading and references

Watch Video

​Example scenario

​Unoptimized table behavior

​Clustering by a column (example: country)

​Partitioning (example: order_date)

​Combining partitioning and clustering

​Sample DDL for a partitioned and clustered table

​Best practices and guidance

​Quick example queries

​Further reading and references

Watch Video

Example scenario

Unoptimized table behavior

Clustering by a column (example: country)

Partitioning (example: order_date)

Combining partitioning and clustering

Sample DDL for a partitioned and clustered table

Best practices and guidance

Quick example queries

Further reading and references