Skip to main content
Welcome — in this lesson you’ll learn what BigQuery federated queries are, when to use them, and how they work in practice. A federated query lets BigQuery read external data at query time and combine it with native BigQuery tables without first ingesting the data. This is ideal for quick, ad-hoc analysis and multi-source joins when building and maintaining an ETL pipeline is impractical.
A slide titled "BigQuery Federated Queries" showing a simple diagram of a user querying a BigQuery icon and getting results. To the right are bullet points saying data may reside outside BigQuery, might not be movable, and asking if it can still be queried together.
Key points
  • Federated queries let BigQuery read external data sources at query time and join that data with native BigQuery tables.
  • Supported external sources include Cloud Storage (CSV, JSON, Avro, Parquet, ORC), Google Drive / Google Sheets, Cloud SQL (MySQL/PostgreSQL) and Cloud Spanner via BigQuery connections, Cloud Bigtable, and other sources exposed through connectors.
  • Conceptually similar to query engines like Presto that present a single SQL interface across multiple data sources.
Supported external sources (summary)
External resourceAccess method / formatsTypical use case
Cloud StorageCSV, JSON, Avro, Parquet, ORC via external table OPTIONS(uris=...)Query logs or exported files without loading them into BigQuery
Google Sheets / DriveExternal tables with GOOGLE_SHEETS driverBlend client-provided spreadsheets with production datasets
Cloud SQL / Cloud SpannerBigQuery Connections + federated queries or EXTERNAL_QUERY()Ad-hoc analysis of transactional DBs (use cautiously)
Cloud BigtableConnector for large, low-latency key-value dataAnalytical joins with time-series or wide-column data
When to use federated queries
  • Short-lived, ad-hoc analysis where ingesting data into BigQuery is slower or unnecessary.
  • Prototyping, proof-of-concept, or exploratory analysis to validate hypotheses before committing to ingestion and ETL.
  • Blending small client-supplied datasets (for example, a Google Sheet) with a production BigQuery dataset to avoid building a pipeline for transient data.
Important cautions
Federated queries can increase latency and may generate charges from both BigQuery and the external data source. Querying transactional systems (Cloud SQL or Cloud Spanner) can affect production performance. For recurring, large-scale analytics, ingesting data into BigQuery is generally more performant and cost-effective.
How it works (high-level)
  • Define an external table or create a BigQuery Connection that references the remote data source.
  • When your SQL references that external table, BigQuery reads the source data at query time and processes it together with native tables.
  • External data is not persisted in BigQuery unless you explicitly load it into a native table.
Example: define and query a Cloud Storage external table
  • Create an external table that points to CSV files in Cloud Storage, then join it with a native table:
-- Create an external table that reads CSV files directly from Cloud Storage
CREATE EXTERNAL TABLE `mydataset.external_orders`
OPTIONS (
  format = 'CSV',
  uris = ['gs://my-bucket/orders/*.csv'],
  skip_leading_rows = 1
);

-- Query and join with a native BigQuery table
SELECT t.order_id, t.total_amount, e.region
FROM `myproject.mydataset.transactions` AS t
LEFT JOIN `myproject.mydataset.external_orders` AS e
  ON t.order_id = CAST(e.order_id AS INT64)
WHERE t.event_date >= '2024-01-01';
Example: query a Google Sheet as an external table
  • Create an external table with GOOGLE_SHEETS format and query it in place (useful for small, client-managed sheets).
Cost and performance tips
Federated queries are excellent for quick results and small-to-medium datasets. For high-frequency queries or large datasets, prefer loading data into BigQuery to reduce latency and cost. Monitor query plan and bytes processed, and avoid scanning unnecessary columns or files in the external source.
Best practices
  • Limit the scanned data: use file filtering (for Cloud Storage URIs), column projection, and WHERE clauses to reduce bytes read.
  • Use federated queries for prototyping and low-frequency use; move stable, high-volume data into BigQuery for repeated analysis.
  • For Cloud SQL or Spanner, consider creating read replicas or exporting snapshots if analytical queries would impact production performance.
  • Track and monitor costs that may come from both BigQuery processing and the external source (e.g., Cloud SQL I/O).
Links and references Summary BigQuery federated queries let you query and join external data where it lives without first ingesting it into BigQuery. They’re powerful for ad-hoc analysis, prototyping, and blending external client data, but consider latency, cost, and the potential impact on source systems when using them. See you in the next lesson.

Watch Video