- Which ads drive engagement and conversions?
- Which publishers and partners deliver high-quality traffic?
- What is the ROI for each channel and campaign?

To answer these questions, the analytics platform needs to:

- capture clicks, impressions, sessions, and in-app events in real time;
- attribute conversions to channels and publishers;
- produce campaign- and publisher-level ROI and quality metrics.
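
As a rough illustration of the attribution and ROI requirements above, the sketch below applies a last-click model, which is an assumption made for this example; the source does not say which attribution model is used. The field names (`user_id`, `channel`, `publisher`, `ts`, `revenue`) and the sample data are hypothetical.

```python
from collections import defaultdict


def last_click_attribution(clicks, conversions):
    """Credit each conversion to the same user's most recent earlier click."""
    revenue = defaultdict(float)
    for conv in conversions:
        prior = [
            c for c in clicks
            if c["user_id"] == conv["user_id"] and c["ts"] <= conv["ts"]
        ]
        if not prior:
            continue  # no click before the conversion: left unattributed
        last = max(prior, key=lambda c: c["ts"])
        revenue[(last["channel"], last["publisher"])] += conv["revenue"]
    return dict(revenue)


def roi(revenue, spend):
    """ROI = (revenue - spend) / spend per key; zero-spend keys are skipped."""
    return {
        key: (revenue.get(key, 0.0) - cost) / cost
        for key, cost in spend.items()
        if cost > 0
    }


# Hypothetical sample data.
clicks = [
    {"user_id": "u1", "channel": "search", "publisher": "pub_a", "ts": 100},
    {"user_id": "u1", "channel": "social", "publisher": "pub_b", "ts": 200},
    {"user_id": "u2", "channel": "search", "publisher": "pub_a", "ts": 150},
]
conversions = [
    {"user_id": "u1", "revenue": 30.0, "ts": 250},
    {"user_id": "u2", "revenue": 20.0, "ts": 300},
    {"user_id": "u3", "revenue": 99.0, "ts": 400},  # no click on record
]
spend = {("search", "pub_a"): 10.0, ("social", "pub_b"): 20.0}

attributed = last_click_attribution(clicks, conversions)
channel_roi = roi(attributed, spend)
print(attributed)
print(channel_roi)
```

At production scale this logic would more likely live in SQL over the warehouse tables; the Python version only makes the rule explicit.
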

- Ingestion layer
  - Pub/Sub: Serves as the streaming message bus for clicks, impressions, and app events. It decouples producers from consumers and supports high-throughput, low-latency ingestion.
  - Dataflow: Consumes Pub/Sub messages and applies streaming transforms (enrichment, deduplication, windowing) using Apache Beam. It is also used for batch loads and schema enforcement.
  - Batch loaders (Compute Engine / Dataproc / BigQuery load jobs): For scheduled file loads or Spark-based ETL, use Compute Engine or Dataproc. When raw files land in Cloud Storage, trigger load jobs into BigQuery.
    - Compute Engine: https://cloud.google.com/compute
    - Dataproc: https://cloud.google.com/dataproc
    - BigQuery load jobs: https://cloud.google.com/bigquery/docs/loading-data
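
To make the ingestion step more concrete, here is a minimal sketch of how a producer might shape an event before publishing it. It relies only on the facts that a Pub/Sub message body is bytes and that message attributes are string key/value pairs. The envelope fields (`event_id`, `event_type`, `user_id`, `event_time`, `payload`) and the helper `build_message` are hypothetical, not a schema taken from the source.

```python
import json
import uuid
from datetime import datetime, timezone


def build_message(event_type, user_id, payload, event_id=None, event_time=None):
    """Return (data, attributes) shaped like a Pub/Sub message."""
    event_time = event_time or datetime.now(timezone.utc)
    event = {
        # A stable ID lets downstream consumers deduplicate redeliveries.
        "event_id": event_id or str(uuid.uuid4()),
        "event_type": event_type,
        "user_id": user_id,
        # Event time is set by the producer, not by arrival time.
        "event_time": event_time.isoformat(),
        "payload": payload,
    }
    data = json.dumps(event, sort_keys=True).encode("utf-8")
    attributes = {"event_type": event_type, "event_id": event["event_id"]}
    return data, attributes


data, attrs = build_message(
    "click",
    "u1",
    {"campaign_id": "c42", "publisher_id": "pub_a"},
    event_id="evt-001",
    event_time=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
decoded = json.loads(data)
print(decoded["event_time"], attrs)
```

In a real producer, the `data` bytes and `attrs` mapping would be handed to a Pub/Sub client library's publish call; that step is left out here so the sketch runs without credentials.
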
- Storage and analytical processing
  - BigQuery (central warehouse): Store event tables, sessionized views, campaign performance summaries, and ad attribution outputs. BigQuery is serverless and optimized for SQL analytics at scale.
  - Cloud SQL: Store relational metadata, publisher and campaign lookup tables, config, and reporting mappings. Use for transactional needs and fast lookups.
  - Cloud Storage (GCS): Archive raw event payloads, backups, and large unstructured objects. Use Nearline/Coldline/Archive classes for infrequent access.
    - Docs: https://cloud.google.com/storage
    - Storage classes: https://cloud.google.com/storage/docs/storage-classes
  - Cloud Bigtable: Ideal for high-throughput, low-latency access patterns such as time-series counters or per-user state where a wide-column NoSQL store fits.
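
The storage-class guidance for Cloud Storage can be expressed as a bucket lifecycle configuration. The sketch below builds one as a plain dictionary in the JSON shape that Cloud Storage lifecycle rules use (`rule`, `action`, `condition`). The helper `lifecycle_rules` and the age thresholds are illustrative assumptions, not values from the source.

```python
import json


def lifecycle_rules(transitions, delete_after_days=None):
    """Build a lifecycle config from (age_in_days, storage_class) pairs."""
    rules = [
        {
            "action": {"type": "SetStorageClass", "storageClass": storage_class},
            "condition": {"age": age_days},
        }
        for age_days, storage_class in sorted(transitions)
    ]
    if delete_after_days is not None:
        # Optional final rule that deletes objects past the retention period.
        rules.append(
            {"action": {"type": "Delete"}, "condition": {"age": delete_after_days}}
        )
    return {"rule": rules}


# Assumed thresholds: move raw events to colder classes as they age.
config = lifecycle_rules([(30, "NEARLINE"), (90, "COLDLINE"), (365, "ARCHIVE")])
print(json.dumps(config, indent=2))
```

The printed JSON could be saved to a file and applied to the raw-events bucket with the Cloud Storage tooling of your choice.
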
- Processing engines
  - Dataflow: Recommended for stream-first processing and also supports batch. Best for real-time ETL, enrichment, windowed aggregations, and event-time semantics.
  - Dataproc: Managed Hadoop/Spark clusters for existing Spark jobs or when you need specialized Spark libraries and large-scale batch transformations.

Dataflow (Apache Beam) is serverless and handles both streaming and batch pipelines with the same programming model. Dataproc provides managed Spark/Hadoop clusters and fits best when you already have Spark workloads or need specific Spark libraries.
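
To show what "windowed aggregations" with "event-time semantics" means, the following plain-Python sketch assigns events to fixed 60-second windows by the timestamp carried in the event rather than by arrival order. It is a stand-in for the idea only, not Beam code: it has no watermarks or triggers, and the field names `campaign_id` and `event_ts` are hypothetical.

```python
from collections import Counter

WINDOW_SECONDS = 60  # assumed window size for this example


def window_start(event_ts, size=WINDOW_SECONDS):
    """Return the start of the fixed window that contains event_ts."""
    return event_ts - (event_ts % size)


def count_per_window(events, size=WINDOW_SECONDS):
    """Count events per (window_start, campaign_id), keyed by event time."""
    counts = Counter()
    for event in events:
        key = (window_start(event["event_ts"], size), event["campaign_id"])
        counts[key] += 1
    return dict(counts)


events = [
    {"campaign_id": "c1", "event_ts": 5},
    {"campaign_id": "c1", "event_ts": 59},
    {"campaign_id": "c2", "event_ts": 61},
    # Arrives out of order but still belongs to the window starting at 0.
    {"campaign_id": "c1", "event_ts": 30},
    {"campaign_id": "c1", "event_ts": 125},
]

windowed = count_per_window(events)
print(windowed)
```

A streaming engine adds the hard parts on top of this: deciding when a window is complete and what to do with data that arrives after that point.
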
- Machine learning and advanced analytics
  - Vertex AI / TensorFlow: After feature engineering in BigQuery, Dataflow, or Dataproc, train and deploy models using Vertex AI or TensorFlow workflows. Vertex AI supports end-to-end model training, model registry, and serving.
    - Vertex AI: https://cloud.google.com/vertex-ai
    - TensorFlow: https://www.tensorflow.org
- Presentation and reporting
  - BI and dashboards: Analysts query BigQuery and Cloud SQL via Looker, Looker Studio, or other BI tools to produce campaign dashboards, publisher performance reports, and ROI analyses.
    - Looker: https://cloud.google.com/looker
    - Looker Studio: https://lookerstudio.google.com
  - Exports and downstream systems: Materialize aggregates back into Cloud SQL or export to partner APIs and reporting endpoints as needed.

| Layer | Primary services | Typical use cases |
|---|---|---|
| Ingestion | Pub/Sub, Dataflow | Real-time event capture, streaming ETL, windowed aggregation |
| Batch processing | Dataproc, Dataflow (batch) | Spark workloads, heavy aggregations, feature engineering |
| Storage | BigQuery, Cloud Storage, Cloud SQL, Bigtable | Analytics warehouse, raw archives, metadata storage, low-latency counters |
| ML & Serving | Vertex AI, TensorFlow | Model training, feature serving, prediction pipelines |
| BI & Reporting | Looker, Looker Studio | Dashboards, ad-hoc queries, executive reports |
- Client events (web, mobile, gaming) are published to Pub/Sub.
- Streaming pipelines in Dataflow consume Pub/Sub messages, enrich events (e.g., look up publisher metadata in Cloud SQL), deduplicate them, and write to BigQuery for near-real-time analytics. Dataflow can also write intermediate datasets to GCS or Bigtable.
- Batch transformations run on Dataproc (Spark) or Dataflow batch jobs to sessionize events, compute aggregates, and produce derived features.
- BigQuery stores raw event tables and aggregated reporting tables for fast SQL queries.
- Cloud SQL holds lookup tables, campaign metadata, and configuration used by ETL and reporting layers.
- Vertex AI or TensorFlow training jobs read datasets from BigQuery or GCS; predictions are materialized into BigQuery or published to downstream systems.
- BI tools read BigQuery and Cloud SQL to surface KPIs, publisher rankings, and ROI calculations.
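
The deduplicate-and-enrich step in the flow above can be sketched in a few lines. This is a simplified, in-memory illustration under stated assumptions: events carry a hypothetical `event_id` and `publisher_id`, and the `publishers` dictionary stands in for the lookup table that the flow keeps in Cloud SQL.

```python
def deduplicate(events):
    """Yield each event once, keyed by event_id (first occurrence wins)."""
    seen = set()
    for event in events:
        if event["event_id"] in seen:
            continue
        seen.add(event["event_id"])
        yield event


def enrich(events, publishers):
    """Attach publisher metadata; unknown publishers get placeholder values."""
    for event in events:
        meta = publishers.get(event["publisher_id"], {})
        yield {
            **event,
            "publisher_name": meta.get("name", "unknown"),
            "publisher_tier": meta.get("tier", "unknown"),
        }


# Stand-in for the publisher lookup table.
publishers = {"pub_a": {"name": "Example News", "tier": "premium"}}

raw_events = [
    {"event_id": "e1", "publisher_id": "pub_a", "event_type": "click"},
    {"event_id": "e1", "publisher_id": "pub_a", "event_type": "click"},  # redelivery
    {"event_id": "e2", "publisher_id": "pub_z", "event_type": "impression"},
]

rows = list(enrich(deduplicate(raw_events), publishers))
for row in rows:
    print(row)
```

The `seen` set grows without bound here; a streaming pipeline would normally limit deduplication state to a time window or an expiry period.
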
- Use appropriate GCS storage classes (Nearline, Coldline, Archive) for long-term raw data retention to reduce cost.
- Partition and cluster BigQuery tables based on event timestamps and query patterns to lower query scan costs.
- Prefer managed, serverless services (Dataflow, BigQuery) to minimize operational overhead. Use Dataproc where Spark-specific libraries or ecosystem compatibility is required.
- Keep small metadata and lookup tables in Cloud SQL for transactional access and fast joins when needed, while storing analytical tables and aggregates in BigQuery.
- Monitor Pub/Sub and Dataflow metrics (through Cloud Monitoring) to tune throughput and latency.
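
As one way to apply the partitioning and clustering advice, the sketch below generates a BigQuery `CREATE TABLE` statement that partitions by the date of the event timestamp and clusters by the columns most often used in filters. The helper `partitioned_table_ddl`, the table name `my_project.ads.events`, and the column names are hypothetical; pick partition and cluster columns from your own query patterns.

```python
def partitioned_table_ddl(table, columns, partition_column, cluster_columns):
    """Build a BigQuery DDL string for a date-partitioned, clustered table."""
    column_sql = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS `{table}` (\n  {column_sql}\n)\n"
        f"PARTITION BY DATE({partition_column})\n"
        f"CLUSTER BY {', '.join(cluster_columns)}"
    )


ddl = partitioned_table_ddl(
    "my_project.ads.events",  # hypothetical project.dataset.table
    [
        ("event_id", "STRING"),
        ("event_timestamp", "TIMESTAMP"),
        ("campaign_id", "STRING"),
        ("publisher_id", "STRING"),
        ("event_type", "STRING"),
    ],
    partition_column="event_timestamp",
    cluster_columns=["campaign_id", "publisher_id"],
)
print(ddl)
```

Queries that filter on the partition column then scan only the matching daily partitions, which is where the cost saving comes from.
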
- Pub/Sub: https://cloud.google.com/pubsub
- Dataflow (Apache Beam): https://cloud.google.com/dataflow — https://beam.apache.org
- BigQuery: https://cloud.google.com/bigquery
- Dataproc: https://cloud.google.com/dataproc
- Cloud Storage: https://cloud.google.com/storage
- Cloud SQL: https://cloud.google.com/sql
- Cloud Bigtable: https://cloud.google.com/bigtable
- Vertex AI: https://cloud.google.com/vertex-ai
- Looker / Looker Studio: https://cloud.google.com/looker — https://lookerstudio.google.com