
How Cloud Logging works (high level)
- Every Google Cloud resource (Compute Engine VMs, GKE, Cloud Functions, App Engine, Cloud Storage, and managed services) can emit logs. Logs arrive via:
  - platform integrations,
  - agents (for VMs and some container setups), or
  - the Cloud Logging API and client libraries.
- Incoming logs are ingested into the Logging service, which provides:
  - A log router that filters and routes entries to sinks.
  - Sinks that export logs to destinations such as Cloud Storage, BigQuery, Pub/Sub, or other log buckets (including cross-project/organization sinks).
  - Log buckets where logs are stored and retention policies are applied.
- For interactive investigation, use Logs Explorer in the Cloud Console. It supports advanced queries, filters, and visualizations.
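
The router-and-sink flow described above can be sketched in a few lines of plain Python. This is only a toy model, not the Cloud Logging API: the `Sink` class, the `route` function, and the destination labels are hypothetical names chosen for illustration. It shows the one property that matters in practice: every sink evaluates its filter independently, so a single entry can land in several destinations.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

LogEntry = Dict[str, str]


@dataclass
class Sink:
    """Toy stand-in for a sink: a name, a destination, and an inclusion filter."""
    name: str
    destination: str  # hypothetical label, not a real destination string
    matches: Callable[[LogEntry], bool]


def route(entry: LogEntry, sinks: List[Sink]) -> List[str]:
    """Return the destination of every sink whose filter matches the entry."""
    return [sink.destination for sink in sinks if sink.matches(entry)]


sinks = [
    # Catch-all sink that keeps everything in a log bucket.
    Sink("default", "log-bucket:_Default", lambda e: True),
    # Errors go to BigQuery for analysis.
    Sink("errors-to-bq", "bigquery:analytics", lambda e: e.get("severity") == "ERROR"),
    # Object deletes go to Pub/Sub for real-time reaction.
    Sink("deletes-to-pubsub", "pubsub:recovery",
         lambda e: e.get("method") == "storage.objects.delete"),
]

delete_entry = {"severity": "NOTICE", "method": "storage.objects.delete"}
error_entry = {"severity": "ERROR", "method": "jobs.run"}

print(route(delete_entry, sinks))
print(route(error_entry, sinks))
```

In the real service the filter is written in the Logging query language rather than as a Python lambda, but the fan-out behaviour is the same idea.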
Why Cloud Logging is powerful
- Low-latency ingestion: Logs appear quickly after they are emitted, enabling near real-time troubleshooting.
- Fully managed: Google handles scaling, availability, and the underlying infrastructure.
- Log-based metrics: Turn specific log patterns into metrics to power alerts and dashboards (e.g., count of failed jobs or application errors).
- Rich querying: Logs Explorer offers a powerful query language and UI for filtering, inspecting, and visualizing trends.
- Audit logs: Capture admin activity, data access, and system events—essential for security investigations and compliance.
Note: Data Access audit logs are not enabled by default for many services due to volume and cost. Enable them intentionally if you need detailed access records for troubleshooting or compliance.
Warning: Log retention and export settings affect cost and compliance. Configure retention policies and export sinks to balance long-term storage needs against billing.
Key components (at-a-glance)
| Component | Purpose | Example use-case |
|---|---|---|
| Log router & sinks | Filter and route log entries to destinations | Export deletes to Pub/Sub to trigger recovery workflows |
| Log buckets & retention | Store logs and enforce retention policies | Archive logs for 365 days in a separate bucket |
| Audit logs | Track admin activity, data access, and system events | Identify who deleted objects from a bucket |
| Log-based metrics | Create metrics from log contents | Alert on repeated job failures |
| IAM roles | Control access to view, write, and manage logs | roles/logging.viewer, roles/logging.logWriter |
Configuration patterns important to data engineers
- Sinks: Use sinks to route audit or application logs to BigQuery for analysis, Cloud Storage for long-term retention, or Pub/Sub for streaming/real-time reaction.
- Retention: Set retention at the log-bucket level to enforce compliance and manage cost.
- Audit Logging: Admin Activity, Data Access, and System Event audit logs give different levels of visibility—enable Data Access selectively.
- Log-based metrics & alerts: Create metrics from log patterns and surface them in Cloud Monitoring to monitor pipeline health.
- IAM granularity: Apply least privilege using logging-specific roles.
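
To make the log-based metrics pattern more concrete, here is a minimal local sketch of what a counter metric does conceptually: count the entries that match a filter in each time window, then compare the count to a threshold. The function name, the sample entries, and the threshold of 2 are all assumptions made for illustration; in Cloud Logging the counting and alerting are done by the service and Cloud Monitoring, not by your code.

```python
from collections import Counter
from datetime import datetime
from typing import Callable, Dict, Iterable


def count_matches_per_minute(
    entries: Iterable[dict], predicate: Callable[[dict], bool]
) -> Dict[str, int]:
    """Count matching entries, bucketed by the minute of their timestamp."""
    counts: Counter = Counter()
    for entry in entries:
        if predicate(entry):
            ts = datetime.fromisoformat(entry["timestamp"])
            counts[ts.strftime("%Y-%m-%dT%H:%M")] += 1
    return dict(counts)


# Hypothetical pipeline log entries.
entries = [
    {"timestamp": "2024-01-01T10:00:05+00:00", "severity": "ERROR", "message": "job failed"},
    {"timestamp": "2024-01-01T10:00:40+00:00", "severity": "INFO", "message": "job started"},
    {"timestamp": "2024-01-01T10:00:55+00:00", "severity": "ERROR", "message": "job failed"},
    {"timestamp": "2024-01-01T10:01:10+00:00", "severity": "ERROR", "message": "job failed"},
]

failed_jobs = count_matches_per_minute(
    entries,
    lambda e: e["severity"] == "ERROR" and "job failed" in e["message"],
)

# Minutes whose failure count reaches the (assumed) alert threshold.
ALERT_THRESHOLD = 2
alerting_minutes = [minute for minute, n in failed_jobs.items() if n >= ALERT_THRESHOLD]

print(failed_jobs)
print(alerting_minutes)
```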
Quick examples
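Two tasks come up often: a Logs Explorer query to find Cloud Storage object deletes, and a sink that exports the matching entries to BigQuery (substitute your own PROJECT_ID and DATASET).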
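
As a hedged sketch of both tasks, the snippet below builds a Logs Explorer filter for object deletes and assembles a `gcloud logging sinks create` command that routes matching entries to BigQuery. The sink name `gcs-deletes-to-bq` and the helper function names are hypothetical; `PROJECT_ID` and `DATASET` are placeholders to replace. The code only prints the command, it does not run it.

```python
import shlex
from typing import List


def delete_filter() -> str:
    """Logging query language filter for Cloud Storage object deletes."""
    return (
        'resource.type="gcs_bucket" AND '
        'protoPayload.methodName="storage.objects.delete"'
    )


def bigquery_sink_command(
    sink_name: str, project_id: str, dataset: str, log_filter: str
) -> List[str]:
    """Build the argv for creating a sink whose destination is a BigQuery dataset."""
    destination = f"bigquery.googleapis.com/projects/{project_id}/datasets/{dataset}"
    return [
        "gcloud", "logging", "sinks", "create",
        sink_name,
        destination,
        f"--log-filter={log_filter}",
    ]


# Paste this into the Logs Explorer query box.
print(delete_filter())

# Placeholders: replace PROJECT_ID and DATASET before running the command.
cmd = bigquery_sink_command("gcs-deletes-to-bq", "PROJECT_ID", "DATASET", delete_filter())
print(shlex.join(cmd))
```

After the sink is created, its writer identity still needs permission to write to the dataset, otherwise no rows arrive.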
Example scenario: deleted CSV files in a Cloud Storage bucket
- Cloud Storage emits Data Access audit logs for object create/delete/read operations once those logs are enabled for the service.
- Use Logs Explorer to locate `storage.objects.delete` events and see the principal (user or service account), timestamp, and API call.
- For deeper analysis, export audit logs to BigQuery to run SQL queries over historical events.
- For real-time reaction (e.g., automatic recovery), route delete events to Pub/Sub and trigger a Cloud Function or Dataflow pipeline.
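
For the real-time path in this scenario, a subscriber has to unpack the log entry that the sink published. The sketch below assumes the message arrives as a dict with a base64-encoded `data` field containing the exported LogEntry as JSON, which is the usual shape for Pub/Sub push deliveries and Pub/Sub-triggered functions. The function name, bucket, object, and email are made up for the example; the recovery action itself is left out.

```python
import base64
import json
from typing import Optional


def parse_delete_event(pubsub_message: dict) -> Optional[dict]:
    """Decode an exported LogEntry and summarize it if it is an object delete."""
    entry = json.loads(base64.b64decode(pubsub_message["data"]))
    payload = entry.get("protoPayload", {})
    if payload.get("methodName") != "storage.objects.delete":
        return None  # not a delete event; ignore it
    return {
        "principal": payload.get("authenticationInfo", {}).get("principalEmail"),
        "object": payload.get("resourceName"),
        "timestamp": entry.get("timestamp"),
    }


# Hypothetical audit log entry, trimmed to the fields used above.
sample_entry = {
    "timestamp": "2024-01-15T08:30:00Z",
    "protoPayload": {
        "methodName": "storage.objects.delete",
        "resourceName": "projects/_/buckets/example-bucket/objects/sales.csv",
        "authenticationInfo": {"principalEmail": "etl-job@example.com"},
    },
}
message = {"data": base64.b64encode(json.dumps(sample_entry).encode("utf-8")).decode("ascii")}

summary = parse_delete_event(message)
print(summary)
```

A real handler would pass `summary["object"]` to whatever recovery step applies, for example restoring a noncurrent version if object versioning is enabled on the bucket.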
Troubleshooting checklist for data engineers
- Verify the relevant audit logs are enabled: Admin Activity logs are always written, but Data Access logs must be turned on for each service you need.
- Check sink filters and destinations to ensure exports include the desired log entries.
- Confirm IAM: does the user or service account have `roles/logging.viewer` or proper sink writer permissions?
- Inspect log retention for the bucket where logs are stored; older logs may have expired.
- Use log-based metrics to detect recurring failures before they escalate.
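
The retention item in the checklist is easy to reason about with a small date calculation. The sketch below is an illustration only: `is_expired` is a hypothetical helper, and the 30-day and 365-day values are example retention settings to compare against whatever your log bucket is actually configured with.

```python
from datetime import datetime, timedelta, timezone


def is_expired(entry_time: datetime, retention_days: int, now: datetime) -> bool:
    """True if an entry written at entry_time is older than the retention period."""
    return now - entry_time > timedelta(days=retention_days)


# Example: investigating on 1 March an event that happened on 15 January.
now = datetime(2024, 3, 1, tzinfo=timezone.utc)
deleted_at = datetime(2024, 1, 15, tzinfo=timezone.utc)

print((now - deleted_at).days)            # age of the entry in days
print(is_expired(deleted_at, 30, now))    # bucket with 30-day retention
print(is_expired(deleted_at, 365, now))   # bucket with 365-day retention
```

If the first check comes back true for the bucket you are querying, look for an export sink (BigQuery or Cloud Storage) that may still hold the entry.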