- Discover: Find sensitive data inside CSV files, BigQuery tables, Cloud Storage objects, logs, or user input.
- Classify: Identify the type of data — e.g., email address, passport ID, phone number, or medical info.
- De-identify: Mask, redact, tokenize, or cryptographically transform data so the original values are not exposed.
- Monitor and audit: Keep track of scans, transformations, and re-identification events for compliance.
- Inspect: Analyze content (text, images, structured data) against built-in detectors (150+ predefined info types) and custom detectors.
- Classify: Return findings annotated with the info type, likelihood, and location (offsets or table/column references).
- De-identify: Apply transformations — e.g., replace characters with a mask, tokenize values, or encrypt using Cloud KMS.
- Re-identify: When authorized, reverse a reversible transform (for example, decrypt or detokenize) under strict access controls.
- Asynchronous scanning: For large datasets, create long-running background jobs that persist results and logs.
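The inspect and classify steps above can be sketched as a single synchronous call. This is a minimal sketch, assuming the `google-cloud-dlp` Python client and application default credentials. The helper names `build_inspect_request` and `inspect_text` are hypothetical, and the request fields reflect the client's dictionary request shape as best understood here, so check them against the API reference before relying on them.

```python
from typing import Any, Dict, List, Tuple


def build_inspect_request(project_id: str, text: str) -> Dict[str, Any]:
    """Build a content.inspect request for a short string (hypothetical helper)."""
    return {
        "parent": f"projects/{project_id}/locations/global",
        "inspect_config": {
            # Built-in detectors to run; extend this list as needed.
            "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}],
            "min_likelihood": "POSSIBLE",
            "include_quote": True,  # return the matched text with each finding
        },
        "item": {"value": text},
    }


def inspect_text(project_id: str, text: str) -> List[Tuple[str, str]]:
    """Send the request and return (info type, likelihood) pairs.

    Assumes google-cloud-dlp is installed and credentials are configured.
    """
    # Imported lazily so the request builder can be used without the package.
    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()
    response = client.inspect_content(request=build_inspect_request(project_id, text))
    return [(f.info_type.name, f.likelihood.name) for f in response.result.findings]
```

Keeping the request builder separate from the network call makes the detection rules easy to review and unit test without touching the service.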
| Method | Purpose | Typical use case |
|---|---|---|
| `content.inspect` | Scan text, images, or files for sensitive information | Inspecting logs before exporting them to BigQuery |
| `content.deidentify` | Transform findings (mask, redact, tokenize, encrypt) | Hiding all but the last four digits of a credit card |
| `content.reidentify` | Reverse a prior de-identification (when authorized) | Authorized forensic or fraud investigations |
| `projects.dlpJobs.create` | Create long-running DLP jobs for large datasets | Full scans of BigQuery tables or Cloud Storage buckets |
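To make the credit card use case in the table concrete, the following is a local, pure-Python illustration of the effect of character masking. It is not the DLP service and does not call any Google API. The function name `mask_all_but_last_four` is hypothetical, and in a real pipeline the equivalent behavior would be configured as a character mask transformation in the de-identify request.

```python
def mask_all_but_last_four(value: str, mask_char: str = "#") -> str:
    """Mask every digit except the last four, leaving separators untouched."""
    total_digits = sum(ch.isdigit() for ch in value)
    digits_to_mask = max(total_digits - 4, 0)

    out = []
    for ch in value:
        if ch.isdigit() and digits_to_mask > 0:
            out.append(mask_char)
            digits_to_mask -= 1
        else:
            # Separators and the final four digits pass through unchanged.
            out.append(ch)
    return "".join(out)


print(mask_all_but_last_four("4111-1111-1111-1234"))  # prints "####-####-####-1234"
```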
- Use DLP templates (inspect and de-identify templates) to keep detection and masking rules consistent across projects and teams.
- Apply least-privilege access to re-identification operations: grant narrowly scoped IAM roles and keep an audit trail whenever original values are restored.
- Prefer `dlpJobs.create` for large, asynchronous scans (BigQuery, Cloud Storage) so scans run in the background without blocking processing pipelines.
- Combine detectors: use both built-in and custom regex detectors for domain-specific identifiers.
- Monitor and log DLP jobs, findings, and re-identification events for compliance and incident response.
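The "combine detectors" recommendation can be sketched as an inspect configuration that lists built-in info types next to a custom regex detector. This is a minimal sketch: the `EMPLOYEE_ID` info type, its `EMP-` format, and the helper name are invented for illustration, and the field names follow the `google-cloud-dlp` dictionary request shape as understood here. The pattern is kept to syntax that behaves the same in Python's `re` and in RE2, so it can be checked locally first.

```python
import re
from typing import Any, Dict

# Hypothetical in-house identifier format, e.g. "EMP-004217".
EMPLOYEE_ID_PATTERN = r"EMP-\d{6}"


def build_combined_inspect_config() -> Dict[str, Any]:
    """Return an inspect_config mixing built-in and custom detectors."""
    return {
        # Built-in detectors.
        "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "CREDIT_CARD_NUMBER"}],
        # Custom detector for the domain-specific identifier.
        "custom_info_types": [
            {
                "info_type": {"name": "EMPLOYEE_ID"},
                "regex": {"pattern": EMPLOYEE_ID_PATTERN},
                "likelihood": "LIKELY",
            }
        ],
        "include_quote": True,
    }


def matches_employee_id(candidate: str) -> bool:
    """Local sanity check for the custom pattern before sending it to the service."""
    return re.fullmatch(EMPLOYEE_ID_PATTERN, candidate) is not None
```

The resulting dictionary would be passed as the `inspect_config` of an inspect request or stored in an inspect template.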
| Recommendation | Why it matters |
|---|---|
| Use templates | Centralize policies and simplify updates |
| Least privilege | Minimize risk when re-identifying data |
| Asynchronous jobs | Scale to large datasets without impacting latency |
| Audit trails | Demonstrate compliance and detect misuse |
Tip: Create reusable DLP inspect and de-identify templates. These templates ensure consistent policy enforcement across services and make it easier to update masking rules centrally.
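As a rough illustration of the tip above, a de-identify request can point at stored templates instead of repeating detection and masking rules inline. This sketch assumes both templates already exist. The template IDs and the helper name `build_templated_deidentify_request` are hypothetical, and the resource name format shown is an assumption to verify against the API reference.

```python
from typing import Any, Dict


def build_templated_deidentify_request(
    project_id: str,
    inspect_template_id: str,
    deidentify_template_id: str,
    text: str,
    location: str = "global",
) -> Dict[str, Any]:
    """Build a content.deidentify request that reuses stored templates."""
    parent = f"projects/{project_id}/locations/{location}"
    return {
        "parent": parent,
        # Detection rules come from the inspect template.
        "inspect_template_name": f"{parent}/inspectTemplates/{inspect_template_id}",
        # Masking rules come from the de-identify template.
        "deidentify_template_name": f"{parent}/deidentifyTemplates/{deidentify_template_id}",
        "item": {"value": text},
    }


def deidentify_with_templates(request: Dict[str, Any]) -> str:
    """Send the request; assumes google-cloud-dlp and configured credentials."""
    from google.cloud import dlp_v2  # lazy import keeps the builder dependency-free

    client = dlp_v2.DlpServiceClient()
    response = client.deidentify_content(request=request)
    return response.item.value
```

Because callers pass only template IDs, updating a masking rule means editing the template once rather than redeploying every service that calls DLP.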
Warning: Re-identification restores sensitive values. Restrict who can call `content.reidentify` and require logging and approvals. Treat re-identification as a high-risk operation in your security and compliance processes.

- BigQuery: Scan tables with `dlpJobs.create` for scheduled or one-off discovery jobs. Use de-identification to create masked export tables.
- Cloud Storage: Scan objects (CSV, JSON, text) and optionally redact or move sensitive copies to a protected bucket.
- Logging/Streaming: Inspect logs or streaming data in real time before forwarding them to sinks (e.g., Pub/Sub -> BigQuery).
- Images: Use image inspection to detect text-based PII inside images (OCR + info type detectors).
- Workflows: Combine DLP with Cloud Functions or Dataflow to automate inspection and transformation as part of ETL pipelines.
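The BigQuery integration above can be sketched as a `dlpJobs.create` request that scans one table and saves findings to another. This is a sketch under stated assumptions: the helper names and every dataset and table ID are placeholders, and the field names follow the `google-cloud-dlp` dictionary request shape as understood here rather than a verified reference.

```python
from typing import Any, Dict


def build_bigquery_inspect_job(
    project_id: str,
    dataset_id: str,
    table_id: str,
    findings_dataset_id: str,
    findings_table_id: str,
) -> Dict[str, Any]:
    """Build a dlpJobs.create request that scans one BigQuery table."""
    return {
        "parent": f"projects/{project_id}/locations/global",
        "inspect_job": {
            # Table to scan.
            "storage_config": {
                "big_query_options": {
                    "table_reference": {
                        "project_id": project_id,
                        "dataset_id": dataset_id,
                        "table_id": table_id,
                    }
                }
            },
            "inspect_config": {
                "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}],
                "min_likelihood": "LIKELY",
            },
            # Persist findings to a separate table for later review.
            "actions": [
                {
                    "save_findings": {
                        "output_config": {
                            "table": {
                                "project_id": project_id,
                                "dataset_id": findings_dataset_id,
                                "table_id": findings_table_id,
                            }
                        }
                    }
                }
            ],
        },
    }


def start_job(request: Dict[str, Any]) -> str:
    """Submit the job and return its resource name; assumes configured credentials."""
    from google.cloud import dlp_v2  # lazy import keeps the builder dependency-free

    client = dlp_v2.DlpServiceClient()
    job = client.create_dlp_job(request=request)
    return job.name
```

The call returns as soon as the job is created, so a pipeline can continue and check the job's status or the findings table later.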
- Google Cloud DLP Documentation
- BigQuery Documentation
- Cloud Storage Documentation
- Cloud KMS Documentation