Describes Terraform data sources used to query existing infrastructure, expose attributes for resource configuration, and avoid hard coding provider IDs
Welcome back. In this lesson we’ll explore Terraform data sources: the read-only constructs that let your configuration query and consume existing infrastructure instead of creating it. Data sources are essential when part of your environment is managed outside of your Terraform code, or when you need to discover objects (VPCs, clusters, images, namespaces, etc.) dynamically.
By the end of this lesson you will know:
What a data source is and when to use it
How to author data source blocks
How to reference attributes returned by data sources in other resources
Why use data sources?
Avoid hard-coding provider-specific IDs (VPC IDs, ARNs, resource group names).
Resolve dependencies on existing infrastructure that Terraform does not manage.
Make modules and configurations reusable across environments.
Normally, Terraform creates and manages infrastructure through resource blocks. But Terraform can also query existing resources, which is crucial when infrastructure already exists or is managed elsewhere (on-premises, manually created, or in another team’s account). Examples include VM images, Kubernetes clusters, cloud databases, storage accounts, or network VPCs.
For example, picture a team deploying a new containerized application into an existing Azure Kubernetes Service (AKS) cluster. Instead of re-declaring the cluster in Terraform, you can look it up with a data source and reference the attributes you need (API server endpoint, resource IDs, etc.) to configure deployments, network rules, or outputs.
Standard Azure resource blocks are resource declarations, not data sources: each one tells Terraform to create and manage the object it describes.
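For contrast with a data source lookup, here is a minimal sketch of ordinary resource declarations. The local name example, the region, and the address range are placeholders chosen for illustration and do not come from the lesson.

```hcl
# Resource blocks: Terraform creates, updates, and destroys these objects.
# All names and values below are illustrative placeholders.
resource "azurerm_resource_group" "example" {
  name     = "example-group"
  location = "westeurope"
}

resource "azurerm_virtual_network" "example" {
  name                = "example-vnet"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  address_space       = ["10.0.0.0/16"]
}
```

Applying this configuration would create both objects. A data block, by comparison, only reads an object that already exists.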
Instead of re-declaring an AKS cluster, use a data source to look it up:
data "azurerm_kubernetes_cluster" "aks" {
  name                = "prod-cluster"
  resource_group_name = "prod-group"
}
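One common use of such a lookup is to configure the Kubernetes provider from the cluster's connection details. The sketch below assumes the cluster exposes client-certificate credentials through the kube_config attribute; clusters that rely only on Azure AD authentication may need a different approach.

```hcl
# Look up the existing AKS cluster (not managed by this configuration).
data "azurerm_kubernetes_cluster" "aks" {
  name                = "prod-cluster"
  resource_group_name = "prod-group"
}

# Feed the discovered endpoint and credentials into the Kubernetes provider.
# Assumption: kube_config is populated with certificate-based credentials.
provider "kubernetes" {
  host                   = data.azurerm_kubernetes_cluster.aks.kube_config[0].host
  client_certificate     = base64decode(data.azurerm_kubernetes_cluster.aks.kube_config[0].client_certificate)
  client_key             = base64decode(data.azurerm_kubernetes_cluster.aks.kube_config[0].client_key)
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.aks.kube_config[0].cluster_ca_certificate)
}
```

With this in place, Kubernetes resources in the same configuration are deployed into the cluster without its endpoint or certificates being written into the code.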
On-premises and multi-provider scenarios
VMware vSphere: Query datacenters, clusters, resource pools, and datastores before deploying VMs.
AWS: Look up a VPC by tag or name to get its ID for subnet or route table creation.
Kubernetes: Query an existing namespace or service account to attach resources to it.
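As a rough sketch of the vSphere scenario, the lookups below chain together: the datacenter is resolved first, and its ID scopes the cluster and datastore lookups. The object names (dc-01, cluster-01, datastore-01) are hypothetical placeholders.

```hcl
# Resolve the datacenter by name (placeholder name).
data "vsphere_datacenter" "dc" {
  name = "dc-01"
}

# Cluster and datastore lookups are scoped to that datacenter.
data "vsphere_compute_cluster" "cluster" {
  name          = "cluster-01"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_datastore" "ds" {
  name          = "datastore-01"
  datacenter_id = data.vsphere_datacenter.dc.id
}

# IDs a VM resource would typically need for placement.
output "vm_placement" {
  value = {
    resource_pool_id = data.vsphere_compute_cluster.cluster.resource_pool_id
    datastore_id     = data.vsphere_datastore.ds.id
  }
}
```

The output is only there to show which attributes are available; in practice the same references would be passed directly to a virtual machine resource.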
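These lookups happen during terraform plan and terraform apply. Terraform queries the provider API at that point to populate data source attributes, so you avoid hard-coded IDs, ARNs, or other brittle values. Terraform reads a data source while planning whenever its arguments are already known, and defers the read to apply when they depend on values that are not known until then.
Key points about data sources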
| Feature | Description | Example usage |
| --- | --- | --- |
| Read-only | Data sources never create or modify provider-side resources | Query the VPC ID for subnet creation |
| Dependency resolution | Provide attribute values that other resources depend on | Use an existing database server ID when creating a database schema |
| Reduce hard-coding | Replace literal IDs and ARNs with discovered attributes | Replace vpc-012345 with data.aws_vpc.prd.id |
Data sources let Terraform read existing objects (resources, accounts, namespaces, etc.) and expose their attributes for use elsewhere in your configuration. They are read-only and will not create or change provider-side resources.
A visual summary of the data block and its purposes:
Data source syntax and examples
A data source block uses the data keyword and two labels, similar to how resource blocks are structured:
First label: the provider-specific type (for example, aws_vpc, azurerm_resource_group, kubernetes_namespace)
Second label: your local name (a descriptive identifier, such as prd, dev, app)
Minimal AWS VPC data source example:
data "aws_vpc" "prd" {
  filter {
    name   = "tag:Name"
    values = ["prd-vpc"]
  }
}
Then reference it inside another resource, for example by setting the vpc_id of a new subnet to data.aws_vpc.prd.id, the VPC ID returned by the data source.
Data sources for other providers follow the same two-label pattern:
data "azurerm_resource_group" "dev" {
  name = "dev-resource-group"
}

data "kubernetes_namespace" "app" {
  metadata {
    name = "customer-app"
  }
}
Each data source supports different arguments (filters, name attributes, metadata blocks, etc.). Always consult the Terraform Registry for the provider-specific data source documentation.
Referencing attributes returned by a data source
After Terraform queries a data source, it exposes attributes you can reference elsewhere. The reference format is:
data.<TYPE>.<NAME>.<ATTRIBUTE>
data — indicates a data source
<TYPE> — the data source type (e.g., aws_vpc)
<NAME> — your local data source name (e.g., prd)
<ATTRIBUTE> — the attribute you want (e.g., id, cidr_block, arn)
For example, data.aws_vpc.prd exposes attributes such as id, cidr_block, and arn.
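One way to see what a lookup returned is to expose a few attributes as outputs. This is a minimal sketch; the output names are arbitrary, and the values depend entirely on the VPC that matches in your account.

```hcl
data "aws_vpc" "prd" {
  filter {
    name   = "tag:Name"
    values = ["prd-vpc"]
  }
}

# Each reference follows data.<TYPE>.<NAME>.<ATTRIBUTE>.
output "prd_vpc_id" {
  value = data.aws_vpc.prd.id
}

output "prd_vpc_cidr_block" {
  value = data.aws_vpc.prd.cidr_block
}

output "prd_vpc_arn" {
  value = data.aws_vpc.prd.arn
}
```

After terraform apply, running terraform output prints the resolved values, which is a quick check that the filter matched the VPC you intended.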
You can reference a single data source as many times as you need across your configuration. Define it once; Terraform queries it during plan and apply, and you can reference its attributes wherever they are needed.
Consolidated examples for multiple providers
# AWS VPC lookup by tag
data "aws_vpc" "prd" {
  filter {
    name   = "tag:Name"
    values = ["prd-vpc"]
  }
}

# Azure resource group lookup
data "azurerm_resource_group" "dev" {
  name = "dev-resource-group"
}

# Kubernetes namespace lookup
data "kubernetes_namespace" "app" {
  metadata {
    name = "customer-app"
  }
}
Referencing attributes when creating resources:
# Use VPC ID to create an AWS subnet
resource "aws_subnet" "pub" {
  vpc_id     = data.aws_vpc.prd.id
  cidr_block = "10.0.6.0/24"
}
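The Azure and Kubernetes lookups can be consumed in the same way. The sketch below assumes the data.azurerm_resource_group.dev and data.kubernetes_namespace.app blocks from the consolidated examples are declared in the same module; the resource names, address range, and config map contents are hypothetical.

```hcl
# Place a new virtual network in the existing resource group,
# reusing its name and location instead of hard-coding them.
resource "azurerm_virtual_network" "app" {
  name                = "app-vnet"
  location            = data.azurerm_resource_group.dev.location
  resource_group_name = data.azurerm_resource_group.dev.name
  address_space       = ["10.1.0.0/16"]
}

# Create a config map inside the existing namespace.
# metadata is a list-style block, so the name is read from its first element.
resource "kubernetes_config_map" "app_settings" {
  metadata {
    name      = "app-settings"
    namespace = data.kubernetes_namespace.app.metadata[0].name
  }

  data = {
    LOG_LEVEL = "info"
  }
}
```

Because both resources reference the data sources, Terraform reads the lookups first and fails early if the resource group or namespace does not exist.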
Best practices and tips
Use filters or explicit names in data sources to reduce accidental matches.
Prefer data sources in modules when the module must integrate with existing infrastructure.
Be mindful of provider API rate limits: data sources are called during plan and apply.
If attributes change outside Terraform, re-running terraform plan will reflect the new values (depending on provider behavior).
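Two of these tips, narrowing the match and keeping modules reusable, can be combined by driving the lookup from input variables. This is a sketch under stated assumptions: the variable names and the Environment tag are hypothetical, and your tagging scheme may differ.

```hcl
variable "vpc_name" {
  type        = string
  description = "Value of the Name tag on the VPC to look up"
}

variable "environment" {
  type        = string
  description = "Value of the Environment tag (assumed tagging scheme)"
}

variable "subnet_cidr" {
  type        = string
  description = "CIDR block for the new subnet"
}

# Every tag listed here must match, which narrows the lookup to one VPC.
data "aws_vpc" "selected" {
  tags = {
    Name        = var.vpc_name
    Environment = var.environment
  }
}

resource "aws_subnet" "app" {
  vpc_id     = data.aws_vpc.selected.id
  cidr_block = var.subnet_cidr
}
```

The same module can then be applied to dev and prd with different variable values. The aws_vpc data source reports an error when no VPC or more than one VPC matches, so a lookup that is too loose surfaces at plan time instead of silently picking the wrong network.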
Summary
Data sources are read-only constructs that let you query existing infrastructure and expose attributes to your Terraform configuration.
Use data sources to avoid hard-coded IDs and to resolve dependencies on existing resources.
Each provider documents supported arguments and returned attributes in the Terraform Registry — check it for the exact schema.