DP-900: Microsoft Azure Data Fundamentals

Analyzing Data

Extracting and Transforming

Welcome back to the Azure Data Fundamentals (DP-900) course. In this lesson, we’re focusing on the first two phases of ETL, Extract and Transform, using Azure Data Factory (ADF), Microsoft’s cloud-native data integration service.

What Is ETL?

ETL stands for:

  • Extract: Pull data from one or more sources.
  • Transform: Shape, enrich, and standardize that data for analytics or consolidation in a data warehouse.
  • Load: Write the processed data into a target store (covered in a later lesson).

Why ETL Matters

A robust ETL process ensures high-quality, consistent data for downstream analytics and reporting.

Introducing Azure Data Factory

Azure Data Factory provides a visual, no-code environment to build, schedule, and orchestrate ETL pipelines. ADF’s core components include:

  • Pipelines: Logical groupings of activities.
  • Activities: Individual steps such as copying or transforming data.
  • Linked Services: Connection definitions to data stores or compute resources.

[Image: infographic of Azure Data Factory components, including Pipelines, Copy data flow, Transform data flow, and Lookup, with brief descriptions of their functions.]

Core Concepts

Pipelines

A pipeline is a workflow of one or more activities. Pipelines help you organize and manage your data integration tasks. You can:

  • Schedule pipelines on a regular cadence (see the sample trigger definition after this list).
  • Trigger pipelines manually or via events.
  • Monitor pipeline runs and view detailed logs.
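
For example, a schedule trigger can run a pipeline on a daily cadence. The following is a minimal sketch of a schedule trigger definition; the trigger name (DailyEnrichmentTrigger) and start time are placeholders, and it references the CustomerEnrichmentPipeline built later in this lesson.

{
  "name": "DailyEnrichmentTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2024-01-01T06:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "CustomerEnrichmentPipeline",
          "type": "PipelineReference"
        }
      }
    ]
  }
}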

Activities

Activities perform discrete operations in your pipeline. Common activity types include:

Activity Type | Description | Example Use Case
Copy Data | Moves data from a source to a sink with minimal or no transformation | Copy files from Azure Blob Storage to Azure SQL
Data Flow | No-code transformations: filter, aggregate, join, derive columns | Cleanse and aggregate transactional records
Lookup | Retrieves a single row or small result set for enrichment | Fetch Country from SQL DB by CustomerID

No-Code Transformation

With Data Flow, you get a graphical interface to design complex transformations—no programming required.
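
The transformation logic itself is built on the visual canvas, and the resulting data flow is then invoked from a pipeline through an Execute Data Flow activity. The snippet below is a minimal sketch under that assumption; the activity and data flow names (TransformSalesData, CleanseSalesDataFlow) are illustrative, and the exact property casing generated by the ADF authoring UI may differ slightly.

{
  "name": "TransformSalesData",
  "type": "ExecuteDataFlow",
  "typeProperties": {
    "dataFlow": {
      "referenceName": "CleanseSalesDataFlow",
      "type": "DataFlowReference"
    },
    "compute": {
      "coreCount": 8,
      "computeType": "General"
    }
  }
}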

Linked Services

Linked services define how ADF connects to data sources and compute environments. Think of them as connection strings or service endpoints; a sample definition follows the table below.

Linked Service | Use Case | Example Configuration
Azure Blob Storage | Store and retrieve large unstructured data files | connectionString to your storage account
Azure SQL Database | Relational data store for OLTP or lookups | Server name, database name, login credentials
Cosmos DB (Mongo API) | NoSQL document database | Account endpoint and primary key
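
For illustration, a simplified Azure Blob Storage linked service definition might look like the following; the name AzureBlobStorageLS matches the blob connection used in the practical example later in this lesson, and the account name and key are placeholders. In practice you would typically store the key in Azure Key Vault rather than inline.

{
  "name": "AzureBlobStorageLS",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<storage-account>;AccountKey=<account-key>;EndpointSuffix=core.windows.net"
    }
  }
}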

Practical Example: Enriching Customer Data

Imagine your sales team uploads an Excel file with only CustomerID and CustomerName. You need to include the customer’s Country in your analytics pipeline:

  1. Copy Data: Ingest the Excel file from Azure Blob Storage into a staging table in Azure SQL Database.
  2. Lookup: Retrieve the matching Country for each CustomerID from a reference table in Azure SQL Database.
  3. Data Flow: Join and transform the enriched data—filter out inactive customers, rename columns, and aggregate sales by region.
  4. Load: Send the final output to your Azure Synapse Analytics data warehouse (covered later).

A simplified pipeline definition covering the first two steps (Copy Data and Lookup) is shown below. Dataset names such as SalesExcelDataset, StagingSqlTable, and CustomerReferenceDataset are placeholders, and the linked services they use (for example, AzureBlobStorageLS and AzureSqlDatabaseLS) are configured on those datasets rather than on the activities themselves.

{
  "name": "CustomerEnrichmentPipeline",
  "properties": {
    "activities": [
      {
        "name": "CopyExcelToStaging",
        "type": "Copy",
        "inputs": [ { "referenceName": "SalesExcelDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "StagingSqlTable", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": { "type": "ExcelSource" },
          "sink": { "type": "AzureSqlSink" }
        }
      },
      {
        "name": "LookupCountry",
        "type": "Lookup",
        "dependsOn": [
          { "activity": "CopyExcelToStaging", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
          "source": {
            "type": "AzureSqlSource",
            "sqlReaderQuery": "SELECT CustomerID, Country FROM Customers"
          },
          "dataset": { "referenceName": "CustomerReferenceDataset", "type": "DatasetReference" },
          "firstRowOnly": false
        }
      }
    ]
  }
}

Next Steps

With pipelines, activities, and linked services in place, you can automate and scale your extract and transform workflows. In the next lesson, we’ll cover the load phase, where you push transformed data into target sinks for reporting and analytics.

