DP-900: Microsoft Azure Data Fundamentals

Analyzing Data

In this lesson of the Azure Data Fundamentals course, we'll cover the primary tools and frameworks for analyzing data once it’s been ingested and transformed with Azure Data Factory and stored in Azure Data Lake or Azure Synapse Analytics. You’ll learn how to query data efficiently, leverage open-source engines, and integrate these technologies using Azure HDInsight.


Querying with Azure Synapse Data Explorer and Kusto Query Language

When your data resides in Azure Synapse, Azure Synapse Data Explorer (built on the Azure Data Explorer engine, originally code-named Kusto) is the go-to tool for interactive queries. It uses Kusto Query Language (KQL), which borrows many SQL concepts but uses a pipe-based syntax well suited to telemetry, logs, and time-series data.

// Sample KQL query: top 5 states by storm event count in 2021
StormEvents
| where StartTime between (datetime(2021-01-01) .. datetime(2021-12-31))
| summarize Events=count() by State
| top 5 by Events desc
| render columnchart

Note

KQL supports built-in visualizations: append | render <chartType> to a query to produce a chart from its results.
Familiarity with SQL eases the transition, but KQL's pipe-based structure, pattern-matching operators, and time-series operators have no direct SQL equivalents.
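
To make the stages of the sample query concrete, here is a minimal Python sketch of the same filter, count, and top-N logic. The rows in `storm_events` and the helper `top_states` are hypothetical stand-ins for illustration, not part of any Azure SDK, and the `render` step is omitted.

```python
from collections import Counter
from datetime import datetime

# Hypothetical in-memory rows standing in for the StormEvents table.
storm_events = [
    {"State": "TEXAS", "StartTime": datetime(2021, 3, 14)},
    {"State": "TEXAS", "StartTime": datetime(2021, 7, 2)},
    {"State": "KANSAS", "StartTime": datetime(2021, 5, 20)},
    {"State": "OHIO", "StartTime": datetime(2020, 11, 30)},  # outside the range
    {"State": "KANSAS", "StartTime": datetime(2021, 6, 1)},
    {"State": "TEXAS", "StartTime": datetime(2021, 9, 9)},
    {"State": "IOWA", "StartTime": datetime(2021, 8, 15)},
]


def top_states(rows, start, end, n=5):
    """Mirror the where -> summarize -> top stages of the KQL query."""
    # where StartTime between (start .. end): both bounds are inclusive
    in_range = (r for r in rows if start <= r["StartTime"] <= end)
    # summarize Events=count() by State
    counts = Counter(r["State"] for r in in_range)
    # top n by Events desc
    return counts.most_common(n)


result = top_states(storm_events, datetime(2021, 1, 1), datetime(2021, 12, 31))
print(result)
```

Each KQL pipe stage maps to one step in the function, which is the main idea to carry over from SQL: data flows left to right through a sequence of operators.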


Open-Source Analytics Engines

Azure provides first-class support for popular open-source data engines. You can run ad hoc SQL queries, batch jobs, or streaming workloads at scale.

| Engine | Description | Use Case |
| --- | --- | --- |
| Apache Spark | Fast, in-memory analytics engine for batch and streaming, with SQL, Python, Scala, and R APIs. | Large-scale ETL, machine learning |
| Apache Hadoop | Distributed storage and processing framework; integrates with HBase for NoSQL, wide-column storage. | Batch processing, HDFS compute |
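
As a rough illustration of how a Hadoop-style batch job spreads work across servers, the sketch below imitates the map, shuffle, and reduce pattern in plain Python on a single machine. The function names and the sample `chunks` are assumptions made for this example; a real cluster would run each phase on separate nodes against data in HDFS.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of lines."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


# Each inner list stands in for a data block held by a different server.
chunks = [["spark hadoop spark"], ["hadoop hbase"]]

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

Because each chunk is mapped independently, the map phase can run in parallel; only the shuffle needs to move data between machines.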

If you prefer writing custom code over SQL or KQL, leverage languages such as R or Python for statistical analysis and data science workflows.
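
For instance, a few lines of Python can summarize a dataset without any query engine. This is a minimal sketch using only the standard library; the `daily_events` values and the 1.5 standard deviation threshold are made up for illustration.

```python
import statistics

# Hypothetical daily event counts for one week.
daily_events = [12, 15, 9, 22, 17, 14, 30]

mean = statistics.mean(daily_events)      # arithmetic average
median = statistics.median(daily_events)  # middle value when sorted
stdev = statistics.stdev(daily_events)    # sample standard deviation

# Flag days whose count is more than 1.5 standard deviations from the mean.
outliers = [x for x in daily_events if abs(x - mean) > 1.5 * stdev]

print(f"mean={mean}, median={median}, stdev={stdev:.2f}, outliers={outliers}")
```

For larger datasets, the same analysis would typically move to a library or engine built for scale, such as the Spark Python API mentioned above.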


Integrating Tools with Azure HDInsight

When your analytics solution combines multiple open-source frameworks, Azure HDInsight offers a fully managed cluster service. You can mix and match:

| Component | Role |
| --- | --- |
| Apache Spark | In-memory analytics |
| Apache Hive | SQL-based data warehousing |
| Apache Kafka | Real-time stream ingestion |
| Hadoop | Distributed storage & compute |

Note

HDInsight clusters can auto-scale and integrate with Azure Active Directory for secure, enterprise-grade deployments.


Choosing the Right Analytics Platform

As your analytics requirements evolve, consider purpose-built, fully integrated platforms:

  • Azure Synapse Analytics – Unified experience for data warehousing, big data, and data integration.
  • Azure Databricks – Optimized Apache Spark environment with collaborative notebooks and ML integration.
