DP-900: Microsoft Azure Data Fundamentals
Analyzing Data
In this lesson of the Azure Data Fundamentals course, we'll cover the primary tools and frameworks for analyzing data once it’s been ingested and transformed with Azure Data Factory and stored in Azure Data Lake or Azure Synapse Analytics. You’ll learn how to query data efficiently, leverage open-source engines, and integrate these technologies using Azure HDInsight.
Querying with Azure Synapse Data Explorer and Kusto Query Language
When your data resides in Azure Synapse, Azure Synapse Data Explorer (built on the Azure Data Explorer engine, which was code-named Kusto) is the go-to tool for interactive queries. It uses Kusto Query Language (KQL), which borrows many SQL concepts but uses a pipe-based syntax suited to telemetry, logs, and time-series data.
// Sample KQL query: top 5 states by storm event count in 2021
StormEvents
| where StartTime between (datetime(2021-01-01) .. datetime(2021-12-31))
| summarize Events=count() by State
| top 5 by Events desc
| render columnchart
Note
KQL supports built-in visualizations: append | render <chartType> to your query to produce a chart directly from the results.
Familiarity with SQL eases the transition, but KQL adds its own operators for pattern matching and time-series analysis that have no direct SQL equivalent.
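For readers more comfortable with general-purpose code than with pipe syntax, the stages of the KQL query above can be mirrored in plain Python. This is only an illustrative sketch: the `SAMPLE_EVENTS` rows and the `top_states` helper are hypothetical stand-ins for the StormEvents table, not part of any Azure SDK.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows standing in for the StormEvents table.
SAMPLE_EVENTS = [
    {"State": "TEXAS", "StartTime": datetime(2021, 3, 14)},
    {"State": "TEXAS", "StartTime": datetime(2021, 6, 2)},
    {"State": "KANSAS", "StartTime": datetime(2021, 5, 20)},
    {"State": "OHIO", "StartTime": datetime(2020, 11, 30)},  # outside the range
    {"State": "KANSAS", "StartTime": datetime(2021, 8, 9)},
    {"State": "TEXAS", "StartTime": datetime(2021, 12, 1)},
]


def top_states(events, start, end, n=5):
    """Filter by time range, count rows per state, and keep the n largest counts."""
    # | where StartTime between (start .. end)   (inclusive on both ends)
    in_range = (e for e in events if start <= e["StartTime"] <= end)
    # | summarize Events=count() by State
    counts = Counter(e["State"] for e in in_range)
    # | top n by Events desc
    return counts.most_common(n)


result = top_states(SAMPLE_EVENTS, datetime(2021, 1, 1), datetime(2021, 12, 31))
print(result)  # [('TEXAS', 3), ('KANSAS', 2)]
```

Each KQL pipe stage maps to one step in the function, which is the main idea to take away: a KQL query is a sequence of transformations applied left to right.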
Open-Source Analytics Engines
Azure provides first-class support for popular open-source data engines. You can run ad hoc SQL, batch, or streaming workloads at scale.
Engine | Description | Use Case |
---|---|---|
Apache Spark | Fast, in-memory analytics engine for batch and streaming, with SQL, Python, Scala, and R APIs. | Large-scale ETL, machine learning |
Apache Hadoop | Distributed storage (HDFS) and batch processing (MapReduce) framework; integrates with Apache HBase for wide-column NoSQL storage. | Batch processing over data in HDFS |
If you prefer writing custom code over SQL or KQL, leverage languages such as R or Python for statistical analysis and data science workflows.
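As a small illustration of the kind of statistical analysis you might write in Python, the sketch below summarizes a series and flags unusual values using only the standard library. The `daily_events` numbers are made up for the example, and the two-standard-deviation threshold is an assumption, not a recommendation from the course.

```python
import statistics

# Hypothetical daily event counts; in practice these would come from a query result.
daily_events = [12, 15, 11, 30, 14, 13, 16]


def describe(values):
    """Return basic summary statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def outliers(values, threshold=2.0):
    """Return values farther than `threshold` sample standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > threshold * stdev]


summary = describe(daily_events)
print(summary["median"])       # 14
print(outliers(daily_events))  # [30]
```

For larger datasets you would typically reach for a dataframe library or a Spark API instead, but the analysis steps stay the same.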
Integrating Tools with Azure HDInsight
When your analytics solution combines multiple open-source frameworks, Azure HDInsight offers a fully managed cluster service. You can mix and match:
Component | Role |
---|---|
Apache Spark | In-memory analytics |
Apache Hive | SQL-based data warehousing |
Apache Kafka | Real-time stream ingestion |
Hadoop | Distributed storage & compute |
Note
HDInsight clusters support autoscaling and integrate with Microsoft Entra ID (formerly Azure Active Directory) for secure, enterprise-grade deployments.
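To make the division of roles in the table above concrete, here is a toy pipeline in plain Python. It does not use Kafka, Spark, or Hive: a generator stands in for the stream, an in-memory counter for the analytics job, and an in-memory SQLite table for the SQL warehouse. All names (`event_stream`, `aggregate`, `load_and_query`, `page_views`) are hypothetical and exist only for this sketch.

```python
import sqlite3
from collections import Counter


def event_stream():
    """Stand-in for a Kafka topic: yields (page, user) click events."""
    yield from [
        ("/home", "u1"),
        ("/pricing", "u2"),
        ("/home", "u3"),
        ("/docs", "u1"),
        ("/home", "u2"),
    ]


def aggregate(events):
    """Stand-in for a Spark job: counts events per page in memory."""
    return Counter(page for page, _user in events)


def load_and_query(counts):
    """Stand-in for a Hive table: stores the counts and reads them back with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_views (page TEXT PRIMARY KEY, views INTEGER)")
    conn.executemany("INSERT INTO page_views VALUES (?, ?)", counts.items())
    rows = conn.execute(
        "SELECT page, views FROM page_views ORDER BY views DESC, page ASC"
    ).fetchall()
    conn.close()
    return rows


rows = load_and_query(aggregate(event_stream()))
print(rows)  # [('/home', 3), ('/docs', 1), ('/pricing', 1)]
```

On a real HDInsight cluster each stage would be a separate distributed service, but the flow of ingest, aggregate, and query with SQL is the same.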
Choosing the Right Analytics Platform
As your analytics requirements evolve, consider purpose-built, fully integrated platforms:
- Azure Synapse Analytics – Unified experience for data warehousing, big data, and data integration.
- Azure Databricks – Optimized Apache Spark environment with collaborative notebooks and ML integration.