DP-900: Microsoft Azure Data Fundamentals

Analyzing Data

Roles and Responsibilities

In this final section of the Data Analysis module, we’ll define the core roles driving insights and then recap the essential concepts you’ve learned.

Data Analysis Roles

We categorize data analysis stakeholders into three main roles:

| Role | Focus | Key Responsibilities |
| --- | --- | --- |
| Analysts (Users) | Diagnostic & predictive insights | Create visualizations; formulate analysis questions; write exploratory scripts (Python, R) |
| Administrators | Security, permissions & infrastructure stability | Grant/revoke user access; schedule & monitor ETL/ELT jobs; perform backups and restores |
| Data Engineers | Pipeline architecture & data integration | Design & build ETL/ELT workflows; integrate heterogeneous data sources; optimize performance and resource configurations |

Note

Analysts only need access to their datasets and BI tools, while administrators and engineers require broader interfaces to manage, secure, and optimize pipelines.

Key Takeaways

Here’s a summary of the core concepts, tools, and patterns you’ve learned for effective data processing and analysis.

1. OLAP & Data Processing Patterns

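  • Online Analytical Processing (OLAP): Multidimensional queries that aggregate large datasets across dimensions such as time, product, or region.
  • ETL vs. ELT
    • ETL: Extract → Transform → Load (data is cleaned and reshaped before it reaches the target store)
    • ELT: Extract → Load → Transform (raw data is loaded first, then transformed inside the target store using its own query engine)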
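
To make the difference in ordering concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for a warehouse. The sample rows, table names, and function names are all hypothetical and chosen only for illustration; a real pipeline would target a service such as Azure Synapse rather than an in-memory database.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_ROWS = [
    ("2024-01-01", " north ", "120.50"),
    ("2024-01-01", "South", "80.00"),
    ("2024-01-02", "NORTH", "99.50"),
]


def run_etl(conn):
    """ETL: clean the rows in application code, then load only the result."""
    cleaned = [
        (day, region.strip().lower(), float(amount))
        for day, region, amount in RAW_ROWS
    ]
    conn.execute("CREATE TABLE sales_etl (day TEXT, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?, ?)", cleaned)


def run_elt(conn):
    """ELT: load the raw rows untouched, then transform with the store's SQL."""
    conn.execute("CREATE TABLE sales_raw (day TEXT, region TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?, ?)", RAW_ROWS)
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT day,
               lower(trim(region)) AS region,
               CAST(amount AS REAL) AS amount
        FROM sales_raw
        """
    )


def totals_by_region(conn, table):
    """Return {region: total amount} for one of the tables created above."""
    query = f"SELECT region, SUM(amount) FROM {table} GROUP BY region"
    return dict(conn.execute(query).fetchall())


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
etl_totals = totals_by_region(conn, "sales_etl")
elt_totals = totals_by_region(conn, "sales_elt")
```

Both paths end with the same cleaned data. The practical difference is where the transformation runs and whether the raw rows remain available in the store for later reprocessing.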

2. Integrated Analytics Platforms

  • Azure Synapse Analytics: Unified analytics service that combines data warehousing and big data analytics.
  • Azure Databricks: Managed Apache Spark environment for collaborative data engineering and machine learning.

3. Batch & Stream Processing

  • Batch Processing: Scheduled jobs for high-volume data transformations.
  • Real-time Streaming: SQL-based analytics against streaming sources (e.g., Azure Event Hubs) for immediate insights.
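
The contrast between the two modes can be sketched in plain Python. This is not a streaming service API: the event list, the 10-second window size, and the function names below are assumptions made for illustration. The batch function waits for the full dataset, while the streaming function emits a result each time a fixed (tumbling) time window closes.

```python
# Hypothetical events: (timestamp in seconds, sensor reading).
EVENTS = [(1, 10.0), (4, 20.0), (11, 30.0), (12, 50.0), (25, 5.0)]
WINDOW_SECONDS = 10


def batch_average(events):
    """Batch: process the full, already-collected dataset in one job."""
    return sum(value for _, value in events) / len(events)


def tumbling_window_averages(event_stream, window_seconds):
    """Streaming: yield (window_start, average) as each fixed window closes.

    Assumes events arrive in timestamp order.
    """
    current_window = None
    values = []
    for timestamp, value in event_stream:
        window_start = (timestamp // window_seconds) * window_seconds
        if current_window is not None and window_start != current_window:
            # A new window has started, so the previous one is complete.
            yield current_window, sum(values) / len(values)
            values = []
        current_window = window_start
        values.append(value)
    if values:
        # Flush the last, still-open window when the stream ends.
        yield current_window, sum(values) / len(values)


batch_result = batch_average(EVENTS)
stream_results = list(tumbling_window_averages(iter(EVENTS), WINDOW_SECONDS))
```

The batch job produces one number after all the data is in, whereas the windowed version produces a small result per window, which is what makes near-immediate insight possible.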

4. Orchestration Tools

  • Azure Data Factory: Automate and monitor ETL/ELT workflows across on-premises and cloud data stores.
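
Conceptually, an orchestrator runs activities in dependency order and records what happened to each one. The sketch below illustrates that idea with the standard-library `graphlib` module (Python 3.9+); it is not the Azure Data Factory API, and the step names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step lists the steps that must finish before it.
DEPENDENCIES = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "refresh_report": {"load"},
}

run_log = []  # stands in for the monitoring view of an orchestrator


def make_step(name):
    """Build a placeholder activity that only records that it ran."""
    def step():
        run_log.append(name)
    return step


STEPS = {name: make_step(name) for name in DEPENDENCIES}


def run_pipeline(dependencies, steps):
    """Run steps in dependency order and stop at the first failure."""
    statuses = {}
    for name in TopologicalSorter(dependencies).static_order():
        try:
            steps[name]()
            statuses[name] = "succeeded"
        except Exception as error:
            statuses[name] = f"failed: {error}"
            break
    return statuses


statuses = run_pipeline(DEPENDENCIES, STEPS)
```

Declaring dependencies as data, rather than hard-coding the call order, is what lets an orchestration tool schedule, retry, and monitor each step independently.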

5. Data Lake Querying

  • PolyBase: Query external files in place with T-SQL, without loading them into the database first.
  • Synapse serverless SQL: On-demand querying of Data Lake Storage via T-SQL.

6. Common Storage & Querying Options

| Storage Type | Technologies | Use Case |
| --- | --- | --- |
| Relational Warehouses | Star & Cube Schemas in Synapse | Structured reporting & BI |
| NoSQL Stores | Apache Hive, HBase | Semi-structured or unstructured data |
| Ad Hoc Analytics | Azure Data Explorer (KQL) | Exploratory & operational analytics |
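
As a small illustration of the star schema listed for relational warehouses, the sketch below builds one fact table and two dimension tables in an in-memory SQLite database and runs a typical reporting query. The table names, columns, and sample rows are hypothetical; a real warehouse would hold far more data and use a service such as Synapse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables describe the "who/what/when" of each sale.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
    -- The fact table stores measures plus keys into each dimension.
    CREATE TABLE fact_sales (product_id INTEGER, date_id INTEGER, amount REAL);

    INSERT INTO dim_product VALUES (1, 'bikes'), (2, 'helmets');
    INSERT INTO dim_date VALUES (10, 2023), (11, 2024);
    INSERT INTO fact_sales VALUES
        (1, 10, 500.0), (1, 11, 700.0), (2, 11, 50.0), (2, 11, 25.0);
    """
)

# Aggregate the measure across two dimensions: category and year.
rows = conn.execute(
    """
    SELECT p.category, d.year, SUM(f.amount) AS total
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
    """
).fetchall()
```

Joining the central fact table to its dimensions and grouping by dimension attributes is the basic query shape behind most structured BI reports.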

| Category | Examples |
| --- | --- |
| Programming | Python, R |
| Big Data Engines | Apache Spark, Hadoop on HDInsight |

Power BI Reporting

  • Report Types: Descriptive, Diagnostic, Predictive, Prescriptive
  • Power BI Desktop: Free authoring environment with integrated Power Query for ETL
  • Power BI Service: Cloud-hosted platform for sharing dashboards, paginated reports, and real-time monitoring

Warning

Carefully manage dataset permissions in Power BI Service to maintain data privacy and security.

This concludes our module on analyzing data. You now have a clear understanding of the roles, responsibilities, and tools that form the foundation of any analytics lifecycle.
