Batch Processing
Batch processing aggregates data over a period, loads it into storage, and runs analytics as a single job. It’s ideal for large-scale reporting when real-time insights aren’t required. Key steps:- Collect data (e.g., all sales transactions across Canada).
- Load into a central data warehouse.
- Execute analytics on a schedule (daily, weekly, monthly).
- Generate dashboards and reports.

| Trigger Type | Description | Example |
|---|---|---|
| Time-based | Runs at fixed calendar intervals | Daily sales report at midnight |
| Event-based | Starts when data reaches a predefined count | Forecast after 5,000 transactions |
| Conditional | Activates on custom business logic | Inventory update on low stock |
Stream Processing
Stream processing ingests and analyzes data continuously, delivering low-latency insights. It’s commonly used for IoT telemetry, clickstreams, and other real-time workloads. Core characteristics:- Continuous data flow (e.g., sensor readings, user events).
- Processing aimed at real time or near real time.
- Rolling time windows to group and analyze recent data slices.

Rolling Time Windows
A rolling time window processes events within a fixed duration that shifts forward by a defined increment. For example, with a 5-minute window that advances every minute:- 5:29 PM: process data from 5:24–5:29
- 5:30 PM: process data from 5:25–5:30
Time windows longer than one second are generally considered near real-time rather than true real-time.
Hybrid Processing
Hybrid processing combines continuous ingestion with scheduled batch analytics. You ingest stream data in real time and also store it for periodic batch processing.
Hybrid Transactional and Analytical Processing (HTAP)
HTAP merges OLTP and OLAP in a single pipeline. With Azure Cosmos DB for high-volume transactions and Azure Synapse Link to feed data directly into Azure Synapse Analytics, you can eliminate separate ETL tasks.
Stream Processing Components
To build a robust streaming pipeline, you need:-
Events
Individual messages from devices or applications (e.g., temperature readings, status updates). -
Ingestion Services
- Azure Event Hubs for large-scale streaming.
- Azure IoT Hub optimized for IoT scenarios.
- Apache Kafka as an open-source alternative.
-
Sinks
Downstream targets such as data lakes, data warehouses, or analytics platforms. -
Analytic Clusters
Scalable compute clusters for resource-intensive, near real-time processing when single-node compute isn’t enough.
