DP-900: Microsoft Azure Data Fundamentals
Semi-Structured Data
Introduction
Modern e-commerce platforms like Amazon handle millions of transactions per day across a global customer base. Traditional relational databases often struggle under this load due to:
- Multiple table updates per transaction, which adds latency when enforcing ACID constraints.
- A rigid schema requiring every row to share the same columns, which doesn’t adapt well to a diverse product catalog.
What Is Semi-Structured Data?
Semi-structured data models store each transaction—including customer, order items, and shipping details—in a single record. This approach:
- Requires only one insert per transaction.
- Eliminates foreign-key joins and multi-step ACID enforcement.
- Scales to millions of transactions daily, because each write touches only one record.
Each record is self-describing (similar to JSON or XML), allowing fields to vary by record. For example, digital goods can omit shipping address fields.
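To make the "fields vary by record" idea concrete, here is a minimal Python sketch. The field names (`order_id`, `items`, `shipping_address`) and the second order are illustrative assumptions, not part of any Azure schema.

```python
# Two hypothetical order records; the field names are illustrative only.
physical_order = {
    "order_id": "0001",
    "customer": "Peter Vogel",
    "items": [{"name": "Widget", "price": 15.10}],
    "shipping_address": "London, Ontario",
}
digital_order = {
    "order_id": "0002",
    "customer": "Peter Vogel",
    "items": [{"name": "E-book", "price": 9.99}],
    # No shipping_address: a digital download has nothing to ship.
}


def needs_shipping(order: dict) -> bool:
    """Return True when the record carries a shipping address."""
    return "shipping_address" in order


for order in (physical_order, digital_order):
    # Each record describes itself: its keys tell the reader what it contains.
    print(order["order_id"], sorted(order), needs_shipping(order))
```

Because each record lists its own fields, the consuming code checks for a field's presence instead of relying on a fixed column set.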
Benefits of Semi-Structured Storage
- Schema Flexibility: Add or remove fields per record without downtime.
- Single-Row Transactions: Faster inserts, no complex joins.
- High Throughput: Ideal for high-volume, write-heavy workloads.
Trade-offs and Limitations
While semi-structured storage (e.g., Azure Table Storage) improves write performance, it introduces key limitations:
Warning
Azure Table Storage is a NoSQL (non-relational) data store. It does not support SQL joins or secondary indexes, and its batch transactions are limited to entities that share the same PartitionKey.
Feature | Relational Database | Azure Table Storage |
---|---|---|
Schema | Fixed, enforced | Flexible, per-entity |
Joins | Supported | Not supported |
Query Language | SQL | OData filter expressions over REST |
Indexing | Rich secondary indexes | PartitionKey & RowKey only |
Data Duplication | Minimal | Higher (denormalization) |
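The "Data Duplication" row can be illustrated with a small Python sketch. The records, the second customer, and the `update_address` helper are hypothetical; the point is only that a value copied into many records must be changed in each of them.

```python
# Hypothetical denormalized orders: customer details are copied into every record.
orders = [
    {"order_id": "0001", "customer_name": "Peter Vogel", "customer_address": "London, Ontario"},
    {"order_id": "0002", "customer_name": "Peter Vogel", "customer_address": "London, Ontario"},
    {"order_id": "0003", "customer_name": "Ada Smith", "customer_address": "Toronto, Ontario"},
]


def update_address(records, customer_name, new_address):
    """Rewrite the address in every record that embeds this customer; return the count."""
    touched = 0
    for record in records:
        if record["customer_name"] == customer_name:
            record["customer_address"] = new_address
            touched += 1
    return touched


changed = update_address(orders, "Peter Vogel", "Windsor, Ontario")
print(changed)  # 2: one write per record that holds a copy
```

In a normalized relational design the same change would be a single-row update on a customers table. Whether historical orders should be rewritten at all is a business decision.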
Hybrid Data Architecture
To balance performance and query flexibility, many organizations adopt a hybrid approach:
- NoSQL tables (e.g., Azure Table Storage) for high-volume, schema-flexible transactions.
- Relational databases for shared entities requiring rich querying and indexing (customer profiles, product catalogs).
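The split above can be sketched in Python with only the standard library. This is an illustration under stated assumptions: an in-memory SQLite database stands in for the relational side, a plain dict stands in for the NoSQL table, and the `customers` table, the `record_order` helper, and the `CustomerId` property are all hypothetical names.

```python
import sqlite3

# Relational side: shared customer profiles with a secondary index.
# In-memory SQLite stands in for a relational database here.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
db.execute("CREATE INDEX idx_customers_city ON customers (city)")
db.execute("INSERT INTO customers (id, name, city) VALUES (1, 'Peter Vogel', 'London, Ontario')")

# NoSQL side: a dict keyed by (PartitionKey, RowKey) stands in for a table store.
orders = {}


def record_order(customer_id, row_key, product, price):
    """Write one self-contained order record with a single insert."""
    orders[("Orders", row_key)] = {
        "CustomerId": customer_id,
        "ProductName": product,
        "ProductPrice": price,
    }


record_order(1, "2023-10-01_0001", "Widget", 15.10)

# The application, not the store, combines the two sides when it needs both.
order = orders[("Orders", "2023-10-01_0001")]
name, city = db.execute(
    "SELECT name, city FROM customers WHERE id = ?", (order["CustomerId"],)
).fetchone()
print(name, city, order["ProductName"])
```

The order write stays a single insert, while customer data keeps the indexing and ad hoc querying of the relational side.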
Example Record
Below is a sample transaction as it could be stored in Azure Table Storage. An entity is a flat set of properties, so the customer and product details appear as prefixed property names rather than nested objects:
{
  "PartitionKey": "Orders",
  "RowKey": "2023-10-01_0001",
  "CustomerName": "Peter Vogel",
  "CustomerAddress": "London, Ontario",
  "ProductName": "Widget",
  "ProductPrice": 15.10
}
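Because Table Storage entities are flat property sets, an order that an application holds as nested objects has to be flattened (or serialized to a string) before it is written. The following is a minimal sketch of one way to do that; the `flatten` helper and its naming scheme (parent key joined to child key) are assumptions for illustration, not part of any Azure SDK.

```python
def flatten(nested: dict, prefix: str = "") -> dict:
    """Collapse nested dicts into one level by joining key names,
    e.g. {"Customer": {"Name": ...}} becomes {"CustomerName": ...}."""
    flat = {}
    for key, value in nested.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Hypothetical in-memory order with nested customer and product details.
order = {
    "Customer": {"Name": "Peter Vogel", "Address": "London, Ontario"},
    "Product": {"Name": "Widget", "Price": 15.10},
}

# Every entity also needs its two key properties.
entity = {"PartitionKey": "Orders", "RowKey": "2023-10-01_0001", **flatten(order)}
print(entity)
```

Joining key names keeps each value individually filterable. Serializing the whole order into one string property is simpler, but it hides the fields from queries.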
Note
In Azure Table Storage, you use PartitionKey and RowKey to distribute and query your data efficiently.
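The effect of having only these two keys as an index can be sketched in plain Python. A dict keyed by `(PartitionKey, RowKey)` stands in for the table; the sample entities and both helper functions are hypothetical, and the filter string shows the OData syntax such a lookup would use.

```python
# A dict keyed by (PartitionKey, RowKey) stands in for the table's only index.
table = {
    ("Orders", "2023-10-01_0001"): {"ProductName": "Widget", "ProductPrice": 15.10},
    ("Orders", "2023-10-01_0002"): {"ProductName": "Gadget", "ProductPrice": 42.00},
    ("Returns", "2023-10-02_0001"): {"ProductName": "Widget", "ProductPrice": 15.10},
}


def point_lookup(partition_key, row_key):
    """Fetch one entity directly by its full key."""
    return table.get((partition_key, row_key))


def scan_by_product(product_name):
    """Find entities by a non-key property; every entity has to be inspected."""
    return [key for key, entity in table.items() if entity["ProductName"] == product_name]


# The same point lookup written as an OData filter expression.
odata_filter = "PartitionKey eq 'Orders' and RowKey eq '2023-10-01_0001'"

print(point_lookup("Orders", "2023-10-01_0001"))
print(scan_by_product("Widget"))
```

Queries that supply both keys resolve to a single entity, while filters on other properties fall back to scanning. That is why key design matters so much for this kind of store.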