DP-900: Microsoft Azure Data Fundamentals

Semi-Structured Data

Introduction

Modern e-commerce platforms like Amazon handle millions of transactions per day across a global customer base. Traditional relational databases often struggle under this load due to:

  1. Multiple table updates per transaction, which adds latency when enforcing ACID constraints.
  2. A rigid schema requiring every row to share the same columns, which doesn’t adapt well to a diverse product catalog.
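As a rough illustration of the first point, the sketch below uses Python's built-in sqlite3 module to record one sale. The three-table schema and the place_order helper are hypothetical and not taken from any real platform. They only show that a single order has to change more than one table inside one atomic transaction.

```python
import sqlite3

# Hypothetical schema for illustration only; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys unenforced by default
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                            stock INTEGER NOT NULL);
    CREATE TABLE orders    (id INTEGER PRIMARY KEY,
                            customer_id INTEGER NOT NULL REFERENCES customers(id),
                            product_id  INTEGER NOT NULL REFERENCES products(id),
                            quantity    INTEGER NOT NULL);
""")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Peter Vogel')")
conn.execute("INSERT INTO products (id, name, stock) VALUES (1, 'Widget', 10)")
conn.commit()


def place_order(conn, customer_id, product_id, quantity):
    """Record one sale: two tables change inside a single atomic transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "UPDATE products SET stock = stock - ? WHERE id = ?",
            (quantity, product_id),
        )
        conn.execute(
            "INSERT INTO orders (customer_id, product_id, quantity) VALUES (?, ?, ?)",
            (customer_id, product_id, quantity),
        )


place_order(conn, customer_id=1, product_id=1, quantity=2)
stock = conn.execute("SELECT stock FROM products WHERE id = 1").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(stock, order_count)
```

If the second statement fails, for example because the customer does not exist, the stock update is rolled back as well. That guarantee is what makes the relational approach safe, and it is also part of the per-transaction cost described above.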

The image discusses the challenges of using structured data for online transactions, highlighting issues like the need for uniform table rows and the complexity of maintaining transaction integrity. It suggests that semi-structured data can address these issues by reducing the need for frequent updates and speeding up processing.

What Is Semi-Structured Data?

Semi-structured data models store each transaction—including customer, order items, and shipping details—in a single record. This approach:

  • Requires only one insert per transaction.
  • Eliminates foreign-key joins and multi-step ACID enforcement.
  • Scales to millions of transactions per day, because each write touches only one record.

The image illustrates a solution for handling semi-structured data, highlighting benefits such as storing all data for a transaction in a single row, fewer updates, no relationships to maintain, no transaction integrity issues, and no table schemas.

Each record is self-describing (similar to JSON or XML), allowing fields to vary by record. For example, digital goods can omit shipping address fields.
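A minimal sketch of what "self-describing" can look like, using plain Python dictionaries serialized to JSON. The field names and the two sample orders are made up for illustration.

```python
import json

# Two hypothetical order records; field names are illustrative, not a real schema.
physical_order = {
    "OrderId": "0001",
    "Customer": "Peter Vogel",
    "Product": "Widget",
    "Price": 15.10,
    "ShippingAddress": "London, Ontario",
}
digital_order = {
    "OrderId": "0002",
    "Customer": "Peter Vogel",
    "Product": "E-book",
    "Price": 9.99,
    # No ShippingAddress: a digital good has nothing to ship.
}

for record in (physical_order, digital_order):
    # Each record carries its own field names, so no external schema is needed to read it.
    print(json.dumps(record))

# Fields present in the physical order but absent from the digital one.
only_physical = set(physical_order) - set(digital_order)
print(only_physical)
```

Both records can sit side by side in the same store even though their field sets differ. In a relational table the digital order would need a NULL shipping column or a separate table.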

Benefits of Semi-Structured Storage

  • Schema Flexibility: Add or remove fields per record without downtime.
  • Single-Row Transactions: Faster inserts, no complex joins.
  • High Throughput: Ideal for high-volume, write-heavy workloads.

Trade-offs and Limitations

While semi-structured storage (e.g., Azure Table Storage) improves write performance, it introduces key limitations:

Warning

Azure Table Storage is a NoSQL (non-relational) data store. It does not support SQL joins or secondary indexes, and atomic transactions are limited to batches of entities that share the same PartitionKey.

| Feature | Relational Database | Azure Table Storage |
|---|---|---|
| Schema | Fixed, enforced | Flexible, per-entity |
| Joins | Supported | Not supported |
| Query Language | SQL | OData/NoSQL REST APIs |
| Indexing | Rich secondary indexes | PartitionKey & RowKey only |
| Data Duplication | Minimal | Higher (denormalization) |

The image illustrates a concept of semi-structured data with three entities: Customer, Sales Order, and Product, highlighting the lack of relationships between them and noting it as less flexible than a relational database.

The image illustrates semi-structured data categories (Customer, Sales Order, Product) and notes that SQL cannot be used for querying or updating, referring to Microsoft's Table Storage as a "NoSQL" database solution.
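Instead of SQL, Table Storage queries are expressed as OData filter strings. The sketch below only builds such a string and does not call any Azure service. The helper names are hypothetical, and the key values are sample data.

```python
def odata_string(value: str) -> str:
    """Quote a string literal for an OData filter (single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def orders_on_or_after(partition_key: str, row_key_start: str) -> str:
    """Build a filter for one partition with a lower bound on RowKey."""
    return (
        f"PartitionKey eq {odata_string(partition_key)} "
        f"and RowKey ge {odata_string(row_key_start)}"
    )


query_filter = orders_on_or_after("Orders", "2023-10-01")
print(query_filter)  # PartitionKey eq 'Orders' and RowKey ge '2023-10-01'
```

A filter like this can select and compare properties of entities in a single table, but it has no equivalent of a SQL JOIN across tables.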


Hybrid Data Architecture

To balance write performance against query flexibility, many organizations adopt a hybrid approach:

  • NoSQL tables (e.g., Azure Table Storage) for high-volume, schema-flexible transactions.
  • Relational databases for shared entities requiring rich querying and indexing (customer profiles, product catalogs).
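One possible shape of this split is sketched below under simplifying assumptions. sqlite3 stands in for the relational database, a plain dictionary stands in for the NoSQL table, and the customer ID is used as the PartitionKey. All names here are hypothetical.

```python
import sqlite3

# Relational side: shared customer profiles with an indexed, queryable column.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
)
db.execute("CREATE INDEX idx_customers_city ON customers (city)")
db.execute("INSERT INTO customers VALUES ('C1', 'Peter Vogel', 'London, Ontario')")
db.commit()

# NoSQL side: an in-memory stand-in for a key-value table,
# keyed by (PartitionKey, RowKey).
order_table = {}


def record_order(customer_id, order_id, **fields):
    """Write one order as a single entity; extra fields may vary per order."""
    entity = {"PartitionKey": customer_id, "RowKey": order_id, **fields}
    order_table[(customer_id, order_id)] = entity
    return entity


def orders_for_city(city):
    """Combine both stores in application code: SQL finds customers, keys find orders."""
    rows = db.execute("SELECT id FROM customers WHERE city = ?", (city,))
    ids = {row[0] for row in rows}
    return [entity for (pk, _), entity in order_table.items() if pk in ids]


record_order("C1", "2023-10-01_0001", ProductName="Widget", ProductPrice=15.10)
matches = orders_for_city("London, Ontario")
print(matches)
```

The rich query (customers by city) runs where indexes exist, and the order write stays a single-entity insert. The cost is that the application, not the database, is responsible for stitching the two results together.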

The image illustrates an ideal solution combining companies, a relational database, and a customer table to handle shared information. It emphasizes using relational databases to store customer information.

Example Record

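Below is a sample semi-structured transaction as it could be stored in Azure Table Storage. Entity properties are flat name/value pairs, so the customer and product details appear as prefixed properties rather than nested objects:

{
  "PartitionKey": "Orders",
  "RowKey": "2023-10-01_0001",
  "CustomerName": "Peter Vogel",
  "CustomerAddress": "London, Ontario",
  "ProductName": "Widget",
  "ProductPrice": 15.10
}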

Note

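In Azure Table Storage, PartitionKey determines which partition an entity is stored in, and RowKey uniquely identifies the entity within that partition. Together they form the entity's primary key, so queries that specify both are the most efficient.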
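The toy model below shows why these two keys matter for query efficiency. It is an in-memory sketch, not the real service. The TinyTable class and its methods are hypothetical, and it assumes entities are grouped by PartitionKey and kept in RowKey order within each group.

```python
from bisect import bisect_left, insort


class TinyTable:
    """Toy model: entities grouped by PartitionKey, ordered by RowKey within a group."""

    def __init__(self):
        self._entities = {}  # (PartitionKey, RowKey) -> entity
        self._row_keys = {}  # PartitionKey -> sorted list of RowKeys

    def insert(self, entity):
        pk, rk = entity["PartitionKey"], entity["RowKey"]
        if (pk, rk) in self._entities:
            raise KeyError(f"duplicate key: {(pk, rk)}")
        self._entities[(pk, rk)] = entity
        insort(self._row_keys.setdefault(pk, []), rk)

    def get(self, pk, rk):
        """Point lookup: both keys are known, so no scanning is needed."""
        return self._entities[(pk, rk)]

    def range(self, pk, rk_from, rk_to):
        """Range query inside one partition: rk_from <= RowKey < rk_to."""
        keys = self._row_keys.get(pk, [])
        lo, hi = bisect_left(keys, rk_from), bisect_left(keys, rk_to)
        return [self._entities[(pk, rk)] for rk in keys[lo:hi]]


table = TinyTable()
table.insert({"PartitionKey": "Orders", "RowKey": "2023-10-01_0001", "ProductName": "Widget"})
table.insert({"PartitionKey": "Orders", "RowKey": "2023-10-01_0002", "ProductName": "Gadget"})
table.insert({"PartitionKey": "Orders", "RowKey": "2023-10-02_0001", "ProductName": "Gizmo"})

first = table.get("Orders", "2023-10-01_0001")
october_first = table.range("Orders", "2023-10-01", "2023-10-02")
print([e["ProductName"] for e in october_first])
```

A date-prefixed RowKey such as the one in the sample record makes "all orders on a given day" a contiguous key range. A query on any other property, such as ProductName, would have to examine every entity.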

