KodeKloud Notes

Modern e-commerce platforms like Amazon handle millions of transactions per day across a global customer base. Traditional relational databases often struggle under this load due to:

Multiple table updates per transaction, which adds latency when enforcing ACID constraints.
A rigid schema requiring every row to share the same columns, which doesn’t adapt well to a diverse product catalog.
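
As a rough illustration of the first point, the sketch below uses Python's built-in sqlite3 module to write one order into a normalized schema. The table names, columns, and the place_order helper are hypothetical and exist only for this example; the point is that a single order needs several statements that must all succeed or all be rolled back.

```python
import sqlite3

# Hypothetical normalized schema: one order touches three tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id)
    );
    CREATE TABLE order_items (
        order_id INTEGER NOT NULL REFERENCES orders(id),
        product TEXT NOT NULL,
        price REAL NOT NULL
    );
""")


def place_order(conn, customer_name, items):
    """Write one order; all statements commit together or not at all."""
    with conn:  # commits on success, rolls back if any statement raises
        cur = conn.execute(
            "INSERT INTO customers (name) VALUES (?)", (customer_name,)
        )
        customer_id = cur.lastrowid
        cur = conn.execute(
            "INSERT INTO orders (customer_id) VALUES (?)", (customer_id,)
        )
        order_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO order_items (order_id, product, price) VALUES (?, ?, ?)",
            [(order_id, name, price) for name, price in items],
        )
    return order_id


order_id = place_order(conn, "Peter Vogel", [("Widget", 15.10)])
```

Each additional table involved in an order adds another statement inside the same transaction.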

The image discusses the challenges of using structured data for online transactions, highlighting issues like the need for uniform table rows and the complexity of maintaining transaction integrity. It suggests that semi-structured data can address these issues by reducing the need for frequent updates and speeding up processing.

What Is Semi-Structured Data?

Semi-structured data models store each transaction—including customer, order items, and shipping details—in a single record. This approach:

Requires only one insert per transaction.
Eliminates foreign-key joins and multi-step ACID enforcement.
Scales to millions of transactions per day, because each write touches a single record.

The image illustrates a solution for handling semi-structured data, highlighting benefits such as storing all data for a transaction in a single row, fewer updates, no relationships to maintain, no transaction integrity issues, and no table schemas.

Each record is self-describing (similar to JSON or XML), allowing fields to vary by record. For example, digital goods can omit shipping address fields.
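
A minimal sketch of what "self-describing" means in practice, using plain Python dictionaries serialized as JSON. The field names and the e-book order are hypothetical values chosen for illustration; the second record simply leaves out the shipping field, and the reader of the data checks for its presence.

```python
import json

# A physical order carries a shipping address.
physical_order = {
    "order_id": "0001",
    "customer": "Peter Vogel",
    "items": [{"name": "Widget", "price": 15.10}],
    "shipping_address": "London, Ontario",
}

# A digital order omits the field entirely; no schema change is needed.
digital_order = {
    "order_id": "0002",
    "customer": "Peter Vogel",
    "items": [{"name": "E-book", "price": 9.99}],
}


def needs_shipping(record):
    """The record itself says which fields it has."""
    return "shipping_address" in record


# Round-trip through JSON, as a store of self-describing records would.
serialized = [json.dumps(r) for r in (physical_order, digital_order)]
restored = [json.loads(s) for s in serialized]
```

Because each record lists its own fields, code that reads the data has to tolerate missing ones, as needs_shipping does here.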

Benefits of Semi-Structured Storage

Schema Flexibility: Add or remove fields per record without downtime.
Single-Row Transactions: Faster inserts, no complex joins.
High Throughput: Ideal for high-volume, write-heavy workloads.

Trade-offs and Limitations

While semi-structured storage (e.g., Azure Table Storage) improves write performance, it introduces key limitations:

Warning

Azure Table Storage is a NoSQL (non-relational) data store. It does not support SQL joins, secondary indexes, or transactions that span partitions; atomic batch operations are limited to entities that share the same PartitionKey.

Feature	Relational Database	Azure Table Storage
Schema	Fixed, enforced	Flexible, per-entity
Joins	Supported	Not supported
Query Language	SQL	OData filters over a REST API
Indexing	Rich secondary indexes	PartitionKey & RowKey only
Data Duplication	Minimal	Higher (denormalization)
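
The "Joins" and "Data Duplication" rows can be illustrated with a small sketch. Without server-side joins, related entities are either combined in application code or copied into each record. The entities, the CustomerId property, and the join_orders_to_customers helper below are hypothetical; this is plain Python, not a call into any storage SDK.

```python
# Entities as they might come back from two separate queries.
customers = [
    {"PartitionKey": "Customers", "RowKey": "C1", "Name": "Peter Vogel"},
]
orders = [
    {
        "PartitionKey": "Orders",
        "RowKey": "2023-10-01_0001",
        "CustomerId": "C1",
        "Product": "Widget",
    },
]


def join_orders_to_customers(orders, customers):
    """Client-side equivalent of a SQL join on orders.CustomerId = customers.RowKey."""
    by_id = {c["RowKey"]: c for c in customers}
    joined = []
    for order in orders:
        customer = by_id.get(order["CustomerId"])
        joined.append(
            {**order, "CustomerName": customer["Name"] if customer else None}
        )
    return joined


joined = join_orders_to_customers(orders, customers)
```

Storing CustomerName directly on each order avoids the second query, at the cost of the duplication noted in the table.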

The image illustrates a concept of semi-structured data with three entities: Customer, Sales Order, and Product, highlighting the lack of relationships between them and noting it as less flexible than a relational database.

The image illustrates semi-structured data categories (Customer, Sales Order, Product) and notes that SQL cannot be used for querying or updating, referring to Microsoft's Table Storage as a "NoSQL" database solution.

Hybrid Data Architecture

To balance performance and query flexibility, many organizations adopt a hybrid approach:

NoSQL tables (e.g., Azure Table Storage) for high-volume, schema-flexible transactions.
Relational databases for shared entities requiring rich querying and indexing (customer profiles, product catalogs).
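
One way to picture this split, as a sketch only: an in-memory SQLite database stands in for the relational side, and a Python dictionary stands in for the NoSQL table. The schema, the record_order helper, and the choice of customer ID as PartitionKey are assumptions made for this example, not a prescribed design.

```python
import sqlite3

# Relational side: shared customer data with a secondary index for rich queries.
relational = sqlite3.connect(":memory:")
relational.execute(
    "CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT, city TEXT)"
)
relational.execute("CREATE INDEX idx_customers_city ON customers (city)")
relational.execute("INSERT INTO customers VALUES ('C1', 'Peter Vogel', 'London')")
relational.commit()

# NoSQL side (stand-in): (PartitionKey, RowKey) -> self-contained order entity.
order_store = {}


def record_order(customer_id, row_key, product, price):
    """Look up the shared customer once, then write a single order entity."""
    row = relational.execute(
        "SELECT name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown customer {customer_id}")
    entity = {
        "PartitionKey": customer_id,
        "RowKey": row_key,
        "CustomerName": row[0],  # denormalized copy, so reads need no join
        "Product": product,
        "Price": price,
    }
    order_store[(customer_id, row_key)] = entity
    return entity


entity = record_order("C1", "2023-10-01_0001", "Widget", 15.10)

# Queries on non-key attributes stay on the relational side.
london_customers = relational.execute(
    "SELECT id FROM customers WHERE city = 'London'"
).fetchall()
```

The order write is a single insert, while questions such as "which customers are in London" are answered by the indexed relational table.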

The image illustrates an ideal solution combining companies, a relational database, and a customer table to handle shared information. It emphasizes using relational databases to store customer information.

Example Record

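Below is a sample transaction entity as it could be stored in Azure Table Storage. Entity properties hold primitive values (strings, numbers, booleans, dates), so the customer and product details are flattened into top-level properties rather than nested objects:

{
  "PartitionKey": "Orders",
  "RowKey": "2023-10-01_0001",
  "CustomerName": "Peter Vogel",
  "CustomerAddress": "London, Ontario",
  "ProductName": "Widget",
  "ProductPrice": 15.10
}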

Note

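In Azure Table Storage, the PartitionKey determines how entities are distributed across partitions, and the PartitionKey and RowKey together uniquely identify an entity. Queries that specify both keys are the most efficient.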
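
To make the role of the two keys concrete, here is a toy in-memory model. It is not the Azure SDK; the InMemoryTable class, its method names, and the ProductName property are hypothetical. It contrasts a point lookup on both keys, a RowKey range within one partition, and a filter on a non-key property, which has to examine every entity.

```python
from bisect import bisect_left


class InMemoryTable:
    """Toy stand-in for a table indexed only by PartitionKey and RowKey."""

    def __init__(self):
        self._partitions = {}  # PartitionKey -> {RowKey: entity}

    def insert(self, entity):
        partition = self._partitions.setdefault(entity["PartitionKey"], {})
        partition[entity["RowKey"]] = entity

    def get(self, partition_key, row_key):
        """Point lookup: both keys are known, so one entity is touched."""
        return self._partitions[partition_key][row_key]

    def range(self, partition_key, row_key_from, row_key_to):
        """RowKeys in [row_key_from, row_key_to) within a single partition."""
        partition = self._partitions.get(partition_key, {})
        keys = sorted(partition)
        start = bisect_left(keys, row_key_from)
        end = bisect_left(keys, row_key_to)
        return [partition[k] for k in keys[start:end]]

    def scan(self, predicate):
        """Non-key filter: every entity in every partition is examined."""
        return [
            e
            for partition in self._partitions.values()
            for e in partition.values()
            if predicate(e)
        ]


table = InMemoryTable()
table.insert({"PartitionKey": "Orders", "RowKey": "2023-10-01_0001", "ProductName": "Widget"})
table.insert({"PartitionKey": "Orders", "RowKey": "2023-10-01_0002", "ProductName": "Gadget"})
table.insert({"PartitionKey": "Orders", "RowKey": "2023-10-02_0001", "ProductName": "Widget"})

one_order = table.get("Orders", "2023-10-02_0001")
october_first = table.range("Orders", "2023-10-01", "2023-10-02")
widgets = table.scan(lambda e: e["ProductName"] == "Widget")
```

Prefixing the RowKey with a date, as the sample record does, lets a day's orders be read as one contiguous key range instead of a full scan.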
