AWS Certified Developer - Associate
Data Analytics
Exam Tips
In this guide, we highlight crucial points and best practices for the AWS Certified Developer - Associate exam. We cover services such as Kinesis Data Streams, Kinesis Firehose, Kinesis Data Analytics, Amazon Athena, and OpenSearch. Each section includes diagrams to reinforce the concepts visually.
Kinesis Data Streams
Kinesis Data Streams simplifies the real-time collection, processing, and analysis of streaming data. It is well suited to use cases such as log processing, real-time analytics, and IoT telemetry.
- Data Producers:
Producers send data records to a stream using the PutRecords API. These producers can be applications that leverage the AWS SDK, the Kinesis Producer Library (KPL), IoT devices, mobile devices, or even DynamoDB Streams.
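As a rough illustration of what a producer hands to PutRecords, the sketch below builds the request payload in plain Python. The stream name, the event fields, and the helper `build_put_records_request` are hypothetical; the actual call is left as a comment because it assumes boto3 is installed and credentials are configured.

```python
import json

# Per-record ceiling, taken from the 1 MB limit described in this guide.
MAX_RECORD_BYTES = 1_000_000


def build_put_records_request(stream_name, events, key_field):
    """Build keyword arguments for a PutRecords call from a list of dicts."""
    records = []
    for event in events:
        data = json.dumps(event).encode("utf-8")
        if len(data) >= MAX_RECORD_BYTES:
            raise ValueError("record payload must be smaller than 1 MB")
        # Every record carries its own partition key alongside the data blob.
        records.append({"Data": data, "PartitionKey": str(event[key_field])})
    return {"StreamName": stream_name, "Records": records}


request = build_put_records_request(
    "example-stream",  # hypothetical stream name
    [
        {"device_id": "sensor-1", "temp": 21.5},
        {"device_id": "sensor-2", "temp": 19.0},
    ],
    key_field="device_id",
)

# With boto3 available, the request could then be sent as:
#   boto3.client("kinesis").put_records(**request)
```

Using the device identifier as the partition key is only one possible choice; any field with enough distinct values would serve the same purpose.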
Record Limitations and Throughput:
Each record in the stream must be smaller than 1 MB. Producers can write up to 1 MB per second or 1,000 records per second to each shard.
Partition Keys and Sharding:
When inserting a record, a partition key is mandatory. This key is hashed to determine the shard where the record will be stored. The same partition key always hashes to the same value, so records that share a key are routed to the same shard.
Managing Hot Partitions:
A hot partition occurs when a single shard receives more traffic than its throughput capacity allows, causing requests to fail with a "provisioned throughput exceeded" error (ProvisionedThroughputExceededException). To prevent and manage hot partitions:
- Choose partition keys with high entropy to promote an even distribution of data.
- Implement retries with exponential backoff.
- Perform shard splitting to boost throughput.
- Merge underused shards to optimize costs.
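The retry advice in the list above can be sketched as follows. This is a minimal illustration, not a production client: `ThroughputExceeded` is a hypothetical stand-in for the service error, the delay values are arbitrary, and the random jitter is an addition that the list itself does not mention.

```python
import random
import time


class ThroughputExceeded(Exception):
    """Hypothetical stand-in for the throughput-exceeded error."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.1,
                      max_delay=5.0, sleep=time.sleep):
    """Call operation(), retrying throttled attempts with growing delays."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # The ceiling doubles on every attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Jitter (an assumption here): wait a random time up to the ceiling.
            sleep(random.uniform(0, ceiling))


# Usage: a fake write that is throttled twice and then succeeds.
attempts = {"count": 0}


def flaky_write():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ThroughputExceeded()
    return "ok"


delays = []  # record the waits instead of actually sleeping
result = call_with_backoff(flaky_write, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the example fast to run and easy to test; real code would normally leave it at `time.sleep`.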
Warning
Hot partitions can significantly impact your application's performance. Ensure you design your partition keys carefully and monitor shard performance to mitigate throughput issues.
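To see why key entropy matters, the following sketch models how partition keys spread across shards. Kinesis hashes each partition key with MD5 into a 128-bit value and maps it to a shard's hash key range; the sketch assumes the shards divide that range evenly, and the key names are made up for illustration.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit integer


def shard_for_key(partition_key, shard_count):
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    return hash_value * shard_count // HASH_SPACE


def records_per_shard(partition_keys, shard_count):
    """Count how many records each shard would receive."""
    counts = Counter(shard_for_key(key, shard_count) for key in partition_keys)
    return [counts.get(index, 0) for index in range(shard_count)]


# Low entropy: every record uses the same key, so one shard takes all traffic.
skewed = records_per_shard(["tenant-a"] * 1000, 4)

# High entropy: many distinct keys spread the records over all shards.
spread = records_per_shard([f"device-{i}" for i in range(1000)], 4)
```

In the first case a single shard receives all 1,000 records while the others stay empty, which is exactly the hot partition pattern the warning describes.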
Data Consumption Modes:
There are two primary ways to read data from a Kinesis stream:
Shared Fan-Out Consumer (standard):
- Uses the GetRecords API (pull model).
- Shares a throughput limit of 2 MB per second per shard across all consumers.
Enhanced Fan-Out Consumer:
- Uses the SubscribeToShard API (push model).
- Allocates an exclusive 2 MB per second per shard for each consumer, allowing higher throughput with multiple consumers.
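The difference between the two modes is easiest to see as arithmetic. The sketch below uses the 2 MB per second per shard figure quoted above and assumes, as a simplification, that shared consumers split the limit evenly.

```python
# Per-shard read figure quoted in this guide.
SHARD_READ_MB_PER_SEC = 2.0


def per_consumer_read_mb(consumer_count, enhanced_fan_out):
    """Approximate read throughput per consumer for one shard, in MB/s."""
    if consumer_count < 1:
        raise ValueError("at least one consumer is required")
    if enhanced_fan_out:
        # Each consumer gets its own dedicated allocation.
        return SHARD_READ_MB_PER_SEC
    # Shared mode: all consumers draw from the same limit (even split assumed).
    return SHARD_READ_MB_PER_SEC / consumer_count


shared = per_consumer_read_mb(4, enhanced_fan_out=False)    # 0.5 MB/s each
enhanced = per_consumer_read_mb(4, enhanced_fan_out=True)   # 2.0 MB/s each
```

With four consumers on one shard, the shared mode leaves each with roughly a quarter of the limit, while enhanced fan-out keeps every consumer at the full rate.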
- Kinesis Client Library (KCL):
The KCL facilitates the setup of consumers across multiple nodes (EC2 instances, on-premises servers, or Elastic Beanstalk) to process data from Kinesis streams efficiently. Note that each shard is processed by only one KCL instance at a time. For instance, with five shards, at most five KCL instances can be actively processing; any additional instances sit idle. The library also provides important features like checkpointing and workload distribution for improved reliability.
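The shard-to-instance relationship can be modeled with a simple round-robin assignment. This is a simplified illustration only: the KCL itself coordinates ownership through leases rather than a function like the hypothetical `assign_shards` below, and the shard and worker names are invented.

```python
def assign_shards(shard_ids, worker_ids):
    """Give each shard to exactly one worker, round-robin (simplified model)."""
    assignment = {worker: [] for worker in worker_ids}
    if not worker_ids:
        return assignment
    for index, shard in enumerate(shard_ids):
        owner = worker_ids[index % len(worker_ids)]
        assignment[owner].append(shard)
    return assignment


shards = [f"shard-{i}" for i in range(5)]

# Fewer workers than shards: some workers handle more than one shard.
two_workers = assign_shards(shards, ["worker-a", "worker-b"])

# More workers than shards: the extra workers receive nothing to process.
seven_workers = assign_shards(shards, [f"worker-{i}" for i in range(7)])
idle = [worker for worker, owned in seven_workers.items() if not owned]
```

With five shards and seven workers, two workers end up with no shard, which is why adding instances beyond the shard count does not increase processing capacity.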
Kinesis Firehose
Kinesis Firehose is a fully managed service that reliably delivers streaming data to destinations such as S3, Redshift, and OpenSearch.
- Key Differences and Features:
- Firehose is designed for direct data delivery, unlike Kinesis Data Streams, which is primarily for data ingestion.
- Allows data transformation via AWS Lambda, enabling modifications to your data before storing it in destinations like S3.
- Provides near real-time delivery with automated backup of failed data transfers to an S3 bucket.
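A Lambda transformation of the kind mentioned above might look like the sketch below. It follows the record contract Firehose uses for transformation functions (base64-encoded `data`, and a `recordId`, `result`, and `data` in each response entry); the JSON payload shape and the `processed` field are hypothetical.

```python
import base64
import json


def lambda_handler(event, context):
    """Decode each record, add a field, and return it re-encoded."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["processed"] = True  # hypothetical enrichment step
            line = json.dumps(payload) + "\n"  # newline-delimited output
            data = base64.b64encode(line.encode("utf-8")).decode("utf-8")
            output.append(
                {"recordId": record["recordId"], "result": "Ok", "data": data}
            )
        except (ValueError, TypeError):
            # Records that cannot be parsed are flagged and returned unchanged.
            output.append(
                {
                    "recordId": record["recordId"],
                    "result": "ProcessingFailed",
                    "data": record["data"],
                }
            )
    return {"records": output}


# Local usage with a hand-built event: one valid record, one invalid.
sample_event = {
    "records": [
        {"recordId": "1", "data": base64.b64encode(b'{"level": "info"}').decode("utf-8")},
        {"recordId": "2", "data": base64.b64encode(b"not json").decode("utf-8")},
    ]
}
response = lambda_handler(sample_event, None)
```

Returning every incoming `recordId` exactly once, with a result status, lets the delivery stream tell transformed records apart from failed ones.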
Kinesis Data Analytics
Amazon Kinesis Data Analytics empowers you to process and analyze streaming data using standard SQL queries.
- Integration:
- Streaming data sources and sinks must be either Kinesis Data Streams or Kinesis Firehose, ensuring a seamless integration between data ingestion and analytics.
Amazon Athena
Amazon Athena enables interactive querying of data stored in S3 using SQL.
- Supported Formats:
Athena supports multiple data formats including Parquet, JSON, CSV, ORC, and Avro. Once the data is queried, you can leverage Amazon QuickSight to create compelling visualizations.
OpenSearch
Amazon OpenSearch Service is a fully managed service that simplifies the deployment, scaling, and operational management of OpenSearch clusters.
- Modes and Use Cases:
- Supports both managed and serverless cluster modes.
- Ideal for scenarios requiring log analysis or robust search capabilities since queries can be executed across all fields and attributes.
By understanding these services and best practices, you will be well-prepared to tackle the AWS Certified Developer - Associate exam. Each component—whether it is managing data streams, ensuring efficient data delivery, processing streaming data with SQL, or performing ad-hoc queries—plays a key role in building robust AWS applications.
Note
Review each service's unique features and experiment in a lab environment to gain practical experience. This hands-on approach will help solidify your understanding and improve your exam performance.