Kafka Partitions: Scaling and Parallelizing Your Data Processing
In this article, we explore how Apache Kafka partitions enable scalable and parallel data processing. By distributing message streams across multiple partitions within a topic, you can achieve higher throughput, stronger fault tolerance, and more efficient consumer parallelism.
What Is a Kafka Partition?
A Kafka partition is an ordered, immutable sequence of records to which new messages are continually appended. Each topic in Kafka is split into one or more partitions, and each partition can be replicated across multiple brokers for redundancy.
Key characteristics:
- Immutable log: New records are appended to the end.
- Offset-based retrieval: Consumers read messages by offset.
- Per-partition ordering: Kafka guarantees ordering only within a single partition.
Note
Ordering is guaranteed per partition, not across the entire topic. If strict global ordering is required, use a single partition—though this limits parallelism.
Benefits of Partitioning
| Benefit | Description | Example |
| --- | --- | --- |
| Increased Throughput | Parallel writes and reads across multiple partitions | Produce 10,000 messages/sec to a 6-partition topic |
| Fault Tolerance | Each partition is replicated across brokers for high availability | Set replication-factor=3 for three-broker redundancy |
| Scalability | Add partitions on the fly to handle more data without downtime (triggers a consumer rebalance) | Expand from 4 to 8 partitions during off-peak hours |
| Consumer Parallelism | Multiple consumers in a group read from different partitions simultaneously | A 5-member consumer group on a 5-partition topic |
Creating a Topic with Multiple Partitions
Use the Kafka CLI to create topics with a specified number of partitions and replication factor:
kafka-topics \
--create \
--bootstrap-server kafka1:9092 \
--topic orders \
--partitions 6 \
--replication-factor 3
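If you prefer to manage topics from application code, the same topic can be created with the Kafka AdminClient. The following is a minimal sketch that mirrors the CLI command above; the broker address and topic settings are taken from that command and should be adjusted for your cluster.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Properties;

public class CreateOrdersTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // "orders" with 6 partitions and replication factor 3, as in the CLI command above
            NewTopic orders = new NewTopic("orders", 6, (short) 3);
            admin.createTopics(List.of(orders)).all().get(); // block until the broker confirms creation
        }
    }
}
```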
Partitioning Strategies
- Key-Based Partitioning
  Messages with the same key are routed to the same partition, preserving ordering for that key:
  ProducerRecord<String, Order> record = new ProducerRecord<>("orders", orderId, order);
  producer.send(record);
- Round-Robin (Default for No Key)
  If no key is provided, Kafka distributes messages across partitions in a round-robin fashion (clients since Kafka 2.4 use a sticky partitioner by default, which still balances load across partitions over time).
- Custom Partitioner
  Implement org.apache.kafka.clients.producer.Partitioner for bespoke logic, useful for geographic or priority-based routing (see the sketch after this list).
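As a rough illustration of the custom-partitioner approach, here is a minimal sketch that reserves one partition for high-priority traffic. The PriorityPartitioner class name and the "priority-" key prefix are hypothetical examples, not part of Kafka.

```java
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

import java.util.Map;

public class PriorityPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        String k = key == null ? "" : key.toString();

        // Hypothetical rule: keys prefixed with "priority-" always land on partition 0
        if (k.startsWith("priority-")) {
            return 0;
        }
        // All other records are spread across partitions by a simple key hash
        return (k.hashCode() & 0x7fffffff) % numPartitions;
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}
```

A producer would opt in by setting partitioner.class (ProducerConfig.PARTITIONER_CLASS_CONFIG) to this class name.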
Scaling Consumer Parallelism
Consumers join a consumer group to share the workload of reading from a topic’s partitions:
- Each partition is consumed by only one consumer in a group.
- If there are more consumers than partitions, the extra consumers remain idle.
- If there are fewer consumers than partitions, some consumers read multiple partitions.
# Start a consumer group member
kafka-console-consumer \
--bootstrap-server kafka1:9092 \
--topic orders \
--group order-processors
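The same group membership can be expressed in Java. This is a minimal sketch assuming String-serialized keys and values; starting several copies of it spreads the topic's partitions across the running instances.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class OrderProcessor {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka1:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processors"); // same group as the CLI example
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders")); // the group coordinator assigns this member a subset of partitions
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d key=%s%n",
                            record.partition(), record.offset(), record.key());
                }
            }
        }
    }
}
```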
| Partition Count | Consumer Count | Parallelism Achieved |
| --- | --- | --- |
| 3 | 3 | 3-way parallelism |
| 3 | 5 | 3-way (2 idle) |
| 8 | 4 | 4-way |
Warning
Kafka does not support decreasing a topic's partition count after creation; you can only increase it. Plan your partition count based on projected throughput.
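When you do need to grow a topic, the partition count can be raised with kafka-topics --alter or programmatically. Below is a minimal AdminClient sketch, again assuming the orders topic and broker address used earlier. Keep in mind that keyed messages may hash to different partitions after the expansion, so per-key ordering is only guaranteed from that point forward.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;

import java.util.Map;
import java.util.Properties;

public class ExpandOrdersTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Grow "orders" to 8 partitions; requesting fewer than the current count fails
            admin.createPartitions(Map.of("orders", NewPartitions.increaseTo(8))).all().get();
        }
    }
}
```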
Best Practices for Partition Management
- Start with a realistic partition count based on expected throughput and consumer count.
- Use keys for messages where ordering or data locality matters.
- Monitor partition sizes and consumer lag via tools like Kafka's built-in metrics or Confluent Control Center (a lag-check sketch follows this list).
- Rebalance consumer groups carefully to avoid extended processing pauses.
- Match the replication factor to your desired fault-tolerance level (commonly ≥ 3).
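For the consumer-lag point above, lag can also be checked programmatically. This sketch assumes the order-processors group and broker address from the earlier examples; it computes lag per partition as the latest offset minus the group's committed offset.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class LagCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Offsets the group has committed, per partition
            Map<TopicPartition, OffsetAndMetadata> committed =
                    admin.listConsumerGroupOffsets("order-processors")
                         .partitionsToOffsetAndMetadata().get();

            // Latest (end) offsets for those same partitions
            Map<TopicPartition, OffsetSpec> latestSpec = new HashMap<>();
            committed.keySet().forEach(tp -> latestSpec.put(tp, OffsetSpec.latest()));
            Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> latest =
                    admin.listOffsets(latestSpec).all().get();

            // Lag = end offset minus committed offset
            committed.forEach((tp, meta) -> System.out.printf("%s lag=%d%n",
                    tp, latest.get(tp).offset() - meta.offset()));
        }
    }
}
```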