Consumer Rebalancing

In this article, we explore Kafka consumer rebalancing, the mechanism that redistributes partitions among the consumers in a group so that message processing continues when group membership changes.

Imagine a Kafka cluster with four brokers (1, 2, 3, 4) hosting a topic named topic A, which has four partitions. A consumer group with four consumers assigns one partition to each instance. If consumer-4 fails, partition 4 is no longer consumed: producers can still write to it, but its messages sit unprocessed, which can stall critical data streams such as payment transactions.

Warning

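A stalled partition in a payment system can cause transaction delays or data inconsistencies. Tune failure-detection settings such as session.timeout.ms and heartbeat.interval.ms so that a failed consumer is detected and its partitions are reassigned promptly.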

How Consumer Rebalancing Works

When a consumer joins or leaves the group, Kafka triggers a rebalance:

  1. Consumption Pause
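    Under the eager protocol, all active consumers halt message processing; under the cooperative protocol, only the partitions being moved are paused.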
  2. Partition Reassignment
    Kafka redistributes partitions to achieve an even workload.
  3. Resuming Consumption
    Consumers resume processing with their new assignments.
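
The reassignment step can be illustrated with a small simulation. This is a minimal sketch, not Kafka's actual assignor: the function name assign_round_robin is hypothetical, and partitions are numbered 1 to 4 to match the example above (Kafka itself numbers partitions from 0).

```python
from typing import Dict, List


def assign_round_robin(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partitions over the live consumers as evenly as possible."""
    members = sorted(consumers)
    assignment: Dict[str, List[int]] = {m: [] for m in members}
    if not members:
        return assignment
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


partitions = [1, 2, 3, 4]

# Four healthy consumers: one partition each.
before = assign_round_robin(
    partitions, ["consumer-1", "consumer-2", "consumer-3", "consumer-4"]
)

# consumer-4 has failed: its partition is handed to a surviving member.
after = assign_round_robin(partitions, ["consumer-1", "consumer-2", "consumer-3"])

print(before)
print(after)  # partition 4 now belongs to consumer-1
```

In this sketch the orphaned partition 4 ends up with consumer-1, so every partition has an owner again after the rebalance.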

Figure: Kafka consumer rebalancing, with partitions from different brokers distributed among the consumers in a consumer group.

Eager vs. Cooperative Rebalancing Protocols

Kafka offers two protocols for partition reallocation:

Protocol    | Behavior                                                       | Use Case
Eager       | Revokes and reassigns all partitions at once.                  | Simple logic, but longer processing pause.
Cooperative | Incrementally revokes/assigns partitions for minimal downtime. | Low-latency environments.
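
The difference between the two protocols comes down to which partitions each consumer must give up during a rebalance. The following is a simplified model under stated assumptions: the helper names revoked_eager and revoked_cooperative are hypothetical, and the real protocols involve additional coordination rounds that are not shown here.

```python
from typing import Dict, List, Set

Assignment = Dict[str, List[int]]


def revoked_eager(old: Assignment) -> Dict[str, Set[int]]:
    """Eager: every member gives up everything it owns before reassignment."""
    return {consumer: set(parts) for consumer, parts in old.items()}


def revoked_cooperative(old: Assignment, new: Assignment) -> Dict[str, Set[int]]:
    """Cooperative: a member gives up only the partitions that change owner."""
    return {
        consumer: set(parts) - set(new.get(consumer, []))
        for consumer, parts in old.items()
    }


# Scaling out from two consumers to three.
old = {"consumer-1": [1, 2], "consumer-2": [3, 4]}
new = {"consumer-1": [1, 2], "consumer-2": [3], "consumer-3": [4]}

print(revoked_eager(old))             # both consumers stop all four partitions
print(revoked_cooperative(old, new))  # only partition 4 is taken from consumer-2
```

In this scenario the eager model interrupts all four partitions, while the cooperative model interrupts only the one partition that actually moves.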

Rebalance Triggers and Group Membership

Rebalancing occurs upon:

  • Consumer Failure: Unexpected crashes or network partitions.
  • Consumer Join: Scaling out with new instances.
  • Consumer Shutdown: Graceful or forced termination.
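
Consumer failure is detected through missed heartbeats: a member that sends no heartbeat within session.timeout.ms is considered dead and removed from the group. The sketch below models that check in isolation; expired_members and the timestamps are hypothetical, and the broker's real bookkeeping is more involved.

```python
from typing import Dict, List


def expired_members(
    last_heartbeat_ms: Dict[str, int], now_ms: int, session_timeout_ms: int
) -> List[str]:
    """Return members whose last heartbeat is older than the session timeout."""
    return sorted(
        member
        for member, seen_at in last_heartbeat_ms.items()
        if now_ms - seen_at > session_timeout_ms
    )


# Illustrative timestamps in milliseconds; consumer-4 went silent at t=1000.
last_seen = {"consumer-1": 9000, "consumer-2": 9500, "consumer-4": 1000}

dead = expired_members(last_seen, now_ms=12000, session_timeout_ms=10000)
print(dead)  # ['consumer-4'] -> removing it from the group triggers a rebalance
```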

Note

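Rebalancing by itself does not change Kafka's delivery guarantees. When offsets are committed after processing, consumers get at-least-once delivery: messages are not lost when partitions move, but messages processed after the last committed offset may be delivered again to the new owner, so handlers should be idempotent.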
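
With at-least-once delivery, a message can be processed a second time if its partition moves before the offset is committed. The following sketch simulates that case with an in-memory list standing in for a partition; run_consumer and the payment IDs are hypothetical and no Kafka client is involved.

```python
from typing import List, Tuple


def run_consumer(
    log: List[str], committed_offset: int, batch_size: int, crash_before_commit: bool
) -> Tuple[List[str], int]:
    """Process one batch from the committed offset; return (processed, new offset)."""
    batch = log[committed_offset:committed_offset + batch_size]
    processed = list(batch)  # side effects (e.g. charging a card) would happen here
    if crash_before_commit:
        return processed, committed_offset  # the offset was never advanced
    return processed, committed_offset + len(batch)


log = ["pay-1", "pay-2", "pay-3", "pay-4"]

# The first owner processes two messages, then dies before committing.
first, committed = run_consumer(log, 0, 2, crash_before_commit=True)

# After the rebalance, the new owner resumes from the last committed offset.
second, committed = run_consumer(log, committed, 2, crash_before_commit=False)

all_processed = first + second
unique = list(dict.fromkeys(all_processed))  # de-duplicate by message ID

print(all_processed)  # ['pay-1', 'pay-2', 'pay-1', 'pay-2']
print(unique)         # ['pay-1', 'pay-2']
```

De-duplicating by a message ID, as in the last step, is one way to make a handler tolerate redelivery; the right approach depends on the application.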

Figure: three aspects of consumer rebalancing in Kafka: Partition Reassignment, Group Membership Change, and Consumption Pause.

Best Practices

  • Adjust session.timeout.ms and heartbeat.interval.ms for faster failure detection, keeping the heartbeat interval at no more than one third of the session timeout.
  • Prefer the cooperative protocol for latency-sensitive applications.
  • Monitor consumer lag with tools like Kafka Monitor or Confluent Control Center.
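
These practices might translate into a configuration like the one sketched below. The property names are standard Kafka consumer settings, but the values, the broker address, and the group name are illustrative assumptions. The "cooperative-sticky" value is the form used by librdkafka-based clients; the Java client instead takes the class name org.apache.kafka.clients.consumer.CooperativeStickyAssignor. The helpers heartbeat_ok and consumer_lag are hypothetical.

```python
from typing import Dict

consumer_config = {
    "bootstrap.servers": "localhost:9092",  # hypothetical broker address
    "group.id": "payments-processor",       # hypothetical group name
    "session.timeout.ms": 10000,            # illustrative value
    "heartbeat.interval.ms": 3000,          # illustrative value
    "partition.assignment.strategy": "cooperative-sticky",
}


def heartbeat_ok(config: Dict[str, object]) -> bool:
    """Check that the heartbeat interval is at most one third of the session timeout."""
    return int(config["heartbeat.interval.ms"]) * 3 <= int(config["session.timeout.ms"])


def consumer_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Lag per partition = log end offset minus last committed offset."""
    return {p: end_offsets[p] - committed.get(p, 0) for p in end_offsets}


print(heartbeat_ok(consumer_config))                        # True
print(consumer_lag({1: 120, 2: 80}, {1: 100, 2: 80}))       # {1: 20, 2: 0}
```

A lag that keeps growing on one partition while the others stay near zero is a common sign that the partition's owner has stalled or that a rebalance has not yet completed.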
