Serialization in Kafka

In this lesson, we explore Kafka serialization—why it’s essential, the most popular data formats, and how to configure serializers on the producer side. By the end, you’ll understand how to convert structured data into byte arrays that Kafka can store and transmit efficiently.


Why Serialization Is Necessary

Kafka persists messages as raw bytes (byte[]). Any structured payload—JSON, XML, Avro, Protobuf, etc.—must be serialized before sending. Without serialization:

  • Kafka brokers cannot store or forward the record; they handle only raw bytes.
  • Consumers cannot reconstruct the payload; deserialization requires an agreed-upon byte format.

Consider this simple JSON record:

{
  "user_id": 123,
  "event": "page_view"
}

Before sending to Kafka, the producer transforms it into a byte array using a JSON serializer.
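
Under the hood, that transformation is simply character encoding. A minimal plain-Java sketch of the idea (illustrative, not the producer's internal code path):

import java.nio.charset.StandardCharsets;

String json = "{\"user_id\":123,\"event\":\"page_view\"}";
// A JSON serializer for a string payload ultimately emits its UTF-8 bytes.
byte[] payload = json.getBytes(StandardCharsets.UTF_8);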

Note

Kafka ships with built-in serializers for primitive types and strings. For advanced binary formats (Avro, Protobuf), you’ll often integrate a schema registry.


Common Serialization Formats

Below is a comparison of widely used Kafka serialization formats:

Format              Characteristics                                   Use Case
JSON                Human-readable, text-based                        Web APIs, simple logging
Apache Avro         Compact binary, schema evolution, fast parsing    Big-data pipelines (Hadoop, Spark), Confluent Schema Registry
Protocol Buffers    High-performance binary, strict schema            Microservices, cross-language RPC
MessagePack         Binary JSON, more compact than JSON               Lightweight services, IoT

How Producer-Side Serialization Works

When you create a Kafka producer, you must specify serializers for both the message key and the message value. The serializer choice depends on the data type:

Data Type           Serializer Class
Integer key         org.apache.kafka.common.serialization.IntegerSerializer
String value        org.apache.kafka.common.serialization.StringSerializer
Avro object         io.confluent.kafka.serializers.KafkaAvroSerializer
Protobuf object     io.confluent.kafka.serializers.protobuf.KafkaProtobufSerializer
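
The Confluent serializers also need to know where the Schema Registry lives. A minimal sketch of the extra properties for an Avro value serializer, assuming a registry at the placeholder URL http://localhost:8081 (these lines would be added to the Properties object shown in the next section):

props.put("key.serializer",   "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
// KafkaAvroSerializer registers and looks up schemas in the Schema Registry.
props.put("schema.registry.url", "http://localhost:8081");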

Java Producer Configuration Example

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
// Serializer classes must match the producer's generic types (<Integer, String>).
props.put("key.serializer",   "org.apache.kafka.common.serialization.IntegerSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<Integer, String> producer = new KafkaProducer<>(props);
ProducerRecord<Integer, String> record =
    new ProducerRecord<>("user-events", 123, "{\"user_id\":123,\"event\":\"page_view\"}");
producer.send(record);   // asynchronous; returns a Future<RecordMetadata>
producer.close();        // flushes buffered records, then releases resources

Behind the scenes:

  1. Key Serializer converts 123 → byte[].
  2. Value Serializer converts the JSON string → byte[].
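
You can reproduce both steps by invoking the serializers directly; this sketch mirrors what the producer does internally (the topic argument is passed through but unused by these simple serializers):

import org.apache.kafka.common.serialization.IntegerSerializer;
import org.apache.kafka.common.serialization.StringSerializer;

IntegerSerializer keySerializer = new IntegerSerializer();
StringSerializer valueSerializer = new StringSerializer();

byte[] keyBytes   = keySerializer.serialize("user-events", 123);  // 4-byte big-endian integer
byte[] valueBytes = valueSerializer.serialize("user-events",
    "{\"user_id\":123,\"event\":\"page_view\"}");                 // UTF-8 text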

Warning

Using mismatched serializers and deserializers (for example, sending Avro with a JSON deserializer) will cause runtime errors. Always keep producer and consumer schemas in sync.
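
A matching consumer configuration for the producer above might look like this (the group id is an illustrative placeholder):

import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "user-events-readers");  // placeholder consumer group
// Deserializers must mirror the producer's serializers for this topic.
props.put("key.deserializer",   "org.apache.kafka.common.serialization.IntegerDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<Integer, String> consumer = new KafkaConsumer<>(props);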


Benefits and Trade-Offs

  • Compatibility & Security
    Serialized data integrates with Kafka’s ACLs, SSL/TLS encryption, and Confluent Schema Registry.
  • Efficiency
    Binary formats (Avro, Protobuf) are more compact and faster to parse than text-based formats.
  • Producer Overhead
    Serialization consumes CPU and adds latency on the producer side. Batching or buffering records amortizes this cost in high-throughput environments (see the sketch below).
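
For example, these standard producer settings trade a few milliseconds of latency for larger, compressed batches (the values are illustrative starting points, not tuned recommendations):

props.put("batch.size", "32768");      // max bytes per partition batch (default 16384)
props.put("linger.ms", "10");          // wait up to 10 ms so batches can fill
props.put("compression.type", "lz4");  // compress each batch on the wire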

Next Steps

Now that you understand serialization in Kafka, the next step is to explore how message keys influence partitioning and ordering.

