Serialization in Kafka
In this lesson, we explore Kafka serialization—why it’s essential, the most popular data formats, and how to configure serializers on the producer side. By the end, you’ll understand how to convert structured data into byte arrays that Kafka can store and transmit efficiently.
Why Serialization Is Necessary
Kafka persists messages as raw bytes (byte[]). Any structured payload—JSON, XML, Avro, Protobuf, etc.—must be serialized before sending. Without serialization:
- Kafka brokers cannot store or transmit your data; they handle only byte arrays.
- Consumers cannot deserialize or interpret the payload correctly.
Consider this simple JSON record:
{
  "user_id": 123,
  "event": "page_view"
}
Before sending to Kafka, the producer transforms it into a byte array using a JSON serializer.
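Conceptually, a JSON/string serializer just encodes the text as UTF-8 bytes. A minimal sketch of that step in Java:

import java.nio.charset.StandardCharsets;

String json = "{\"user_id\":123,\"event\":\"page_view\"}";
byte[] payload = json.getBytes(StandardCharsets.UTF_8); // the byte[] that Kafka actually stores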
Note
Kafka ships with built-in serializers for primitive types and strings. For advanced binary formats (Avro, Protobuf), you’ll often integrate a schema registry.
Common Serialization Formats
Below is a comparison of widely used Kafka serialization formats:
| Format | Characteristics | Use Case |
| --- | --- | --- |
| JSON | Human-readable, text-based | Web APIs, simple logging |
| Apache Avro | Compact binary, schema evolution, fast parsing | Big-data pipelines (Hadoop, Spark), Confluent Schema Registry |
| Protocol Buffers | High-performance binary, strict schema | Microservices, cross-language RPC |
| MessagePack | Binary JSON, more compact than JSON | Lightweight services, IoT |
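All of these formats plug into the same producer hook: Kafka's Serializer interface. As a rough sketch, a custom JSON serializer built on Jackson might look like this (the Jackson dependency and the JsonSerializer class name are assumptions for illustration, not a built-in Kafka class):

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Serializer;

// Turns any Jackson-mappable object into the byte[] Kafka expects
public class JsonSerializer<T> implements Serializer<T> {
    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    public byte[] serialize(String topic, T data) {
        if (data == null) {
            return null;
        }
        try {
            return mapper.writeValueAsBytes(data);
        } catch (Exception e) {
            throw new SerializationException("JSON serialization failed for topic " + topic, e);
        }
    }
}

You would then register it the same way as a built-in serializer, e.g. props.put("value.serializer", JsonSerializer.class.getName()).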
How Producer-Side Serialization Works
When you create a Kafka producer, you must specify serializers for both the message key and the message value. The serializer choice depends on the data type:
| Data Type | Serializer Class |
| --- | --- |
| Integer key | org.apache.kafka.common.serialization.IntegerSerializer |
| String value | org.apache.kafka.common.serialization.StringSerializer |
| Avro object | io.confluent.kafka.serializers.KafkaAvroSerializer |
| Protobuf object | io.confluent.kafka.serializers.protobuf.KafkaProtobufSerializer |
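The Confluent serializers (Avro, Protobuf) additionally need to know where the Schema Registry lives. A minimal configuration sketch, assuming a registry running locally on port 8081:

Properties avroProps = new Properties();
avroProps.put("bootstrap.servers", "localhost:9092");
avroProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
avroProps.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
// The serializer registers and fetches schemas from this endpoint
avroProps.put("schema.registry.url", "http://localhost:8081");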
Java Producer Configuration Example
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
// Serializer classes must match the producer's <Integer, String> generic types
props.put("key.serializer", "org.apache.kafka.common.serialization.IntegerSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<Integer, String> producer = new KafkaProducer<>(props);

// Key 123 and a JSON string value, sent to the "user-events" topic
ProducerRecord<Integer, String> record =
    new ProducerRecord<>("user-events", 123, "{\"user_id\":123,\"event\":\"page_view\"}");

producer.send(record);
producer.close();
Behind the scenes:
- The key serializer converts 123 → byte[].
- The value serializer converts the JSON string → byte[].
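You can reproduce this byte conversion directly by invoking the serializers yourself:

import org.apache.kafka.common.serialization.IntegerSerializer;
import org.apache.kafka.common.serialization.StringSerializer;

// IntegerSerializer emits a 4-byte big-endian encoding; StringSerializer emits UTF-8 bytes
byte[] keyBytes   = new IntegerSerializer().serialize("user-events", 123);
byte[] valueBytes = new StringSerializer().serialize("user-events", "{\"user_id\":123,\"event\":\"page_view\"}");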
Warning
Using mismatched serializers and deserializers (for example, sending Avro with a JSON deserializer) will cause runtime errors. Always keep producer and consumer schemas in sync.
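On the consumer side, the deserializers must mirror the producer's serializers. A minimal sketch matching the producer above (the group id demo-group is an arbitrary choice for illustration):

import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties consumerProps = new Properties();
consumerProps.put("bootstrap.servers", "localhost:9092");
consumerProps.put("group.id", "demo-group");
// Mirror the producer: IntegerSerializer pairs with IntegerDeserializer, StringSerializer with StringDeserializer
consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.IntegerDeserializer");
consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<Integer, String> consumer = new KafkaConsumer<>(consumerProps);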
Benefits and Trade-Offs
- Compatibility & Security: Serialized data integrates with Kafka’s ACLs, SSL/TLS encryption, and the Confluent Schema Registry.
- Efficiency: Binary formats (Avro, Protobuf) are more compact and faster to parse than text-based formats.
- Producer Overhead: Serialization adds CPU cost and latency on the producer side. Batch or buffer records to optimize throughput in high-scale environments (see the sketch below).
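The standard knobs for amortizing that overhead are the producer's batching and compression settings; the values below are illustrative, not prescriptive:

// Trade a little latency for larger, cheaper-to-send batches
props.put("linger.ms", "10");           // wait up to 10 ms to fill a batch
props.put("batch.size", "32768");       // 32 KB batch target per partition
props.put("compression.type", "lz4");   // compress entire batches on the producer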
Next Steps
Now that you understand serialization in Kafka, the next lesson explores how message keys influence partitioning and ordering.