This article provides a hands-on demonstration of setting up Apache Kafka for real-time data streaming using Docker and Python.
Welcome to this hands-on demonstration on Apache Kafka. In this lesson, you will learn how to set up Apache Kafka as an event bus so that one program can publish messages while multiple programs consume them in real time. Kafka acts as a central hub where events from various producers are sent to designated topics. Consumers then subscribe to these topics to retrieve and process messages. In this demonstration, we will:
Create a simple Kafka environment using Docker Compose.
Develop a Python-based Kafka producer that generates messages.
Build a Kafka consumer to read and process those messages.
First, update your system and install the necessary packages for Python 3 and virtual environments. Run the following commands in your KodeKloud playground labs terminal:
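The specific commands are not reproduced in the text; on a Debian/Ubuntu-based lab host they typically look like the following (the virtual environment name kafka_venv matches the prompt shown in later commands, and installing kafka-python here is an assumption for the scripts that follow):

```bash
# Refresh the package index and install Python 3, pip, and the venv module
sudo apt update
sudo apt install -y python3 python3-pip python3-venv

# Create and activate a virtual environment for the demo
python3 -m venv kafka_venv
source kafka_venv/bin/activate

# Install a Python Kafka client library (library choice is an assumption)
pip install kafka-python
```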
Next, configure Kafka and Zookeeper using Docker Compose. Create a file named docker-compose.yaml and paste in the configuration. It uses the Zookeeper image (required for managing the Kafka cluster) and the Confluent Kafka image, which relies on Zookeeper. (In the lesson, this file is also shown in a terminal screenshot for illustration; the file name and its content are the same as described here.)
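The exact file contents are not reproduced in the text; a minimal single-broker sketch, assuming the confluentinc/cp-zookeeper and confluentinc/cp-kafka images and the default ports, could look like this:

```yaml
version: "3"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181   # port Kafka uses to reach Zookeeper
      ZOOKEEPER_TICK_TIME: 2000
    ports:
      - "2181:2181"

  kafka:
    image: confluentinc/cp-kafka:latest
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1   # single broker, so 1
    ports:
      - "9092:9092"
```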
Save the file, then bring up the Kafka environment using:
(kafka_venv) admin@docker-host:~$ docker-compose up -d
This command pulls the necessary images and starts the containers for Zookeeper and Kafka in detached mode. Verify that the containers are running with:
(kafka_venv) admin@docker-host:~$ docker container ls
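With both containers running, create a topic for the demo events. The exact command used in the lab is not shown here; a typical invocation of the kafka-topics CLI bundled in the Kafka container (the topic name my-topic is an assumption) looks like this:

```bash
# Create a topic with 3 partitions on the single local broker
docker-compose exec kafka kafka-topics --create \
  --topic my-topic \
  --bootstrap-server localhost:9092 \
  --partitions 3 \
  --replication-factor 1
```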
Adjust the number of partitions and the replication factor as needed. These parameters are critical for achieving higher throughput and ensuring fault tolerance in production environments.
Now, let’s create a Python script to produce sample events to our Kafka topic. Open a text editor (e.g., vim) and create a file named python-kafka-producer.py. (In the lesson, an optional terminal screenshot is shown for visual reference; follow the written instructions to enter the code.)
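The script itself is not reproduced in the text; a minimal sketch, assuming the kafka-python library, a broker on localhost:9092, and a topic named my-topic, could look like this:

```python
# python-kafka-producer.py
import json
import random
import time
from datetime import datetime

from kafka import KafkaProducer

# Connect to the local broker and serialize each message as JSON
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

topic = "my-topic"  # assumed topic name

while True:
    # Each event carries a timestamp and a random value
    event = {
        "timestamp": datetime.now().isoformat(),
        "value": random.randint(1, 100),
    }
    producer.send(topic, value=event)
    print(f"Produced: {event}")
    time.sleep(1)
```

Run it with python3 python-kafka-producer.py and leave it running; it emits one event per second until interrupted.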
With the producer in place, develop a matching consumer script to read and process those messages, as sketched below.
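The consumer is likewise not reproduced in the text; a minimal sketch, again assuming kafka-python, the same broker and topic, and a file name such as python-kafka-consumer.py, could look like this:

```python
# python-kafka-consumer.py
import json

from kafka import KafkaConsumer

# Subscribe to the topic and read from the earliest available offset
consumer = KafkaConsumer(
    "my-topic",  # assumed topic name
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

# Print each event as it arrives
for message in consumer:
    print(f"Consumed: {message.value}")
```

Run it in a second terminal; as the producer publishes events, they should appear here in real time.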
The producer continuously sends messages that include a timestamp and a random value, while the consumer retrieves and prints these messages in real time. This setup shows how Kafka can serve as the central nervous system for data streaming applications, efficiently handling data ingestion and distribution. Happy coding, and see you in the next lesson!