How to install Apache Kafka on CentOS 8

Learn how to install Apache Kafka on CentOS 8 with this step-by-step guide. It covers the prerequisites, the installation commands, and a basic configuration that you can verify with a test topic.

What is Apache Kafka?

Apache Kafka is an open-source distributed event streaming platform used for building real-time data pipelines and streaming applications. Initially developed by LinkedIn and later open-sourced under the Apache Software Foundation, Kafka is designed to handle high-throughput, low-latency data processing.

Apache Kafka’s ability to handle large-scale, real-time data streams with high reliability makes it a popular choice for various industries, including finance, telecommunications, retail, and more. Its robust architecture and extensive ecosystem of tools and connectors make it a versatile and powerful platform for modern data-driven applications.

Apache Flink and Apache Kafka are both powerful tools for handling large-scale data, but they serve different purposes and are often used together in a complementary manner. Here’s a comparison of the two:

Apache Kafka

Purpose: Kafka is a distributed event streaming platform designed to handle high-throughput, real-time data feeds. It is primarily used for messaging, log aggregation, and real-time data pipelines.

Key Features:

  1. Message Broker: Kafka acts as a high-throughput, fault-tolerant message broker, allowing systems to publish and subscribe to streams of records.
  2. Durability and Fault Tolerance: Kafka ensures data durability by persisting messages on disk and replicating them across multiple brokers.
  3. Scalability: Kafka is designed to scale horizontally by adding more brokers to handle increased load.
  4. Low Latency: Kafka provides low-latency message delivery, suitable for real-time data processing.
  5. Decoupling Systems: Kafka decouples data producers from consumers, enabling independent scaling and evolution of different parts of a data architecture.
  6. Multiple Consumers: A single stream of data can be consumed by multiple applications for various use cases.

Apache Flink

Purpose: Flink is a stream processing framework that excels at complex event processing and real-time analytics. It is designed for high-performance, low-latency stream and batch data processing.

Key Features:

  1. Stream Processing: Flink provides robust support for stateful stream processing, allowing for real-time data transformation and analytics.
  2. Low Latency: Flink is optimized for low-latency processing, enabling near-instantaneous analysis and actions on data streams.
  3. Fault Tolerance: Flink’s checkpointing mechanism ensures state consistency and recovery in case of failures.
  4. Scalability: Flink can scale horizontally to handle large volumes of data by adding more nodes to the cluster.
  5. Complex Event Processing: Flink supports complex event processing (CEP) with its powerful event pattern matching capabilities.
  6. Batch Processing: In addition to stream processing, Flink can handle batch data processing, making it versatile for various data workloads.

Comparison Summary

In summary, Apache Kafka and Apache Flink serve distinct but complementary roles in a modern data architecture. Kafka transports and stores event streams, while Flink processes and analyzes those streams in real time. Used together, Kafka supplies the durable, scalable event log and Flink supplies the stateful computation on top of it.

Environment Specification

We are using a minimal CentOS 8 KVM virtual machine with the hostname kafka-01.centlinux.com for this guide.

Update your Linux Operating System

Connect to kafka-01.centlinux.com as the root user using an SSH client.

Update the installed software packages on your Linux operating system. CentOS 8 uses dnf as its package manager, so you can run the following command.

# dnf update -y

Check the kernel version and the operating system release used in this installation guide.

# uname -r
4.18.0-193.28.1.el8_2.x86_64

# cat /etc/redhat-release
CentOS Linux release 8.2.2004 (Core)
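
If you want to run the same check from a script, the following is a minimal sketch. It assumes your system provides the standard /etc/os-release file; the function name os_id_major is our own and is not part of any package.

```shell
# Print "ID MAJOR" (for example "centos 8") from an os-release style file.
# The path is a parameter so the function can be tried on a copy of the file.
os_id_major() (
  file="${1:-/etc/os-release}"
  # os-release consists of shell-compatible KEY=value assignments,
  # so it can be sourced inside this subshell without touching the caller.
  . "$file"
  echo "${ID} ${VERSION_ID%%.*}"
)
```

Calling os_id_major without arguments reads /etc/os-release on the current machine.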

Install Java on CentOS 8

Apache Kafka runs on the Java Virtual Machine, therefore it requires Java 8 or later.

OpenJDK 11 is available in the standard CentOS 8 repositories, therefore you can install it by executing the following command.

# dnf install -y java-11-openjdk
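
To confirm that the installed Java meets the "8 or later" requirement, you could parse the output of java -version. The sketch below is one way to do it; java_major is a hypothetical helper name, and it assumes the version appears as the first double-quoted token on the first line, as OpenJDK prints it.

```shell
# Extract the major Java version from `java -version` output.
java_major() (
  # The version is the first double-quoted token, e.g. "11.0.9" or "1.8.0_272".
  v=$(printf '%s\n' "$1" | sed -n '1s/^[^"]*"\([^"]*\)".*$/\1/p')
  case "$v" in
    1.*) v=${v#1.}; echo "${v%%.*}" ;;   # legacy scheme: 1.8.0_272 -> 8
    *)   echo "${v%%[._-]*}" ;;          # current scheme: 11.0.9 -> 11
  esac
)
```

Because java -version writes to standard error, call it as java_major "$(java -version 2>&1)".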

Install Apache Kafka on CentOS 8

Kafka is distributed under the Apache License 2.0, therefore you can download it free of charge from its official website.

Apache Kafka Downloads

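Copy the URL of the version you need from this webpage.

Use the copied URL with the wget command to download Kafka directly from the Linux command line. Superseded releases are moved from downloads.apache.org to archive.apache.org/dist/kafka/, so adjust the URL if the one below is no longer available.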

# cd /tmp
# wget https://downloads.apache.org/kafka/2.6.0/kafka_2.13-2.6.0.tgz
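
It is good practice to verify the download before extracting it. The sketch below compares a file's SHA-512 digest with an expected value. The function name verify_sha512 is our own, and the expected digest is passed in as a plain string because the layout of the published checksum file is not covered in this guide.

```shell
# Compare a file's SHA-512 digest with an expected hex digest.
# Whitespace and letter case in the expected value are ignored.
verify_sha512() (
  file="$1"
  expected=$(printf '%s' "$2" | tr -d ' \t\r\n' | tr 'A-F' 'a-f')
  actual=$(sha512sum "$file" | awk '{print $1}')
  # Succeed only if a digest was supplied and it matches the computed one.
  [ -n "$expected" ] && [ "$actual" = "$expected" ]
)
```

For example, verify_sha512 kafka_2.13-2.6.0.tgz "<digest from the download page>" && echo OK prints OK only when the digests match.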

Extract downloaded tarball by using tar command.

# tar xzf kafka_2.13-2.6.0.tgz

Now, move the extracted files to /opt/kafka directory.

# mv kafka_2.13-2.6.0 /opt/kafka

Install ZooKeeper on CentOS 8

Kafka 2.6.0, the version installed in this guide, requires a ZooKeeper service for cluster coordination. However, the Kafka documentation mentions that

“Soon, ZooKeeper will no longer be required by Apache Kafka.”

For now, ZooKeeper must be running before the Kafka server starts.

The Kafka tarball ships with ZooKeeper start and stop scripts and a default configuration file, so you do not need a separate download. You can use them to run a ZooKeeper server.
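
Before starting the services, it can help to look at the settings that the two configuration files share. The sketch below reads a single key from a Java-style properties file. get_prop is a hypothetical helper, and the key names mentioned afterwards (clientPort, dataDir, zookeeper.connect, log.dirs) are the ones we expect in the stock files, so check them against your own copies.

```shell
# Print the value of KEY from a .properties file (the last match wins).
# Returns non-zero if the key is not present.
get_prop() (
  file="$1"
  key="$2"
  awk -v k="$key" '
    /^[ \t]*#/ { next }                      # skip comment lines
    {
      i = index($0, "=")                     # split on the first "=" only
      if (i == 0) next
      name = substr($0, 1, i - 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      if (name == k) {
        val = substr($0, i + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", val)
        found = 1
      }
    }
    END { if (found) print val; else exit 1 }
  ' "$file"
)
```

For instance, get_prop /opt/kafka/config/zookeeper.properties clientPort should match the port in get_prop /opt/kafka/config/server.properties zookeeper.connect. If dataDir or log.dirs point under /tmp, consider moving them, because data stored there may not survive a reboot.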

Create a systemd service unit for Apache Zookeeper.

# cd /opt/kafka/
# vi /etc/systemd/system/zookeeper.service

Add the following directives to this file.

[Unit]
Description=Apache Zookeeper server
Documentation=http://zookeeper.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
ExecStart=/usr/bin/bash /opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper.properties
ExecStop=/usr/bin/bash /opt/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

Create Systemd Service for Apache Kafka

Similarly, create a systemd service unit for Kafka server.

# vi /etc/systemd/system/kafka.service

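Add the following directives to it. Requires= on its own does not control the start order, so After= is included to make sure ZooKeeper is started before Kafka.

[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service
After=zookeeper.service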

[Service]
Type=simple
Environment="JAVA_HOME=/usr/lib/jvm/jre-11-openjdk"
ExecStart=/usr/bin/bash /opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/usr/bin/bash /opt/kafka/bin/kafka-server-stop.sh

[Install]
WantedBy=multi-user.target

Enable and start Apache Zookeeper and Kafka services.

# systemctl daemon-reload
# systemctl enable --now zookeeper.service
Created symlink /etc/systemd/system/multi-user.target.wants/zookeeper.service → /etc/systemd/system/zookeeper.service.
# systemctl enable --now kafka.service
Created symlink /etc/systemd/system/multi-user.target.wants/kafka.service → /etc/systemd/system/kafka.service.

Verify the status of Apache Kafka service.

# systemctl status kafka.service
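
For a scripted check of both services, systemctl is-active is easier to test against than the full status output. The following is a minimal sketch; check_services is our own helper name.

```shell
# Report whether each named systemd unit is active.
# Returns non-zero if at least one of them is not.
check_services() (
  rc=0
  for svc in "$@"; do
    if systemctl is-active --quiet "$svc"; then
      echo "$svc: active"
    else
      echo "$svc: not active"
      rc=1
    fi
  done
  exit "$rc"
)
```

Run it as check_services zookeeper.service kafka.service. If Kafka is reported as not active, journalctl -u kafka.service usually shows the reason.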

Create a Topic in Apache Kafka Server

Create a topic in your Apache Kafka server.

# /opt/kafka/bin/kafka-topics.sh --create --topic centlinux --bootstrap-server localhost:9092
Created topic centlinux.
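
The command above relies on the broker defaults for the number of partitions and the replication factor. If you prefer to state them explicitly, a small wrapper such as the one below can help. create_topic, KAFKA_HOME and BOOTSTRAP are names chosen for this sketch, with defaults that match this guide.

```shell
# Installation directory and broker address; override them via the environment.
KAFKA_HOME="${KAFKA_HOME:-/opt/kafka}"
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"

# Usage: create_topic NAME [PARTITIONS] [REPLICATION_FACTOR]
create_topic() {
  "$KAFKA_HOME/bin/kafka-topics.sh" --create \
    --topic "$1" \
    --partitions "${2:-1}" \
    --replication-factor "${3:-1}" \
    --bootstrap-server "$BOOTSTRAP"
}
```

For example, create_topic demo 3 would create a hypothetical topic named demo with three partitions. On this single-broker setup the replication factor must stay at 1, because it cannot exceed the number of brokers.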

To view the details of the topic, run the following command.

# /opt/kafka/bin/kafka-topics.sh --describe --topic centlinux --bootstrap-server localhost:9092
Topic: centlinux        PartitionCount: 1       ReplicationFactor: 1    Configs: segment.bytes=1073741824
        Topic: centlinux        Partition: 0    Leader: 0       Replicas: 0    Isr: 0
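
In the output, Leader is the ID of the broker that serves the partition, Replicas lists the brokers holding a copy, and Isr lists the replicas that are currently in sync. The sketch below condenses the per-partition lines into one short line each. partition_summary is a hypothetical helper, and it assumes the output layout shown above.

```shell
# Read `kafka-topics.sh --describe` output on stdin and print one
# compact line per partition; the topic summary line is skipped.
partition_summary() {
  awk '
    {
      p = l = r = s = ""
      for (i = 1; i < NF; i++) {
        if ($i == "Partition:") p = $(i + 1)
        else if ($i == "Leader:") l = $(i + 1)
        else if ($i == "Replicas:") r = $(i + 1)
        else if ($i == "Isr:") s = $(i + 1)
      }
      if (p != "") print "partition=" p " leader=" l " replicas=" r " isr=" s
    }
  '
}
```

Pipe the describe command into it. A partition whose isr list is shorter than its replicas list is under-replicated.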

Add some sample events to your topic. Each line you type is sent as one event; press Ctrl+C to exit.

# /opt/kafka/bin/kafka-console-producer.sh --topic centlinux --bootstrap-server localhost:9092
>This is the First event.
>This is the Second event.
>This is the Third event.
>^C#

To view all the events stored in a topic, run the following command. The consumer keeps waiting for new events until you press Ctrl+C.

# /opt/kafka/bin/kafka-console-consumer.sh --topic centlinux --from-beginning --bootstrap-server localhost:9092
This is the First event.
This is the Second event.
This is the Third event.
^CProcessed a total of 3 messages
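
Both console tools can also be used without a keyboard, which is handy for a quick smoke test. The sketch below feeds the producer from standard input and lets the consumer exit by itself after a fixed number of events. The function names, KAFKA_HOME, BOOTSTRAP and the 10 second timeout are assumptions made for this example.

```shell
# Installation directory and broker address; override them via the environment.
KAFKA_HOME="${KAFKA_HOME:-/opt/kafka}"
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"

# Send each line of stdin to topic $1 as one event.
produce_lines() {
  "$KAFKA_HOME/bin/kafka-console-producer.sh" \
    --topic "$1" --bootstrap-server "$BOOTSTRAP"
}

# Read the first $2 events of topic $1, then exit instead of waiting for Ctrl+C.
consume_n() {
  "$KAFKA_HOME/bin/kafka-console-consumer.sh" \
    --topic "$1" --from-beginning \
    --max-messages "$2" --timeout-ms 10000 \
    --bootstrap-server "$BOOTSTRAP"
}
```

For example, printf 'one\ntwo\n' | produce_lines centlinux appends two events, and consume_n centlinux 3 prints the first three events in the topic.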

Apache Kafka is now installed on CentOS / RHEL 8, and the broker is listening on port 9092.

To improve your skills in this area, we recommend Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale (1st Edition), published by O’Reilly Media.

Final Thoughts

By following this guide, you should now have a single-node Apache Kafka broker installed on CentOS 8, running under systemd alongside ZooKeeper, and verified with a test topic.