Apache Kafka® on Kubernetes®

Arash · Kubernauts · Apr 25, 2019

“By believing passionately in something that still does not exist, we create it.
The nonexistent is whatever we have not sufficiently desired.”
Franz Kafka

In this first post, I’ll walk you through deploying Apache Kafka with the Strimzi Kafka Operator on two different Kubernetes clusters for production, setting up replication between two Kafka clusters with Kafka MirrorMaker, and using our JMeter Cluster on Kubernetes implementation with the PepperBox plugin to run smoke and performance tests against our Kafka clusters. As the storage backend we’ll use Container Attached Storage (CAS) implementations with Rook-Ceph and OpenEBS.

N.B.: We’re using Rancher Kubernetes Engine on bare metal, but this guide should work on any other k8s cluster on a public cloud as well, and with some minor modifications on minikube or minishift.

If you’d like to deploy RKE with a single command on AWS with TK8, this post by the awesome Shantanu Deshpande is for you.

In the second post, we’ll integrate Confluent’s KSQL, Schema Registry, and REST Proxy open source components with Strimzi’s Kafka implementation, and showcase how cloud-native CAS technology with OpenEBS and Rook-Ceph can be used together with VMware Heptio’s Velero and Restic for backup, restore, and disaster recovery scenarios. We’ll also provide some insights into Kafka tooling and administration, covering operational knowledge such as topic operation, partition and offset management, dynamic configuration changes, and monitoring and alerting. In the third post, we’ll provide some best practices for chaos engineering for Kafka on Kubernetes.

Let’s Start

The following figure provides a very simplified overview of the system architecture: Strimzi Kafka {1,2} run on the Rancher Kubernetes Engine {1} cluster and Strimzi Kafka {3} runs on the Rancher Kubernetes Engine {2} cluster. Both RKE clusters are managed through the Rancher infra management cluster, which also runs the JMeter cluster on top in our lab. RKE1 uses Rook-Ceph and RKE2 has OpenEBS beneath. Kafka MirrorMaker is responsible for mirroring and replicating data from the Kafka1 cluster to the Kafka2 cluster on the same RKE1 cluster and to Kafka3 running on the RKE2 cluster.

Introduction to Apache Kafka

Apache Kafka® was developed by an awesome team at LinkedIn and open sourced back in 2011. Apache Kafka is among the fastest growing open source projects, and according to Confluent Inc., the awesome people behind Apache Kafka, it is being used by tens of thousands of organizations, including over a third of the Fortune 500 companies.

Apache Kafka® is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, and can be used for Stream Processing, as a Storage or Messaging System.

Kafka is primarily a distributed, horizontally-scalable, fault-tolerant commit log. A commit log is basically an append-only data structure: no modification or deletion is possible, which means no read/write locks are needed and appends have a worst-case complexity of O(1).

A streaming platform has three key capabilities:

  • Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system.
  • Store streams of records in a fault-tolerant durable way.
  • Process streams of records as they occur.

Kafka is generally used for two broad classes of applications:

  • Building real-time streaming data pipelines that reliably get data between systems or applications
  • Building real-time streaming applications that transform or react to the streams of data

Kafka has four core APIs:

  • The Producer API allows an application to publish a stream of records to one or more Kafka topics.
  • The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them.
  • The Streams API allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams.
  • The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.

To learn more about Apache Kafka, please refer to the official Introduction to Apache Kafka and related resources in Awesome Kafka Resources.

Why Kafka on Kubernetes?

The simple answer to this question is not “why not” but “you should!”, in case you’d like to enjoy the resiliency and scalability capabilities of a cluster and container orchestrator such as Kubernetes. But be aware: Kubernetes is not for everyone (yet) and is very complex, and Kafka operation is not easy either, though K8s Operators can help address the Day 2 problem.

Kubernetes is about resiliency and scale, and so is Kafka! Kafka is stateful, and Kubernetes handles stateful workloads! That’s a match made in heaven.

Running and operating stateful apps on Kubernetes has not been easy until now, at least if you have to deal with replication and take care of syncing and re-balancing your streaming data across different nodes and/or different clusters in different regions. And if you have to use an older version of K8s or OpenShift, e.g. OpenShift 3.9, with local storage or a very slow distributed shared storage like GlusterFS, it would be a nightmare.

About Strimzi: Kafka as a Service

Strimzi is the upstream version of Red Hat AMQ Streams, based on Apache Kafka 2.1.0 at the time of writing in April 2019. Strimzi implements the Kafka Cluster Operator to deploy and manage upstream Kafka broker and ZooKeeper nodes along with Kafka Connect clusters.

The Strimzi Operator makes it so easy to run Apache Kafka on Kubernetes that it feels like getting Kafka as a Service!

To learn more about Strimzi, please refer to the official page on strimzi.io.

About Confluent Platform OSS and Enterprise Edition

Confluent Inc., the company that has commercialised Apache Kafka, is the major contributor to the upstream Apache Kafka project and provides additional client libraries and components such as KSQL, REST Proxy, Schema Registry, and the Confluent Control Center (C3), along with some glue code that ties all the components together as the Confluent Platform for use in production.

Strimzi vs. Confluent OSS vs. Confluent Enterprise

The following table provides a simple comparison between Strimzi, Confluent OSS and Confluent Enterprise.

Deploying the Strimzi Kafka Cluster Operator on Kubernetes

Strimzi provides many options to deploy Apache Kafka on Kubernetes or OpenShift. The easiest option is using Helm to deploy the Kafka Cluster Operator and then using the Operator to deploy Kafka brokers and ZooKeeper nodes along with a TLS sidecar in each pod.

N.B.: you can use minikube, minishift, or any other Kubernetes cluster (k8s 1.9+ or OpenShift 3.9+) to deploy Apache Kafka with the Strimzi Operator. You also need to have Helm’s Tiller installed on the cluster.

Preparations

In our scenario we’re using the rook-ceph operator on the first cluster (RKE1, hosting the Strimzi Kafka1 cluster) to provide cloud-native storage for Kafka with Rook. On the second cluster we’re using OpenEBS. The following section provides the deployment steps for Rook; the deployment steps for OpenEBS will be provided in the second post of this series.

Deploy ROOK-CEPH

Sources on Github

Please clone the repo:

$ git clone https://github.com/arashkaffamanesh/kafka-on-kubernetes

Create the rook storage class on RKE1 cluster:

$ k create -f rook-storageclass.yaml
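The manifest itself is only in the repo; for the flexvolume-based Rook v0.9 releases current in early 2019, a storage class backed by a Ceph block pool would look roughly like this (a sketch: the pool name, replica count, and fstype are assumptions, not values taken from the repo):

```yaml
# Sketch of a rook-storageclass.yaml for Rook v0.9 (values are assumptions)
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  replicated:
    size: 3                        # keep 3 replicas of each object
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: ceph.rook.io/block    # Rook's flexvolume-based provisioner
parameters:
  blockPool: replicapool
  clusterNamespace: rook-ceph
  fstype: xfs
```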

Define the rook storage class as default:

$ kubectl patch storageclass rook-ceph-block -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

Deploy the rook operator:

$ k create -f rook-operator.yaml

Verify that the rook-ceph-operator is properly deployed:

$ k get pods -n rook-ceph-system

Deploy the rook-ceph cluster:

$ k create -f rook-ceph-cluster.yaml

Verify that the rook-ceph cluster is properly deployed:

$ k get pods -n rook-ceph

Deploy the rook-ceph toolbox:

$ k create -f rook-ceph-toolbox.yaml -n rook-ceph

From the toolbox pod you can use ceph commands, e.g. ceph status, to verify the state of your rook-ceph cluster.

Clean Up

Clean up the rook-ceph cluster and the operator, if something doesn’t work or you don’t need it anymore:

$ k delete -f rook-ceph-cluster.yaml
$ k delete -f rook-operator.yaml

And on the storage / worker nodes clean up the mounted flex volumes (RKE Ubuntu specific):

$ sudo rm -rf /usr/libexec/kubernetes/kubelet-plugins/volume/exec/*

Deploy Strimzi Kafka Cluster Operator on the 1st Cluster

For testing purposes we create 3 namespaces, strimzi, kafka1, and kafka2, on the first cluster and deploy the Strimzi Cluster Operator in the strimzi namespace as follows:

$ k create ns strimzi
$ k create ns kafka1
$ k create ns kafka2
$ k config set-context <your cluster context> --namespace=strimzi

Or simply use kn, if you have kubectx installed.

$ kn strimzi
$ helm repo add strimzi http://strimzi.io/charts/
$ helm search strimzi/ --versions
$ helm install strimzi/strimzi-kafka-operator \
  --name strimzi-cluster-operator \
  --namespace strimzi \
  --version 0.11.1

N.B.: the latest Strimzi version at the time of writing is 0.11.2; we have chosen to deploy the previous version 0.11.1 so we can see later how the upgrade to 0.11.2 works.

After a few seconds your Strimzi cluster operator should be running in the strimzi namespace:

Now we ask Helm to upgrade the cluster operator to the latest version 0.11.2 and to watch the kafka1 and kafka2 namespaces, where we’ll deploy our Kafka clusters in the next step:

$ helm upgrade --reuse-values --set watchNamespaces="{kafka1, kafka2}" strimzi-cluster-operator strimzi/strimzi-kafka-operator

After running the above command you’ll get something similar to this at the end:

Thank you for installing strimzi-kafka-operator-0.11.2
To create a Kafka cluster refer to the following documentation.
http://strimzi.io/docs/0.11.2/#kafka-cluster-str

N.B.: Helm re-creates the cluster operator and we’re ready to deploy our first Kafka Cluster.

Deploy the Kafka1 Cluster

Now we switch to kafka1 namespace and deploy our first kafka1 cluster into this namespace:

$ kn kafka1
$ k apply -f kafka1-cluster.yaml
kafka.kafka.strimzi.io/kafka1 created
$ k get kafkas
or
$ k get k
NAME     AGE
kafka1   45s

But let’s see what’s going on and have a look at our first Kafka cluster definition, which is a custom resource of kind Kafka. We define the Kafka version (2.1.0), the number of Kafka brokers and ZooKeeper nodes through the replicas count (3 each), the default number of topic partitions (3), and the storage size (5Gi for the broker PVs and 2Gi for the ZooKeeper PVs).
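A minimal kafka1-cluster.yaml matching the values described above might look roughly like this (a sketch: the listeners and config block are assumptions, and the API version shown is the v1alpha1 used by Strimzi 0.11):

```yaml
apiVersion: kafka.strimzi.io/v1alpha1
kind: Kafka
metadata:
  name: kafka1
spec:
  kafka:
    version: 2.1.0
    replicas: 3                    # 3 Kafka brokers
    listeners:
      plain: {}                    # plaintext on 9092
      tls: {}                      # TLS on 9093 via the tls-sidecar
    config:
      num.partitions: 3            # default partitions per topic
      offsets.topic.replication.factor: 3
    storage:
      type: persistent-claim
      size: 5Gi                    # broker PV size
      deleteClaim: true
  zookeeper:
    replicas: 3                    # 3 ZooKeeper nodes
    storage:
      type: persistent-claim
      size: 2Gi                    # ZooKeeper PV size
      deleteClaim: true
  entityOperator:
    topicOperator: {}
    userOperator: {}
```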

As you can see in the following screenshots, the Strimzi cluster operator deploys 3 Kafka brokers, 3 ZooKeeper nodes, and the entity operator, along with 6 PVs and PVCs:

The Entity Operator Pod has 3 containers, the topic-operator, the user-operator and the tls-sidecar:

The Kafka and ZooKeeper pods have 2 containers each: the kafka-{0,1,2} pods each run a Kafka broker plus the tls-sidecar, and the zookeeper-{0,1,2} pods each run ZooKeeper plus the tls-sidecar:

Deploy the Kafka2 Cluster

Similar to the steps above, we create the Kafka2 cluster in about 90 seconds :-)

$ k create -f kafka2-cluster.yaml -n kafka2

This demonstrates the power of Strimzi Kafka Cluster Operator to provide Kafka as a Service on Kubernetes!

List your Kafka Clusters

$ k get k --all-namespaces
NAMESPACE   NAME     AGE
kafka1      kafka1   2m
kafka2      kafka2   2m

Create Topics

Now we are ready to create our first topic “test” in each cluster. A KafkaTopic is defined through a CRD (Custom Resource Definition); we define our test topic with 3 partitions and 3 replicas as follows:

$ cat kafka1-topic.yaml
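The topic manifest might look like this (a sketch; the strimzi.io/cluster label is what ties the topic to the kafka1 cluster):

```yaml
apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaTopic
metadata:
  name: test
  labels:
    strimzi.io/cluster: kafka1   # the cluster this topic belongs to
spec:
  partitions: 3
  replicas: 3
```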

And create our first topic in kafka1 cluster and list it:

$ k create -f kafka1-topic.yaml
kafkatopic.kafka.strimzi.io/test created
$ k get kafkatopic
NAME   AGE
test   1m

And create a second test topic in kafka2 cluster which resides in the namespace kafka2 on our RKE1 cluster:

$ k create -f kafka2-topic.yaml -n kafka2
kafkatopic.kafka.strimzi.io/test created
$ k get kafkatopic -n kafka2
or
$ k get kt
NAME AGE
test 26s

To see our topic in the Kafka broker, we can exec into the kafka1-kafka-0 container and list the kafka-log0 directory, which holds this broker’s share of the topic’s 3 partitions / replicas:

$ k exec -it kafka1-kafka-0 -- bash
[kafka@kafka1-kafka-0 kafka]$ ls -la /var/lib/kafka/data/
[kafka@kafka1-kafka-0 kafka]$ ls -la /var/lib/kafka/data/kafka-log0/

Do exec into other brokers to see the test logs in the partitions.

Write messages into the topic with the producer. After running the command below, you’ll get a “>” prompt and are ready to write your first messages to the broker:

$ kubectl run kafka-producer -ti --image=strimzi/kafka:latest-kafka-2.1.1 --rm=true --restart=Never -- bin/kafka-console-producer.sh --broker-list kafka1-kafka-bootstrap:9092 --topic test
>test1
>hello kafka
>I love kafka :-)

And read the messages in another terminal from beginning with the kafka consumer console:

$ kubectl run kafka-consumer -ti --image=strimzi/kafka:latest-kafka-2.1.1 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server kafka1-kafka-bootstrap:9092 --topic test --from-beginning
test1
hello kafka
I love kafka :-)

Just for fun, to see one of the messages written to the log, you can exec into the kafka1-kafka-0 container again and run:

$ cat /var/lib/kafka/data/kafka-log0/test-0/00000000000000000000.log
��
I love kafka :-)

Testing with KafkaCat

Create the kafkacat deployment:

$ k create -f kafkacat.yaml
$ k exec -it kafkacat-<tab does the work> -- bash
$ for i in `seq 1 10`; do echo "hello kafka world" | kafkacat -P -b kafka1-kafka-bootstrap:9092 -t test; done

In another terminal run:

$ k exec -it kafkacat-<tab> -- bash
$ kafkacat -C -b kafka1-kafka-bootstrap:9092 -t test

Yahoo Kafka Manager GUI for Strimzi

There is an unmaintained Helm chart for Yahoo Kafka Manager (YKM), which doesn’t work out of the box with Strimzi. The main reason is that YKM needs to connect to ZooKeeper, and Strimzi intentionally doesn’t allow direct connections to ZooKeeper, for some good reasons described by the awesome Jakub Scholz.

Jakub provides a back door to ZooKeeper with his zoo-entrance deployment, which makes it possible to get YKM running.

In our scenario for kafka1 cluster, you need to deploy the zoo-entrance with:

$ k create -f zoo-entrance.yaml
$ git clone https://github.com/arashkaffamanesh/kafka-manager-chart.git
$ cd kafka-manager-chart/
$ helm install -n ykm .
$ k port-forward ykm-<tab does the work :-)> 8080:9000

Then open http://127.0.0.1:8080; you should get something similar to this:

Deploy MirrorMaker

Strimzi provides a CRD for MirrorMaker, which you can use to set up mirroring between the source / consumer side (the bootstrap server where the clients connect and the messages come in) and the target / producer side (in our case the kafka2 cluster, where the messages are replicated).

To deploy the mirror maker, please run:

$ k create -f kafka-mirror-maker.yaml
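The kafka-mirror-maker.yaml is only in the repo; a minimal KafkaMirrorMaker resource for our setup might look roughly like this (a sketch: the consumer group id and the fully qualified service names are assumptions):

```yaml
apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaMirrorMaker
metadata:
  name: mirror-maker
spec:
  replicas: 1
  consumer:
    bootstrapServers: kafka1-kafka-bootstrap.kafka1.svc:9092   # source cluster
    groupId: mirror-maker-group
  producer:
    bootstrapServers: kafka2-kafka-bootstrap.kafka2.svc:9092   # target cluster
  whitelist: "test"   # topics to mirror (regex)
```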

To test that MirrorMaker is working properly, write messages to the kafka1 cluster and read them from the kafka2 cluster:

Write messages to kafka1:

$ kubectl exec -ti kafka-producer -- bin/kafka-console-producer.sh --broker-list kafka1-kafka-bootstrap:9092 --topic test

Read messages from kafka2 (on the same k8s cluster):

$ kubectl exec -it kafka-consumer -- bin/kafka-console-consumer.sh --bootstrap-server kafka2-kafka-bootstrap.kafka2.svc.cluster.local:9092 --topic test

Load Testing Kafka with Apache Jmeter Cluster and Pepperbox

We now build a JMeter cluster on top of our RKE infra cluster and use it to produce small 4k messages into the Kafka brokers. To learn about our JMeter implementation and how a JMeter cluster works on K8s, please head to this post by Christopher Adigun. Deploying the JMeter cluster on K8s is very easy:

$ git clone https://github.com/cloudssky/jmeter-kubernetes.git
$ cd jmeter-kubernetes && chmod +x jmeter_cluster_create.sh
$ ./jmeter_cluster_create.sh

Now you’ll be asked to provide a tenant (a tenant is a namespace); please enter “loadtesting” for the namespace and press enter. Your JMeter cluster should be ready in about a minute.

Start your smoke test on Kafka1 Cluster

To start load testing, switch to the loadtesting namespace, run the start_test.sh script, and provide the pepper_box.jmx file for testing.

$ kn loadtesting
$ ./start_test.sh
Enter path to the jmx file:

Use Kafkacat as described above to see if your messages are arriving in kafka1 and kafka2 clusters (via MirrorMaker).

Clean Up your Kafka Clusters


$ kn kafka1
$ k delete k
$ kn kafka2
$ k delete k

N.B.: this will delete your PVCs as well, so do it with caution, or set deleteClaim: false in your Kafka cluster yaml file. If you have set deleteClaim to false, be aware that you then need to delete your PVCs yourself with:

$ k delete pvc -l strimzi.io/cluster=kafka1
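For reference, deleteClaim lives in the storage section of the Kafka custom resource; a sketch of the relevant fragment:

```yaml
# Fragment of the Kafka custom resource
spec:
  kafka:
    storage:
      type: persistent-claim
      size: 5Gi
      deleteClaim: false   # keep the PVCs when the Kafka resource is deleted
```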

Delete the Cluster Operator

To delete the cluster operator with helm, you need to run:

$ helm delete --purge strimzi-cluster-operator

Related Links

You’ll find a collection of our Awesome Kafka Resources here.

Questions or Feedback?

Feel free to ask any questions or provide feedback by commenting here or joining us on Kubernauts Slack.
