Apache Kafka Message Compression
Producers usually send data in a text-based form. Most of the time, for example, they send JSON, and JSON is text. In that case it is important to apply compression on the producer: JSON is very text heavy and large in size, so it benefits a lot from being compressed.
The compression type can take one of several values: none, which is the default and means no compression, or gzip, lz4, and snappy, which we have discussed above. Compression is more useful when we send a bigger batch of messages, so the more data you send to Kafka, the more helpful compression is going to be. Here's how it works.
We have our producer batch, which is basically the producer batching messages on its own. It will hold Message 1, Message 2, Message 3, up to Message 100, because our producer sends a lot of messages and wants to send them all together if possible. When compression is enabled, the producer compresses the whole batch before sending it to Kafka, which makes it much smaller. This only happens when you enable compression.
Because the compressed batch is much smaller, sending it to Kafka and replicating it across brokers is much quicker, so you get lower latency. That's why compression is so important. And because the data is smaller, the Kafka brokers have less to replicate and you use less network bandwidth. Those are the advantages of compressing a batch.
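To make the batching effect concrete, here is a minimal sketch that compares compression ratios for batches of different sizes. It is not how the Kafka producer is implemented: the payload shape, the newline-joined batch layout, and the helper names make_messages and batch_ratio are assumptions made for illustration, and gzip is used only because it ships with Python's standard library.

```python
import gzip
import json


def make_messages(count):
    """Build `count` small JSON messages (hypothetical payload shape)."""
    return [
        json.dumps(
            {"user_id": i, "event": "page_view", "url": "/products/%d" % i}
        ).encode("utf-8")
        for i in range(count)
    ]


def batch_ratio(count):
    """Return compressed size / raw size for one batch of `count` messages.

    The batch is modelled as the messages joined by newlines, which is a
    simplification and not Kafka's real record batch format.
    """
    batch = b"\n".join(make_messages(count))
    return len(gzip.compress(batch)) / len(batch)


# A smaller ratio means better compression.
for count in (1, 10, 100):
    print("batch of %3d messages -> ratio %.2f" % (count, batch_ratio(count)))
```

A single tiny message barely compresses at all, because the gzip header and trailer are a large share of the output. As the batch grows, the repeated JSON keys are stored once and referenced afterwards, so the ratio keeps dropping.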
Kafka producers write data to topics, and topics are made of partitions. The producers automatically know which broker and partition to write to based on your message. If a Kafka broker in your cluster fails, the producers automatically recover from it, which is what makes Kafka resilient and is a big part of why it is so widely used today. So if we look at a diagram of how data gets into our topic partitions, we have a producer on the left-hand side sending data into each of the partitions of our topic.
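As a rough illustration of how a producer can pick a partition "based on your message", the sketch below hashes a message key and takes the result modulo the number of partitions. This is an assumption-laden stand-in: Kafka's default partitioner for keyed messages uses a murmur2 hash, while this example uses CRC32 from Python's standard library, and choose_partition is a hypothetical helper, not a Kafka API.

```python
import zlib


def choose_partition(key, num_partitions):
    """Map a message key (bytes) to a partition index.

    CRC32 is a stand-in hash for illustration; it is not the hash
    Kafka's own partitioner uses.
    """
    return zlib.crc32(key) % num_partitions


# The same key always maps to the same partition.
for key in (b"user-1", b"user-2", b"user-1"):
    print(key.decode("utf-8"), "->", choose_partition(key, 3))
```

The point of the design is that messages sharing a key land in the same partition, which keeps them in order relative to each other.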
Another very important producer setting is message compression. Before getting to it, let's first understand the anatomy of a Kafka message.