Kafka is an open-source distributed event streaming platform from Apache. Developers choose Kafka because it can handle trillions of events a day, making it dependable even for the most demanding enterprise applications.
Kafka was initially designed as a messaging queue, which is why it is built on an immutable commit log. After LinkedIn open-sourced it in 2011, it rapidly evolved into the event streaming platform we know today.
Thanks to this design, Kafka is a robust and durable storage layer that can act as a single source of truth while distributing data across many nodes.
Unlike the messaging queues most developers know, Kafka is scalable, fault tolerant, and able to support real-time services across industries. Because of these capabilities, Kafka isn't confined to single applications; it is often deployed company-wide.
Kafka patterns are the big-picture concepts you need to align your implementation strategy with to get the most out of Kafka.
Kafka is a message-based platform, so messages are its primary element. Each message is a simple key-value pair, which could represent a sensor measurement, a meter reading, or a user action.
A topic in Kafka groups related messages together. You can think of a topic as a messaging channel.
Topics are divided into partitions. A single leader node accepts all read and write requests for a given partition, and other nodes replicate what the leader receives, providing fault tolerance.
A producer can choose the topic, and optionally the partition, where each message is published. Because ordering is only guaranteed within a partition, choosing the right partition is often essential.
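As a minimal sketch with the official Java client (the broker address and the meter-readings topic are illustrative assumptions, not prescribed names), reusing a key routes related messages to the same partition and so preserves their relative order:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class MeterReadingProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records with the same key ("meter-42") always land in the same
            // partition, so per-key ordering is preserved.
            producer.send(new ProducerRecord<>("meter-readings", "meter-42", "7.3 kWh"));
        }
    }
}
```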
Consumers, or subscribers, live inside consumer groups, and within a group each partition is assigned to exactly one consumer. In Kafka, consuming a message does not remove it from the partition.
Instead, each consumer maintains an offset that tracks how far it has read into each partition. Good offset management is important for fault tolerance and delivery guarantees.
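A minimal consumer sketch under the same illustrative names, with auto-commit disabled so offsets are committed only after records are actually processed:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class MeterReadingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "billing-service");  // illustrative consumer group
        props.put("enable.auto.commit", "false");  // we commit offsets ourselves
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("meter-readings"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // reading a record does not delete it
                }
                // Commit only after processing: a crash means re-reading,
                // not losing, the uncommitted messages.
                consumer.commitSync();
            }
        }
    }

    static void process(ConsumerRecord<String, String> record) { /* ... */ }
}
```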
An anti-pattern is a course of action that begins with a reasonable assumption or goal but leads down a path that ends up contradicting the very best practices or goals you set out to achieve.
Below are the most common Kafka anti-patterns.
Kafka is often praised as a durable solution, but in truth there's no magical way to achieve durability other than to plan for it. Kafka's default values are tuned for latency and availability.
If you determine that durability is more important, you need to tune it using acks and min.insync.replicas.
By default, Kafka does not wait for a disk flush, so you have to rely on replication to make your Kafka deployment durable.
Historically, the producer default was acks=1, which is great for latency. However, acks=all is better for durability (and has been the default since Kafka 3.0).
The latter means that the leader will wait for all of the in-sync replicas to acknowledge a given record.
However, the in-sync replica set might consist of only one server, which is why you also need to raise min.insync.replicas (the default is 1).
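As a sketch of a durability-first setup (broker address and replica counts are illustrative), the acks half lives in the producer configuration, while min.insync.replicas is set on the broker or topic:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.serialization.StringSerializer;

public class DurableProducerConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // Wait for every in-sync replica to acknowledge each record.
        props.put("acks", "all");

        // Broker/topic side (not a client setting): acks=all only helps if
        // enough replicas must stay in sync. For replication.factor=3, e.g.:
        //   min.insync.replicas=2
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // ... send records as usual ...
        }
    }
}
```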
Another common anti-pattern is focusing only on "the happy path." For instance, retries is generally the first parameter developers reach for when thinking about how to handle a failed send.
In older clients its default value was 0; raising it, all the way toward infinity if you like (which newer clients do by default, bounded by delivery.timeout.ms), makes the producer resend any record that fails to send. That, however, exposes you to another problem: duplication. Multiple retries can cause the same message to be written twice, or more.
So, the answer is to set enable.idempotence to true, which makes the producer ensure that only one copy of each message is written.
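A hedged sketch of a producer configuration combining aggressive retries with idempotence (the broker address is again illustrative):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.serialization.StringSerializer;

public class SafeRetryProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // Retry transient send failures aggressively...
        props.put("retries", Integer.MAX_VALUE);
        // ...while the broker deduplicates the resends, so each record is
        // persisted only once per partition despite retries.
        props.put("enable.idempotence", "true");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // ... send records as usual ...
        }
    }
}
```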
However, while this lets you avoid duplication on the producer side, it does not protect you from another anti-pattern: failing to build an idempotent consumer.
Kafka's default delivery semantics are "at least once," but you can move toward "at most once" or "exactly once."
If you try to force the issue with aggressive manual commits after every record, you'll put undue load on Kafka and still won't get an exactly-once outcome.
Your best option is to embrace "at least once" and make your consumer idempotent, or rely on Kafka Streams, which can manage "exactly once" processing for you.
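What an idempotent consumer can look like, as a sketch only: the in-memory set here stands in for durable storage, such as a database table with a unique constraint on the message ID:

```java
import java.util.HashSet;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRecord;

public class IdempotentHandler {
    // In production this would be durable state (e.g. a unique key checked in
    // the same database transaction as the write); a set suffices for a sketch.
    private final Set<String> processedIds = new HashSet<>();

    public void handle(ConsumerRecord<String, String> record) {
        // Use a stable identifier carried by the message itself.
        String id = record.key();
        if (!processedIds.add(id)) {
            return; // duplicate delivery: already applied, safe to ignore
        }
        // ... apply the side effect exactly once ...
    }
}
```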
Asking what you'll do in case of an error is essential to implementing Kafka successfully. Unexpected errors you need to plan for include a message that cannot be processed or a message that doesn't match the schema you expected.
The best solution is to spend time brainstorming in advance, so the "unexpected" scenarios become ones you anticipate and handle deliberately.
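One common approach, sketched below under the same illustrative names, is to catch processing failures and divert the bad record to a dead-letter topic for later inspection, rather than blocking the partition or silently dropping data:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DeadLetterExample {
    private final KafkaProducer<String, String> producer; // configured elsewhere

    DeadLetterExample(KafkaProducer<String, String> producer) {
        this.producer = producer;
    }

    void handle(ConsumerRecord<String, String> record) {
        try {
            process(record);
        } catch (Exception e) {
            // Unparseable or schema-violating messages are parked on a
            // dead-letter topic ("meter-readings.dlq" is an illustrative name)
            // so the consumer can keep making progress.
            producer.send(new ProducerRecord<>("meter-readings.dlq",
                    record.key(), record.value()));
        }
    }

    void process(ConsumerRecord<String, String> record) { /* ... */ }
}
```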
These are the most common anti-patterns, but they are not the only ones worth keeping on your list to avoid.
Kafka is built on simple principles that, combined, allow you to build a wide range of applications.
Kafka is also an essential component for building reactive systems, because it is message-driven, fault tolerant, and scalable.
If you're looking to implement Kafka in the most efficient, reliable, and scalable way, you can leave that to Adservio's team of professionals.
We help companies build extremely resilient digital experiences, and we can help you make your Kafka implementation a success. Contact us to learn more!