Apache Kafka is an open-source platform for reading and writing real-time streaming data. As businesses of all sizes and industries turn to big data, streaming platforms like Kafka have become invaluable.
When working with streaming data platforms like Kafka, we would like to have "delivery guarantees," i.e. the ability to know how (and how many times) the system will attempt to deliver a message.
So what are Kafka’s delivery guarantees, and how does Kafka work behind the scenes to guarantee this behavior?
Kafka uses a producer-consumer pattern to work with streaming data. Some processes are producers responsible for sending messages, and others are consumers responsible for receiving and processing them.
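To make the pattern concrete, here is a rough sketch using Kafka's standard Java client. The broker address, topic name, and message contents are illustrative:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class KafkaRoundTrip {
    public static void main(String[] args) {
        // Producer: writes a message to the "events" topic.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("events", "sensor-1", "temperature=21.5"));
        } // close() flushes any buffered messages

        // Consumer: reads and processes messages from the same topic.
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "demo-group");
        consumerProps.put("auto.offset.reset", "earliest"); // new group starts from the beginning
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(Collections.singletonList("events"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("received %s = %s%n", record.key(), record.value());
            }
        }
    }
}
```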
However, working with real-time streaming data across distributed systems presents unique challenges: messages may be lost, or delivered multiple times if the consumer does not acknowledge receiving them on the first attempt.
We can illustrate these problems with the following analogy. Suppose that you want to send an important letter to someone far away, but you aren’t certain whether the letter will arrive.
You have essentially three options:

- Send the letter once and hope for the best; if it is lost in transit, you will never know.
- Keep sending copies of the letter until the recipient confirms receipt; they may end up with duplicates along the way.
- Use a tracked delivery service that guarantees the recipient receives the letter exactly once.

These options are analogous to the three delivery guarantees available in Kafka. You may choose one or more of these options, depending on the application and use case:

- At-most-once delivery: each message is delivered once or not at all; messages may be lost, but never duplicated.
- At-least-once delivery: each message is delivered one or more times; messages are never lost, but may be duplicated.
- Exactly-once delivery: each message is delivered once and only once; messages are neither lost nor duplicated.
Choosing between at-most-once, at-least-once, and exactly-once delivery will depend on which factors you want to prioritize.
If you care more about the consumer receiving a message and don’t mind duplicate outputs or extra work from the producer, then at-least-once delivery is a good option.
Certain use cases (such as financial transactions) require the guarantee of exactly-once delivery (e.g. to avoid accidentally duplicating a withdrawal or deposit).
At-most-once delivery is the default option for producer/consumer architectures because it requires no additional effort on the part of the developers.
Any messages that are lost due to errors or disruptions are simply disregarded.
This makes at-most-once delivery ideal for use cases such as Internet of Things (IoT) sensors, which are constantly sending new data and measurements.
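In practice, at-most-once behavior comes from configuration on both sides: the producer does not wait for acknowledgments or retry failed sends, and the consumer commits its offset before processing. A rough sketch continuing the Java example above (the `handle` method is a hypothetical stand-in for your application's processing step):

```java
// Producer: fire-and-forget. Failed sends are dropped, never retried.
producerProps.put("acks", "0");                   // don't wait for broker acknowledgment
producerProps.put("retries", "0");                // never resend a failed message
producerProps.put("enable.idempotence", "false"); // acks=0 is incompatible with idempotence

// Consumer: commit the offset BEFORE processing, so a crash mid-batch
// skips those records on restart instead of replaying them.
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
consumer.commitSync();                            // position saved first...
for (ConsumerRecord<String, String> record : records) {
    handle(record);                               // ...so a failure here loses these records
}
```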
At-least-once delivery requires the producer to maintain extra state about message status and to resend failed messages.
This means that at-least-once delivery sacrifices some performance in exchange for the guarantee that all messages will be delivered.
At-least-once delivery is ideal for use cases such as analytics, where all messages must be received but some duplication of messages is acceptable.
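At-least-once delivery flips both choices: the producer insists on broker acknowledgment and retries failures, and the consumer commits its offset only after processing succeeds. A sketch continuing the same hypothetical setup:

```java
// Producer: wait for all in-sync replicas and retry failed sends.
// A retry after a lost acknowledgment can write the same message twice.
producerProps.put("acks", "all");
producerProps.put("retries", Integer.toString(Integer.MAX_VALUE));

// Consumer: turn off auto-commit and commit only AFTER processing,
// so a crash mid-batch replays the batch rather than losing it.
consumerProps.put("enable.auto.commit", "false");

ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
for (ConsumerRecord<String, String> record : records) {
    handle(record);     // hypothetical processing step, as above
}
consumer.commitSync();  // committed last, so nothing is ever skipped
```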
With exactly-once delivery enabled, Apache Kafka guarantees that a given message will be delivered once and only once. In a real-time, distributed environment, however, this is no small technical feat.
Achieving exactly-once delivery in Apache Kafka is hard because glitches, network issues, system crashes, and various other errors can disrupt the standard “read-process-write” pattern. This creates mistakes such as:

- A producer that never receives an acknowledgment resends the message, writing it to the topic a second time.
- A consumer that crashes after processing a message but before committing its offset reprocesses the same message when it restarts.
- A failure before a message is durably written loses the message entirely.
Kafka has several ways to deal with these potential issues and enable its exactly-once delivery semantics.
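Two of these mechanisms are idempotent producers, where the broker detects and discards duplicate retries, and transactions, which let a batch of writes (and, in a read-process-write loop, the consumed offsets) commit or abort as a unit. A minimal sketch using the standard Java client; the topic, key, and transactional.id values are illustrative:

```java
// Producer: enable idempotence and register a transactional id.
producerProps.put("enable.idempotence", "true");
producerProps.put("transactional.id", "transfer-processor-1"); // illustrative id

KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps);
producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(new ProducerRecord<>("withdrawals", "acct-42", "-100.00"));
    // In a full read-process-write loop, the consumed offsets would join
    // the same transaction via producer.sendOffsetsToTransaction(...).
    producer.commitTransaction();  // everything in the transaction, or nothing
} catch (Exception e) {
    producer.abortTransaction();   // aborted writes never become visible
}

// Consumer: read only messages from committed transactions.
consumerProps.put("isolation.level", "read_committed");
```

Note that downstream consumers must set isolation.level to read_committed; otherwise they will also see messages from aborted transactions.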
With Apache Kafka, it’s easier than ever to work with massive amounts of real-time streaming data—as long as you know what you’re doing.
If you don’t have a fleet of Kafka experts available in-house, it’s a wise choice to join forces with a professional and proven IT service provider who can help you install, deploy, and maintain your Kafka environment.
Adservio's team of professionals helps companies achieve digital excellence across a wide range of use cases, from application development and analytics to software delivery and process automation.
If you need assistance with Kafka, get in touch with our team of experts. Tell us about your business needs and objectives, and we will take care of providing the solution.