I am new to Kafka, and I am trying to understand how a Kafka Consumer can consume and process a message from a Kafka topic without losing messages if the consumer fails in between.
For example:
There is a Kafka topic called cart_checkout which holds an event whenever a user successfully checks out their cart. All the consumer does is receive the event and send the user an email saying that their items have been checked out.
This is the happy path:
What happens if the app fails during Step 3?
If the consumer then starts back up, it will miss that event, right? (Since the read offset was already committed.)
The consumer could rewind, but how would it know to rewind?
RabbitMQ solution:
It seems easier to solve in RabbitMQ, since we can control the ACK for each message. But in Kafka, if we commit the read offset we lose the message when the app restarts, and if we don't commit the read offset then the same message is delivered to multiple consumers.
What is the best possible way to deal with this problem?
As you mentioned, a consumer's offset can also be managed manually, which is what you need here to avoid duplicate emails (by default the delivery guarantee is at-least-once) or a missing email, as in your use case.
To control the offset manually, auto-commit must be disabled (enable.auto.commit = false).
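To make this concrete, here is a minimal runnable sketch. It is plain Java with an in-memory stand-in for the consumer, so InMemoryConsumer and sendEmail are illustrative names, not part of the Kafka API; the real client would be org.apache.kafka.clients.consumer.KafkaConsumer with enable.auto.commit=false. It shows why committing only after the email is sent gives at-least-once behavior: a crash before the commit causes a replay on restart, not a lost event.

```java
// Illustrative stand-in for one partition plus a consumer's committed offset.
// With the real client you would use KafkaConsumer (enable.auto.commit=false)
// and call commitSync() only after the email has been sent.
class InMemoryConsumer {
    private final String[] log;   // the partition's event log
    private long committedOffset; // last committed read position

    InMemoryConsumer(String[] log) { this.log = log; }

    String poll() {
        // A (re)started consumer always resumes from the committed offset.
        return committedOffset < log.length ? log[(int) committedOffset] : null;
    }

    void commitSync() { committedOffset++; }
}

public class AtLeastOnceDemo {
    public static void main(String[] args) {
        InMemoryConsumer consumer =
            new InMemoryConsumer(new String[] {"cart_checkout:user-42"});

        String event = consumer.poll();
        // Simulate a crash here: the email was NOT sent, and because we
        // only commit after processing, the offset was NOT committed.

        // After the restart, poll() returns the SAME event again:
        String replayed = consumer.poll();
        if (!event.equals(replayed))
            throw new AssertionError("event was lost");

        sendEmail(replayed);   // the step that previously failed
        consumer.commitSync(); // commit only after success

        if (consumer.poll() != null)
            throw new AssertionError("unexpected replay after commit");
        System.out.println("at-least-once: replayed, then committed");
    }

    static void sendEmail(String event) {
        System.out.println("email sent for " + event);
    }
}
```

With the real consumer the loop has the same shape: poll(), process, then commitSync(); if processing throws, you simply don't commit, and the next poll after a restart re-delivers the record.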
Now, regarding the second part of your question, where you say:
if we don't commit the read-offset then the same message is sent to multiple consumers
This understanding is not fully correct. In Kafka, each consumer tracks its own offset, and consumers in the same consumer group don't share partitions: each partition is assigned to exactly one consumer in the group. So the message will not be processed by other consumers in the same consumer group (in Kafka, consumers poll messages; nothing is pushed to them). Only consumers in a different consumer group would read it again, and in that case the intention is that they read the message anyway; that's by design in Kafka.

It's also important to understand that a single poll returns multiple messages, and the default commitSync or commitAsync commits the offsets of all messages returned by that poll call. If you want to avoid possible email duplication, you might want to use a more specific commit; check the API here: https://kafka.apache.org/30/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
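To illustrate the "commit a specific offset" idea without a broker, here is a minimal sketch in plain Java. The class name PerRecordCommit and the long-based stand-ins are illustrative; with the real client you would build a Map<TopicPartition, OffsetAndMetadata> and pass it to KafkaConsumer.commitSync(Map). Note that Kafka expects you to commit the offset of the next message to read, i.e. record offset + 1.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of per-record commit bookkeeping. Partitions are plain longs here
// in place of Kafka's TopicPartition, and the map plays the role of
// Map<TopicPartition, OffsetAndMetadata>, so the example runs standalone.
public class PerRecordCommit {
    public static void main(String[] args) {
        // Pretend one poll() returned three records: {partition, offset}.
        long[][] records = { {0, 5}, {0, 6}, {1, 11} };

        Map<Long, Long> toCommit = new TreeMap<>();
        for (long[] rec : records) {
            long partition = rec[0], offset = rec[1];
            sendEmail(partition, offset); // process one record first...
            // ...then record offset + 1: "the NEXT message to read".
            toCommit.put(partition, offset + 1);
            // With the real API you would call commitSync(offsets) here
            // (per record, or batched) instead of committing the whole poll.
        }
        System.out.println(toCommit); // prints {0=7, 1=12}
    }

    static void sendEmail(long partition, long offset) {
        System.out.println("email for record " + partition + "/" + offset);
    }
}
```

Committing after each record narrows the replay window to at most one email per partition if the app crashes mid-batch, at the cost of more commit traffic.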
Since you're learning: this is a common misunderstanding, and there are some good free resources that clarify the concept. I suggest the consumer section of the official design documentation: https://kafka.apache.org/documentation/#theconsumer and this free book chapter, which is specific to what you're doing now: https://www.oreilly.com/library/view/kafka-the-definitive/9781491936153/ch04.html Good luck.