The Kafka documentation says:
- Kafka relies heavily on the filesystem for storing and caching messages.
- A modern operating system provides read-ahead and write-behind techniques that prefetch data in large block multiples and group smaller logical writes into large physical writes.
- Modern operating systems have become increasingly aggressive in their use of main memory for disk caching. A modern OS will happily divert all free memory to disk caching with little performance penalty when the memory is reclaimed. All disk reads and writes will go through this unified cache.
- ...rather than maintain as much as possible in-memory and flush it all out to the filesystem in a panic when we run out of space, we invert that. All data is immediately written to a persistent log on the filesystem without necessarily flushing to disk. In effect this just means that it is transferred into the kernel's pagecache.
Further, this article says:
(3) a message is ‘committed’ when all in sync replicas have applied it to their log, and (4) any committed message will not be lost, as long as at least one in sync replica is alive.
So even if I configure the producer with acks=all (which causes the producer to receive an acknowledgement only after all brokers have committed the message) and the producer receives an acknowledgement for a certain message, does that mean there is still a possibility that the message can get lost, especially if all brokers go down and the OS never flushes the committed messages from the page cache to disk?
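For illustration, this is roughly the kind of producer setup I mean; a minimal sketch, where the broker addresses, topic name, and string serializers are just placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AckAllProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092,broker3:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // wait for the full set of in-sync replicas to acknowledge each record
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value"), (metadata, exception) -> {
                if (exception == null) {
                    // acknowledged: all in-sync replicas have appended the record to their log,
                    // which may still be only in the OS page cache rather than on disk
                    System.out.println("acked at offset " + metadata.offset());
                } else {
                    exception.printStackTrace();
                }
            });
        }
    }
}
```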
With acks=all and a topic replication factor > 1, it's still possible to lose acknowledged messages, but it's pretty unlikely.

For example, if you have 3 replicas (and all are in sync) with acks=all, you would need to lose all 3 brokers at the same time, before any of them had time to do the actual write to disk. With acks=all, the acknowledgement is only sent once all in-sync replicas have received the message, and you can ensure this number stays high with min.insync.replicas=2, for example.
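As a concrete sketch (the topic name and bootstrap server are placeholders), creating a topic with replication factor 3 and min.insync.replicas=2 via the Java AdminClient could look like this:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicWithMinIsr {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, replication factor 3, and require at least 2 in-sync
            // replicas before an acks=all write is acknowledged
            NewTopic topic = new NewTopic("my-topic", 3, (short) 3)
                    .configs(Collections.singletonMap("min.insync.replicas", "2"));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```

With this setting, if fewer than 2 replicas are in sync, an acks=all produce request fails with NotEnoughReplicasException instead of being acknowledged with weaker guarantees.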
You can reduce the possibility of this scenario even further if you use the rack awareness feature (and obviously the brokers are physically in different racks, or even better, in different data centers).
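For rack awareness, each broker just declares where it sits, and Kafka then tries to spread a partition's replicas across racks when assigning them; a minimal sketch of the broker-side setting (the rack id is a placeholder):

```properties
# server.properties on each broker; give brokers in different racks / availability zones different ids
broker.rack=rack-1
```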
To summarize, using all these options, you can reduce the likelihood of losing data enough that it's unlikely to ever happen.