java concurrency apache-kafka kafka-consumer-api kafka-producer-api

How does Apache Kafka handle consistency when an application has one producer and multiple consumers?

Imagine an architecture with one producer (P1) and several consumers (C1, C2, C3). A small Java client produces messages M1, M2, M3 in that order, and another Java client (scaled to three instances on other machines) receives each message, calculates something, and then writes the result to a database table.

What if the calculation takes a different amount of time in each consumer instance? The message that was consumed first may then be written to the table last, which would probably cause data inconsistency.

Maybe I missed something in the docs, but I wonder how Kafka handles consistency in that scenario.

Solution

The consumers don't listen to the producer. Instead:

The producer writes a message to a Kafka topic managed by the Kafka server cluster,
A Kafka server persists that message in one of the partitions created for that topic and
Only then do the consumers get access to the message.
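
To make the second step a little more concrete, here is a minimal sketch of how a keyed message can end up in one particular partition. For records with a key, Kafka's default partitioner hashes the serialized key (using murmur2) modulo the number of partitions, so messages with the same key land in the same partition. The class `ToyPartitioner`, the method `partitionFor`, and the hash used here are all made up for illustration; this is not Kafka's own code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Toy model of key-based partition selection; NOT Kafka's real partitioner.
class ToyPartitioner {
    // Hypothetical helper: maps a record key to a partition index in [0, numPartitions).
    // Arrays.hashCode is only a stand-in for the hash Kafka actually uses.
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 3;
        // The same key always maps to the same partition.
        for (String key : new String[] {"order-1", "order-2", "order-1"}) {
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```

The point of the sketch is only that the mapping is deterministic: if M1, M2 and M3 share a key, they are stored in one partition in the order they were produced.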

If the consumers are in the same consumer group, then only one of them will be reading from the message's partition, and only that consumer will be able to read that message. If the consumers are not in the same consumer group, then they may all be able to read the message. In fact, that message may be read many times by many consumers until the Kafka server deletes it for being older than the retention period configured for the topic.
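
The "one reader per partition within a group" rule can be pictured with a small toy model. The sketch below hands out partitions round-robin to the members of a single group; `ToyGroupAssignment` and `assign` are hypothetical names, and Kafka's real assignment strategies are configurable and more involved than this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of partition ownership inside ONE consumer group; not Kafka's real assignor.
class ToyGroupAssignment {
    // Hands out partitions 0..numPartitions-1 round-robin,
    // so every partition ends up with exactly one owner in the group.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> owned = new LinkedHashMap<>();
        for (String consumer : consumers) {
            owned.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            String owner = consumers.get(p % consumers.size());
            owned.get(owner).add(p);
        }
        return owned;
    }

    public static void main(String[] args) {
        List<String> group = Arrays.asList("C1", "C2", "C3");
        System.out.println(assign(group, 6)); // two partitions per consumer
        System.out.println(assign(group, 2)); // C3 is left without a partition
    }
}
```

In this model, a topic with fewer partitions than group members leaves some members idle, and a topic with a single partition means only one of C1, C2, C3 ever receives messages.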

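Once a consumer has read a message from a Kafka topic, Kafka has no control over how, when or even if that message is processed. Kafka only guarantees the order of messages within a partition; the order in which results reach the database has to be enforced by the consumer application itself.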
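
Since the processing order is up to the consumers, one possible guard (an assumption on my part, not something Kafka provides) is to make the database write reject stale updates. The sketch below assumes each message carries a sequence number set by the producer; `LatestWinsTable`, `write` and `read` are hypothetical names, and the in-memory maps merely stand in for the real table, where the same check would typically be done in the SQL statement or a transaction.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for the database table; the sequence number is an assumed message field.
class LatestWinsTable {
    private final Map<String, Long> appliedSeq = new HashMap<>();
    private final Map<String, String> rows = new HashMap<>();

    // Applies the write only if seq is newer than what is already stored for this row.
    synchronized boolean write(String rowId, long seq, String value) {
        Long last = appliedSeq.get(rowId);
        if (last != null && last >= seq) {
            return false; // stale: a newer message already updated this row
        }
        appliedSeq.put(rowId, seq);
        rows.put(rowId, value);
        return true;
    }

    synchronized String read(String rowId) {
        return rows.get(rowId);
    }

    public static void main(String[] args) {
        LatestWinsTable table = new LatestWinsTable();
        // Simulate M3 finishing its calculation before M1.
        System.out.println(table.write("row-1", 3, "from M3"));
        System.out.println(table.write("row-1", 1, "from M1"));
        System.out.println(table.read("row-1"));
    }
}
```

This only helps when "latest wins" is the right rule for the table. If every message must be applied in order, giving related messages the same key so that a single consumer handles them sequentially is the simpler route.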