Tags: java, apache-kafka, spring-kafka

Can the @KafkaListener be configured to only process the latest entry?


According to https://docs.confluent.io/kafka/design/log_compaction.html#compaction-guarantees

Any consumer that stays caught-up to the head of the log will see every message that is written; these messages will have sequential offsets.

The topic is configured with a cleanup policy of [compact, delete].

Suppose I send a large number of entries, say 10,000, all with random values but the same key, within one second, and the Spring @KafkaListener has a Thread.sleep of 5 seconds just to give it a delay. Would the listener process all 10,000 entries, or would it be smart enough to drop messages with the same key so it only takes the most recent one?
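
For reference, a minimal sketch of the listener described above; the topic name "events", the group id "demo-group", and the String payload type are placeholders I've assumed, not details from the question:

```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class DelayedListener {

    // "events" and "demo-group" are made-up names; this assumes a String
    // deserializer is configured for the consumer.
    @KafkaListener(topics = "events", groupId = "demo-group")
    public void listen(String value) throws InterruptedException {
        // Artificial 5-second delay so the consumer falls behind the log head.
        Thread.sleep(5000);
        System.out.println("Processed: " + value);
    }
}
```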


Solution

  • The listener will process all 10,000 messages. Depending on how the producer's partitioner is configured, those messages might be spread across all partitions, sent to only one partition, or distributed some other way. By default, Kafka hashes the key and uses the result to decide which partition a message goes to, so all 10,000 messages (sharing one key) will land in the same partition, and the consumer will process every one of them (the producer sketch after this answer illustrates this). If you are new to Kafka, I strongly suggest reading the book Kafka: The Definitive Guide. Kafka is a complex technology and it is very easy to misconfigure it (and lose data).

    Topics have a few extra settings that interact with the cleanup policy. In particular, messages won't be deleted until retention.ms has elapsed (7 days by default). Other configuration properties, such as retention.bytes and segment.bytes among others, also control when Kafka compacts or deletes data (see the topic-setup sketch at the end).
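
As a rough illustration of the partitioning point above, here is a sketch of a producer sending 10,000 records that share one key; the topic name "events", the key "the-key", and the local bootstrap server are assumed placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SameKeyProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 10_000; i++) {
                // Every record carries the same key: the default partitioner
                // hashes the key, so all 10,000 records go to one partition.
                producer.send(new ProducerRecord<>("events", "the-key", "value-" + i));
            }
        }
    }
}
```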
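
And a sketch of creating a compacted topic with the settings discussed above via the Kafka AdminClient; the topic name, partition and replica counts, and broker address are assumptions, and the retention.ms and segment.bytes values shown are simply the documented defaults:

```java
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CompactedTopicSetup {
    public static void main(String[] args) throws Exception {
        try (AdminClient admin = AdminClient.create(
                Map.of(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {
            // 3 partitions, replication factor 1 (arbitrary example values).
            NewTopic topic = new NewTopic("events", 3, (short) 1)
                    .configs(Map.of(
                            "cleanup.policy", "compact,delete",
                            "retention.ms", "604800000",   // 7 days (the default)
                            "segment.bytes", "1073741824"  // 1 GiB (the default)
                    ));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```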