Tags: spring-boot, apache-kafka, spring-kafka, spring-cloud-stream, spring-cloud-stream-binder-kafka

Duplicate consumption of messages with Spring Cloud Stream Kafka binder


We have several microservices that use Spring Boot and the Spring Cloud Stream Kafka binder to communicate with each other.

Occasionally, we observe bursts of duplicate messages received by a consumer, often several days after the messages were first consumed and successfully processed.

While I understand that Kafka does not guarantee exactly-once delivery, this still looks very strange, given that there were no rebalancing events or any 'suspicious' activity in the logs of either the brokers or the services. Since the consumer interacts with external APIs, it is difficult to make it idempotent.

Any hints as to what might be causing the duplication? What should I look for to figure this out?

We are using Kafka broker 1.0.0, and this particular consumer uses Spring Cloud Stream Binder Kafka 2.0.0, which is based on kafka-client 1.0.2 (the versions used by the other services might differ slightly).


Solution

  • You should show your configuration when asking questions like this.

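    My best guess is the broker's offsets.retention.minutes setting.

    With modern broker versions (since 2.0), it defaults to one week; with older versions, such as your 1.0.0 broker, it was only one day. If a group's committed offsets expire, the consumer has no stored position to resume from on its next restart or rebalance, so it falls back to its offset reset policy and can re-read messages it has already processed.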
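
    As a rough illustration of the setting discussed above, the retention could be raised explicitly in the broker's server.properties. The value 10080 (7 days, the same as the 2.0+ default) is only an example and an assumption about what suits this setup; the idea is to pick a value longer than the longest gap you expect between offset commits for the group.

    ```properties
    # server.properties (broker side)
    # How long committed consumer offsets are kept, in minutes.
    # 1440 (1 day) was the default before Kafka 2.0; 10080 (7 days) is the default since 2.0.
    # The value below is illustrative only, not a recommendation for every cluster.
    offsets.retention.minutes=10080
    ```

    This is a broker-level setting, so it normally takes effect only after the brokers are restarted with the new configuration.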