Tags: exception, apache-kafka, kafka-consumer-api, spring-kafka, mdc

My Kafka consumer abnormally retries 10 times, and stops retrying when I simply add an MDC log entry?


I have a Kafka listener. I send a single Kafka message to the topic, and my listener receives it 10 times (the second delivery arrives after the first has completed its handling function, the third after the second has finished, and so on).

My consumer function looks like this:

@KafkaListener(topics = "xxx", groupId = "xxx")
public void onMessageUser(ConsumerRecord<?, ?> record, Acknowledgment ack) {
    //some code
    //xxxx
    ack.acknowledge();
}

My Kafka configuration:

spring.kafka.producer.retries=5
spring.kafka.producer.acks=1
spring.kafka.consumer.max-poll-records=50
spring.kafka.consumer.poll-timeout=5000
spring.kafka.consumer.enable-auto-commit=false
spring.kafka.listener.ack-mode=MANUAL

I added an MDC (Mapped Diagnostic Context) entry, and then the consumer behaved correctly! It stopped retrying 10 times; it now receives the message once and processes it once.

org.slf4j.MDC.put("requestId", xxx);

Where is the magic????


Update: Many thanks to @Artem Bilan and @Gary Russell for their helpful suggestions. I have found the cause. Here is the faulty code in the "some code" fragment:

Map<String,Object> prop = new ConcurrentHashMap<>();   
prop.put(CommonConstants.Request.REQUEST_ID,MDC.get(CommonConstants.Request.REQUEST_ID));

MDC.get(CommonConstants.Request.REQUEST_ID) returns null, and ConcurrentHashMap does not allow null values, so put throws a NullPointerException:

java.lang.NullPointerException: null    
at java.base/java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)     
at java.base/java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
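The failure can be reproduced without Kafka or SLF4J. The following is a minimal sketch in plain Java: the class name NullSafeProps, the helper buildProps, and the literal key "requestId" are hypothetical stand-ins for the original code, and the null argument stands in for an MDC.get(...) call that found no entry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class NullSafeProps {

    // Returns true if ConcurrentHashMap.put rejects a null value with a NullPointerException.
    static boolean putNullThrows() {
        Map<String, Object> prop = new ConcurrentHashMap<>();
        try {
            // Same shape as prop.put(KEY, MDC.get(KEY)) when the MDC has no entry for KEY.
            prop.put("requestId", null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Hypothetical helper: store the request id only when one is actually present.
    static Map<String, Object> buildProps(String requestId) {
        Map<String, Object> prop = new ConcurrentHashMap<>();
        if (requestId != null) {
            prop.put("requestId", requestId);
        }
        return prop;
    }

    public static void main(String[] args) {
        System.out.println(putNullThrows());       // true
        System.out.println(buildProps(null));      // {}
        System.out.println(buildProps("abc-123")); // {requestId=abc-123}
    }
}
```

Guarding the put with a null check is only one option; whether a missing request id should be skipped, defaulted, or treated as an error depends on what the downstream code expects.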

Since I did not wrap the offending code in a try/catch block, the exception was not printed in my error log, so I failed to find the error for a long time. (I have a general Spring Web exception handler in the project, but Kafka messages do not go through it.)
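One way to make such failures visible is to log the exception inside the listener before letting it propagate. The sketch below shows the idea in plain Java, assuming a hypothetical wrapper logFailures and an in-memory LOG list standing in for a real logger; it is not Spring Kafka code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class LoggingListenerSketch {

    // Stand-in for a real logger; collects messages so the behaviour is easy to inspect.
    static final List<String> LOG = new ArrayList<>();

    // Wraps a handler so any RuntimeException is recorded before being rethrown.
    static <T> Consumer<T> logFailures(Consumer<T> handler) {
        return record -> {
            try {
                handler.accept(record);
            } catch (RuntimeException e) {
                LOG.add("listener failed for " + record + ": " + e);
                throw e; // rethrow so the caller's error handling still applies
            }
        };
    }

    public static void main(String[] args) {
        Consumer<String> failing = r -> {
            throw new NullPointerException("requestId is null");
        };
        try {
            logFailures(failing).accept("record-1");
        } catch (NullPointerException expected) {
            // The caller (in a real application, the listener container) still sees the exception.
        }
        System.out.println(LOG);
    }
}
```

Rethrowing rather than swallowing the exception is deliberate here: it keeps the failure visible to whatever retry or error handling sits above the listener.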

The exception triggers Spring Kafka's default retry policy, which redelivers the record 9 more times.


Solution

  • Most likely, your "some code" is throwing an exception because the MDC requestId is not present.

    This would cause the default behavior of 9 retries with zero delay between each.

    You don't show your log config, but I assume the pattern requires the MDC to be present.
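The arithmetic behind the 10 deliveries (1 initial attempt plus 9 retries, with no delay) can be illustrated with a small simulation. This is a plain-Java sketch of the retry counting only; the class RetrySimulation and the method deliver are hypothetical names, not part of Spring Kafka.

```java
import java.util.function.Consumer;

final class RetrySimulation {

    // Invokes the handler once, then up to maxRetries more times while it keeps throwing.
    // Returns the total number of invocations (deliveries).
    static <T> int deliver(T record, Consumer<T> handler, int maxRetries) {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                handler.accept(record);
                return attempts; // handled successfully
            } catch (RuntimeException e) {
                if (attempts > maxRetries) {
                    return attempts; // retries exhausted; give up on this record
                }
                // Zero back-off: loop around and retry immediately.
            }
        }
    }

    public static void main(String[] args) {
        int failing = deliver("msg", r -> { throw new NullPointerException(); }, 9);
        int healthy = deliver("msg", r -> { }, 9);
        System.out.println(failing + " " + healthy); // prints "10 1"
    }
}
```

A handler that always throws is invoked 10 times, matching what the question observed, while a handler that succeeds is invoked once, matching the behavior after the MDC entry was added.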