I often get timeout exceptions in my Kafka producer, for various reasons. I am currently using the default values for all producer configs.
I have seen the following timeout exceptions:
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for topic-1-0: 30001 ms has passed since last append
I have the following questions:
What are the general causes of these Timeout exceptions?
What are the general guidelines for handling the Timeout exception?
Are Timeout exceptions retriable exceptions and is it safe to retry them?
I am using Kafka v2.1.0 and Java 11.
Thanks in advance.
The default Kafka config values, both for producers and brokers, are conservative enough that, under general circumstances, you shouldn't run into any timeouts. Those problems typically point to a flaky/lossy network between the producer and the brokers.
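For reference, here is a rough sketch of the producer settings that govern these timeouts, written as a plain java.util.Properties object. The values are what I understand the 2.1.0 defaults to be, so treat them as an assumption and check them against the producer config docs for your version. The class and method names (ProducerTimeoutDefaults, timeoutSettings, isConsistent) are made up for illustration. The 60000 ms in your first exception matches max.block.ms.

```java
import java.util.Properties;

final class ProducerTimeoutDefaults {

    /** Timeout-related producer settings, set to what I believe are the 2.1.0 defaults. */
    static Properties timeoutSettings() {
        Properties props = new Properties();
        // How long send() may block, e.g. while waiting for metadata.
        props.setProperty("max.block.ms", "60000");
        // How long the client waits for a broker response to a single request.
        props.setProperty("request.timeout.ms", "30000");
        // Upper bound on the total time to report success or failure after send() returns.
        props.setProperty("delivery.timeout.ms", "120000");
        props.setProperty("linger.ms", "0");
        // Retries are effectively bounded by delivery.timeout.ms rather than by count.
        props.setProperty("retries", "2147483647");
        props.setProperty("retry.backoff.ms", "100");
        return props;
    }

    /** The docs ask for delivery.timeout.ms >= linger.ms + request.timeout.ms. */
    static boolean isConsistent(Properties p) {
        long delivery = Long.parseLong(p.getProperty("delivery.timeout.ms"));
        long linger = Long.parseLong(p.getProperty("linger.ms"));
        long request = Long.parseLong(p.getProperty("request.timeout.ms"));
        return delivery >= linger + request;
    }
}
```

If you do end up tuning any of these, it is worth keeping that last constraint in mind, since changing one timeout in isolation can leave the set inconsistent.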
The exception you're getting, Failed to update metadata, usually means one of the brokers is not reachable by the producer, and the effect is that it cannot get the metadata.
For your second question, Kafka will automatically retry sending messages that were not fully acknowledged by the brokers. It's up to you whether to catch the exception and retry on the application side, but if you're hitting timeouts of a minute or more, retrying is probably not going to make much of a difference. You're going to have to figure out the underlying network/reachability problems with the brokers anyway.
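If you do decide to retry on the application side, a bounded retry with a short pause is usually enough. Below is a minimal sketch that uses only the JDK. BoundedRetry and its parameters are hypothetical names, not part of the Kafka client, and the caller supplies the test for which exceptions count as retriable. Kafka's org.apache.kafka.common.errors.TimeoutException extends RetriableException, so in a real producer the predicate would check for that type.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

final class BoundedRetry {

    /**
     * Runs the action up to maxAttempts times, sleeping backoffMs between attempts.
     * Exceptions the predicate rejects are rethrown immediately; if every attempt
     * fails with a retriable exception, the last one is rethrown.
     */
    static <T> T call(Callable<T> action, Predicate<Exception> isRetriable,
                      int maxAttempts, long backoffMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                if (!isRetriable.test(e)) {
                    throw e; // not worth retrying, surface it right away
                }
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs);
                }
            }
        }
        throw last;
    }
}

// Hypothetical usage with a producer (assumes kafka-clients on the classpath;
// a failed send is reported by get() as an ExecutionException wrapping the cause):
//
//   BoundedRetry.call(
//       () -> producer.send(record).get(),
//       e -> e.getCause() instanceof org.apache.kafka.common.errors.TimeoutException,
//       3, 1000L);
```

Keep the attempt count small. The producer has already retried internally by the time you see the exception, so a long application-level retry loop mostly just delays the error.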
In my experience, these are usually network reachability problems, so the first thing to check is whether the broker port is reachable (try nc -z broker-ip 9092 from the server running the producer).
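If nc isn't available on the producer host, roughly the same port check can be done from Java. This is a sketch using only java.net. BrokerReachability and canConnect are names I made up, and broker-ip and 9092 are placeholders for your own broker address. It only tells you that a TCP connection can be opened, not that the broker is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class BrokerReachability {

    /** Returns true if a TCP connection to host:port can be opened within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException | IllegalArgumentException e) {
            // Covers connection refused, timeouts, unresolvable hosts and invalid ports.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "broker-ip";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9092;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 3000));
    }
}
```

Run it from the same server as the producer, once per broker, since a single unreachable broker is enough to cause the metadata timeout.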