I have an Apache Kafka 2.6 Producer which writes to topic-A (TA). I also have a Kafka streams application which consumes from TA and writes to topic-B (TB). In the streams application, I have a custom timestamp extractor which extracts the timestamp from the message payload.
For one of my failure handling test cases, I shutdown the Kafka cluster while my applications are running.
When the producer application tries to write messages to TA, it cannot because the cluster is down and hence (I assume) buffers the messages. Let's say it receives 4 messages m1,m2,m3,m4 in increasing time order. (i.e. m1 is first and m4 is last).
When I bring the Kafka cluster back online, the producer sends the buffered messages to the topic, but they are not in order. I receive for example, m2 then m3 then m1 and then m4.
Why is that ? Is it because the buffering in the producer is multi-threaded with each producing to the topic at the same time ?
I assumed that the custom timestamp extractor would help in ordering messages when consuming them. But they do not. Or maybe my understanding of the timestamp extractor is wrong.
I got one solution from SO here, to just stream all events from tA to another intermediate topic (say tA') which will use the TimeStamp extractor to another topic. But I am not sure if this will cause the events to get reordered based on the extracted timestamp.
My code for the Producer is as shown below (I am using Spring Cloud for creating the Producer): Producer.java
@Service
public class Producer {
private String topicName = "input-topic";
private ApplicationProperties appProps;
@Autowired
private KafkaTemplate<String, MyEvent> kafkaTemplate;
public Producer() {
super();
}
@Autowired
public void setAppProps(ApplicationProperties appProps) {
this.appProps = appProps;
this.topicName = appProps.getInput().getTopicName();
}
public void sendMessage(String key, MyEvent ce) {
ListenableFuture<SendResult<String,MyEvent>> future = this.kafkaTemplate.send(this.topicName, key, ce);
}
}
Why is that ? Is it because the buffering in the producer is multi-threaded with each producing to the topic at the same time ?
By default, the producer allow for up to 5 parallel in-flight requests to a broker, and thus if some requests fail and are retried the request order might change.
To avoid this re-ordering issue, you can either set max.in.flight.requests.per.connection = 1
(what may have a performance hit) or set enable.idempotence = true
.
Btw: you did not say if your topic has a single partition or multiple partitions, and if your messages have a key? If your topic has more then one partition and you messages are sent to different partitions, there is no ordering guarantee on read anyway, because offset ordering is only guaranteed within a partition.
I assumed that the custom timestamp extractor would help in ordering messages when consuming them. But they do not. Or maybe my understanding of the timestamp extractor is wrong.
The timestamp extractor only extracts a timestamp. Kafka Streams does not re-order any messages, but processes messages always in offset-order.
If not, then what are the specific uses of the timestamp extractor ? Just to associate a timestamp with an event ?
Correct.
I got one solution from SO here, to just stream all events from tA to another intermediate topic (say tA') which will use the TimeStamp extractor to another topic. But I am not sure if this will cause the events to get reordered based on the extracted timestamp.
No, it won't do any reordering. The other SO question is just about to change the timestamp, but if you read messages in order a,b,c the result would be written in order a,b,c (just with different timestamps, but offset order should be preserved).
This talk explains some more details: https://www.confluent.io/kafka-summit-san-francisco-2019/whats-the-time-and-why/