Search code examples
sortingapache-kafkaapache-flinkspring-kafkaapache-kafka-streams

Sorting messages from two kafka topics into another topic


Let's say I have 2 Kafka topics, each with one partition.

I want to use a technology like i.e. Kafka Streams, Apache Flink or Spring Kafka to write data into a third topic by joining these two topics based on Kafka timestamp chronology. However, I'm concerned about the issue that might arise during the first run of such a program. For example, if there are 20 million messages in topic A (3 weeks old) and only 1000 messages in topic B (1 day old).

Any suggestions?


Solution

  • With Flink you can use watermark alignment to avoid having to do a lot of buffering of the source with more data.

    Kafka Streams will merge the two streams, reading from whichever stream has the lower timestamp -- but on a best-effort basis.