Let's say I have 2 Kafka topics, each with one partition.
I want to use a technology such as Kafka Streams, Apache Flink, or Spring Kafka to write data into a third topic by joining these two topics in chronological order of their Kafka timestamps. However, I'm concerned about an issue that might arise during the first run of such a program: for example, topic A might hold 20 million messages going back 3 weeks, while topic B holds only 1,000 messages from the past day.
Any suggestions?
With Flink you can use watermark alignment to throttle whichever source runs ahead in event time, which avoids having to buffer large amounts of data while the source with the bigger backlog catches up.
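Here is a minimal sketch of what that could look like (watermark alignment was introduced in Flink 1.15 via `withWatermarkAlignment`). The topic names, broker address, group id, and drift/update intervals below are assumptions to adapt to your setup; the Kafka source uses the record's Kafka timestamp as event time by default, which matches your ordering requirement:

```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AlignedSourcesJob {

    // Hypothetical helper; broker address and group id are assumptions.
    private static KafkaSource<String> kafkaSource(String topic) {
        return KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setTopics(topic)
                .setGroupId("aligned-join")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Both sources share the alignment group "join-alignment": Flink pauses
        // whichever source's watermark drifts more than 1 minute ahead of the
        // group minimum, re-checking every second. The fresh topic B therefore
        // cannot race weeks ahead of the 3-week backlog in topic A.
        WatermarkStrategy<String> aligned = WatermarkStrategy
                .<String>forBoundedOutOfOrderness(Duration.ofSeconds(20))
                .withWatermarkAlignment("join-alignment", Duration.ofMinutes(1), Duration.ofSeconds(1));

        DataStream<String> a = env.fromSource(kafkaSource("topic-A"), aligned, "topic-A");
        DataStream<String> b = env.fromSource(kafkaSource("topic-B"), aligned, "topic-B");

        // ... join a and b here and sink the result to the third topic ...

        env.execute("aligned-sources");
    }
}
```

As I recall, your setup of one partition per topic also sidesteps an early limitation: with a single Kafka partition each source has a single split, which is the case alignment supported from the start; pausing individual splits of multi-partition sources only arrived in later Flink releases.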
Kafka Streams will merge the two streams by always processing next from whichever stream has the record with the lower timestamp, but only on a best-effort basis.
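How best-effort it is can be tuned with the `max.task.idle.ms` config, which tells Streams how long a task should wait for data on an empty input partition before processing what it already has. A minimal sketch, assuming String-keyed/valued topics named `topic-A`, `topic-B`, and `topic-C` and a broker at `localhost:9092` (all placeholder names):

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class TimestampOrderedMerge {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "timestamp-ordered-merge");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // Wait up to 30 s for data to arrive on an empty input partition
        // before processing, instead of racing ahead on whichever partition
        // already has data; the default only orders records that have
        // already been fetched, which is why ordering is best-effort.
        props.put(StreamsConfig.MAX_TASK_IDLE_MS_CONFIG, 30_000L);

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> a = builder.stream("topic-A");
        KStream<String, String> b = builder.stream("topic-B");
        // merge() interleaves the two streams; the task picks the record with
        // the lowest timestamp whenever both partitions have buffered data.
        a.merge(b).to("topic-C");

        new KafkaStreams(builder.build(), props).start();
    }
}
```

Even with a large `max.task.idle.ms`, this remains a heuristic rather than a guarantee, but for your first-run scenario it should keep the 1-day-old topic B from being consumed long before the 3-week backlog in topic A.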