java, apache-kafka-streams

KStream-KStream join with different results on consecutive executions


I don't know whether my original question will be reopened, so here is a more precise version.

I have StreamA (containing products, each produced within a 30-minute interval) and StreamB (containing measurements from 4 different sensors, each producing a measurement every 5 minutes). The two streams are joined on a common key. StreamC is the result of this join and contains measurementEnrichedProducts.
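To make the join described above concrete, here is a minimal plain-Java model of a windowed inner join. It is not the Kafka Streams API and not the original topology: the `Event` class, the key `"line-1"`, and the 10-minute window are assumptions made only for illustration (the question does not state the join window), and the model ignores grace periods, state store retention, and task idling.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of a windowed inner join. This is NOT the Kafka Streams API.
class WindowedJoinSketch {

    // Hypothetical record shape: a join key plus an event timestamp in milliseconds.
    static final class Event {
        final String key;
        final long timestampMs;

        Event(String key, long timestampMs) {
            this.key = key;
            this.timestampMs = timestampMs;
        }
    }

    // Counts the (product, measurement) pairs that share a key and whose
    // timestamps differ by at most windowMs. The window size is an assumption.
    static long countJoinResults(List<Event> products, List<Event> measurements, long windowMs) {
        // Group measurement timestamps by key so each product only scans its own key.
        Map<String, List<Long>> measurementsByKey = new HashMap<>();
        for (Event m : measurements) {
            measurementsByKey.computeIfAbsent(m.key, k -> new ArrayList<>()).add(m.timestampMs);
        }
        long matches = 0;
        for (Event p : products) {
            for (long ts : measurementsByKey.getOrDefault(p.key, List.of())) {
                if (Math.abs(p.timestampMs - ts) <= windowMs) {
                    matches++;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        // Two products 30 minutes apart, one sensor reporting every 5 minutes for an hour.
        List<Event> products = List.of(new Event("line-1", 0L), new Event("line-1", 30 * minute));
        List<Event> measurements = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            measurements.add(new Event("line-1", i * 5 * minute));
        }
        System.out.println("in order: " + countJoinResults(products, measurements, 10 * minute));
        // Feeding the same measurements in a different order does not change the count.
        Collections.reverse(measurements);
        System.out.println("reversed: " + countJoinResults(products, measurements, 10 * minute));
    }
}
```

In this model the count depends only on the keys and timestamps of the input, not on arrival order. Under that assumption, differing counts across runs with identical input suggest that some join results had not been emitted yet, rather than that the join logic itself varies.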

I have ~15k products and ~250k measurements. The number of records in StreamC differs from run to run:


Run   Num records within StreamC
1     149,389
2     149,362
3     149,363
4     149,411

Each run used exactly the same configuration, and the events in StreamA and StreamB were identical as well.

I really do not know why this happens. Could there be a problem with the underlying state stores?


Solution

  • I was restarting the application too fast...

    When playing with the max.task.idle.ms property, I noticed that the results were stable (the same count on every execution) but lower than before. After letting the application run for more than 15 minutes (with max.task.idle.ms=600000, i.e. 10 minutes), I received some more results, and the number of records in StreamC was stable too.

    Removing max.task.idle.ms again and waiting long enough led to the same results.

    I suspect the problem was caused by out-of-order input data and internal buffers that had not yet been filled when I stopped the application.
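For reference, a minimal sketch of how the max.task.idle.ms setting from this experiment could be expressed as plain `java.util.Properties`. The class name and the application id and broker address are placeholders, not values from the original setup; only the 600000 ms value comes from the text above. The resulting object is the kind of configuration one would hand to a Kafka Streams application.

```java
import java.util.Properties;

// Builds a configuration using string keys only, so it compiles without Kafka on the classpath.
class JoinAppConfigSketch {

    static Properties streamsProperties(long maxTaskIdleMs) {
        Properties props = new Properties();
        // Placeholder values (assumptions): replace with the real application id and brokers.
        props.put("application.id", "measurement-enrichment-app");
        props.put("bootstrap.servers", "localhost:9092");
        // Upper bound on how long a task stays idle while some of its input
        // partition buffers are still empty, before it processes what it has.
        props.put("max.task.idle.ms", Long.toString(maxTaskIdleMs));
        return props;
    }

    public static void main(String[] args) {
        // 600000 ms = 10 minutes, the value used in the experiment described above.
        Properties props = streamsProperties(600_000L);
        System.out.println("max.task.idle.ms=" + props.getProperty("max.task.idle.ms"));
    }
}
```

Whatever value is chosen, the observation above still applies: the application has to keep running long enough for the remaining join results to be emitted before the records in StreamC are counted.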