Since I don't know whether my question will be reopened, here is a more precise version of it.
I have StreamA, containing products that are each produced within a 30-minute interval, and StreamB, containing measurements from 4 different sensors, each of which produces a measurement every 5 minutes. The two streams are joined on a common key, and the result of this join is StreamC, which contains measurementEnrichedProducts.
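To make the join semantics concrete, here is a minimal stdlib-only model of a windowed inner stream-stream join, assuming every record arrives in timestamp order. It only counts the pairs such a join would emit: same key and timestamps at most `windowMinutes` apart. The class, record, and method names, the keys, and the 10-minute window are all hypothetical; the question does not state which `JoinWindows` the real `KStream#join` uses, and this is not Kafka Streams' own implementation.

```java
import java.util.List;

public class WindowedJoinModel {
    // Hypothetical record type: a join key plus an event timestamp in minutes.
    record Event(String key, long tsMinutes) {}

    // Counts the pairs an inner windowed join would emit if all records were
    // processed in order: same key and |tA - tB| <= windowMinutes.
    // Assumption: a symmetric window, as in a stream-stream join.
    static long joinCount(List<Event> products, List<Event> measurements, long windowMinutes) {
        long count = 0;
        for (Event p : products) {
            for (Event m : measurements) {
                if (p.key().equals(m.key())
                        && Math.abs(p.tsMinutes() - m.tsMinutes()) <= windowMinutes) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Event> products = List.of(new Event("line-1", 30), new Event("line-1", 60));
        List<Event> measurements = List.of(
                new Event("line-1", 0), new Event("line-1", 25),
                new Event("line-1", 55), new Event("line-2", 30));
        System.out.println(joinCount(products, measurements, 10)); // prints 2
    }
}
```

With in-order input this count is fully deterministic, which is why run-to-run differences point at processing order rather than at the join condition itself.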
I have ~15k products and ~250k measurements. Below is the number of records in StreamC for each of four runs:
| Run | Records in StreamC |
|-----|--------------------|
| 1   | 149,389            |
| 2   | 149,362            |
| 3   | 149,363            |
| 4   | 149,411            |
Every run used exactly the same configuration, and the events in StreamA and StreamB were identical as well.
I really don't know why the counts differ. Could there be a problem with the underlying state stores?
It turned out I was restarting the application too quickly.
While experimenting with the max.task.idle.ms property, I noticed that the results were stable (the same count on every run) but lower than before. After letting the application run for more than 15 minutes (with max.task.idle.ms=600000, i.e. 10 minutes), I received additional results, and the number of records in StreamC was stable as well.
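For reference, a minimal sketch of how the 10-minute idle setting could be passed to the application. It uses plain `java.util.Properties` with the literal config keys so it runs without the Kafka dependency; with kafka-streams on the classpath you would normally use the `StreamsConfig.MAX_TASK_IDLE_MS_CONFIG` constant instead. The class and method names, the application id, and the broker address are placeholders, not values from the question.

```java
import java.util.Properties;

public class IdleConfigExample {
    static Properties streamsProps() {
        Properties props = new Properties();
        // Placeholder values (hypothetical): replace with the real id and brokers.
        props.setProperty("application.id", "measurement-enricher");
        props.setProperty("bootstrap.servers", "localhost:9092");
        // Let a task wait up to 600000 ms (10 minutes) when some of its input
        // partitions have no buffered data, instead of processing immediately.
        props.setProperty("max.task.idle.ms", "600000");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(streamsProps().getProperty("max.task.idle.ms")); // prints 600000
    }
}
```

The trade-off is latency: a task that idles waits for data from all of its inputs so it can pick the record with the smallest timestamp, which makes results more repeatable but delays output, consistent with results only appearing after a longer run.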
Removing max.task.idle.ms again and waiting long enough led to the same results.
I suspect the problem was caused by out-of-order input data and internal buffers that had not yet been filled.
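The out-of-order suspicion can be illustrated with a deliberately simplified model (an assumption, not Kafka Streams' actual join code): records are processed in arrival order, "stream time" is the largest timestamp seen so far across both inputs, and a record is dropped once `ts < streamTime - window - grace`, i.e. once its join window has closed. The same three records then produce a different join count depending only on how the two inputs are interleaved. All names and the window/grace values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class LateRecordModel {
    // Hypothetical input record: join side ('A' or 'B'), key, timestamp in minutes.
    record Rec(char side, String key, long ts) {}

    // Simplified model, NOT Kafka Streams' implementation: stream time is shared by
    // both sides and never moves backwards; records whose window has already closed
    // are skipped. Buffered records are never expired, to keep the sketch short.
    static long joinCount(List<Rec> arrivalOrder, long window, long grace) {
        List<Rec> seenA = new ArrayList<>();
        List<Rec> seenB = new ArrayList<>();
        long streamTime = Long.MIN_VALUE;
        long joined = 0;
        for (Rec r : arrivalOrder) {
            streamTime = Math.max(streamTime, r.ts());
            if (r.ts() < streamTime - window - grace) {
                continue; // late record: its window is closed, so it is dropped
            }
            List<Rec> other = r.side() == 'A' ? seenB : seenA;
            for (Rec o : other) {
                if (o.key().equals(r.key()) && Math.abs(o.ts() - r.ts()) <= window) {
                    joined++;
                }
            }
            (r.side() == 'A' ? seenA : seenB).add(r);
        }
        return joined;
    }

    public static void main(String[] args) {
        Rec a1 = new Rec('A', "k", 30);
        Rec b1 = new Rec('B', "k", 25);
        Rec b2 = new Rec('B', "k", 100);
        // Timestamp-ordered arrival, which idling across inputs tries to achieve:
        System.out.println(joinCount(List.of(b1, a1, b2), 10, 0)); // prints 1
        // Side B races ahead and advances stream time before a1 is processed:
        System.out.println(joinCount(List.of(b1, b2, a1), 10, 0)); // prints 0
    }
}
```

In this model a larger grace period (e.g. 80 minutes) lets the skewed order produce the match again, which is one way to trade memory and latency for tolerance of out-of-order input.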