In a streaming system, the ordering of data is a big problem.
We know that Flink handles out-of-order data with windows and watermarks.
But inside Flink, between operators, how is the order of data guaranteed?
Can Flink ensure that earlier data is processed first?
Or can data also get out of order between operators?
In Flink there are no guarantees about data order being preserved (or at least not once you have parallelism > 1). E.g. you have a stream with a map() operator with parallelism == 2, and then you do a groupBy() (keyBy() in newer versions of the DataStream API) followed by some other operation. On one server the map sub-task is processing data very fast, and on the other it's very slow. Records from the two sub-tasks are interleaved when they reach each partition after the groupBy, so the order in which a partition receives them generally won't match the original order of the data.
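To make the fast/slow sub-task scenario concrete, here is a minimal simulation in plain Python. It does not use any Flink API; the function name `simulate_parallel_map`, the round-robin distribution, and the fixed per-record cost for each sub-task are all assumptions chosen only for illustration.

```python
def simulate_parallel_map(records, costs):
    """Model a map() stage with len(costs) parallel sub-tasks.

    Assumptions (hypothetical model, not Flink internals):
    - records are handed to sub-tasks round-robin,
    - sub-task i needs costs[i] time units per record,
    - each sub-task processes its own records in the order it got them,
    - the downstream partition receives records in order of finish time.
    """
    clock = [0.0] * len(costs)           # per-sub-task processing clock
    finished = []
    for index, record in enumerate(records):
        sub = index % len(costs)         # round-robin assignment
        clock[sub] += costs[sub]         # time at which this record is done
        finished.append((clock[sub], index, record))
    finished.sort()                      # downstream sees earliest finish first
    return [record for _, _, record in finished]


original = [0, 1, 2, 3, 4, 5]            # sequence numbers in source order
# Sub-task 0 is fast (1 unit per record), sub-task 1 is slow (5 units).
arrival = simulate_parallel_map(original, costs=[1.0, 5.0])
print(arrival)                           # [0, 2, 4, 1, 3, 5]
```

In this model each sub-task still emits its own records in order (0, 2, 4 and 1, 3, 5), but the merged stream seen downstream is no longer in source order.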
If you require strict ordering, then you'll have to buffer/sort yourself in an operator, and deal with the same late data issues that a windowing operator encounters (i.e. how long do you want to wait before deciding that you couldn't possibly get a record that should sort before the earliest record still in the buffer).
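A rough sketch of that buffer/sort idea, again in plain Python rather than Flink code. The class `SortingBuffer` and its parameter `max_out_of_orderness` are hypothetical names; the watermark rule used here (largest timestamp seen minus a fixed bound) is one simple assumption about "how long to wait", not the only option.

```python
import heapq


class SortingBuffer:
    """Buffers records and releases them in timestamp order.

    A record is released once the watermark (max timestamp seen minus
    max_out_of_orderness) has passed its timestamp. Records that arrive
    at or behind the watermark are too late to be sorted in and are
    collected separately.
    """

    def __init__(self, max_out_of_orderness):
        self.max_out_of_orderness = max_out_of_orderness
        self.heap = []                    # min-heap ordered by timestamp
        self.max_ts = float("-inf")
        self.watermark = float("-inf")
        self.late = []                    # records we gave up waiting for
        self._seq = 0                     # tie-breaker for equal timestamps

    def add(self, timestamp, value):
        """Add one record; return the records that can now be emitted."""
        if timestamp <= self.watermark:
            self.late.append((timestamp, value))
            return []
        heapq.heappush(self.heap, (timestamp, self._seq, value))
        self._seq += 1
        self.max_ts = max(self.max_ts, timestamp)
        self.watermark = self.max_ts - self.max_out_of_orderness
        return self._drain(self.watermark)

    def flush(self):
        """End of input: emit everything still buffered, in order."""
        return self._drain(float("inf"))

    def _drain(self, up_to):
        out = []
        while self.heap and self.heap[0][0] <= up_to:
            ts, _, value = heapq.heappop(self.heap)
            out.append((ts, value))
        return out


buf = SortingBuffer(max_out_of_orderness=2)
emitted = []
for ts in [1, 3, 2, 7, 4, 6, 10]:         # out-of-order event timestamps
    emitted.extend(buf.add(ts, "event-%d" % ts))
emitted.extend(buf.flush())

print([ts for ts, _ in emitted])          # [1, 2, 3, 6, 7, 10]
print(buf.late)                           # [(4, 'event-4')]
```

The record with timestamp 4 shows the trade-off: with a bound of 2 it arrives after the watermark has already reached 5, so it cannot be placed in order any more. A larger bound would have kept it, at the cost of holding every record longer before emitting it.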