Search code examples
apache-flinkflink-streaming

How can I maintain the ordering of Stream of events (Realtime) while utilizing the Broadcast State Pattern in Apache Flink with multiple parallelism?


Seeking advice on upholding event order in a realtime event stream while employing the Broadcast State Pattern in Apache Flink with multiple parallel instances. The parallel nature of Flink's processing presents challenges in ensuring correct event sequencing. Looking for insights, best practices, or code examples to handle event order within this context.


Solution

  • My typical approach is to broadcast from an operator with a parallelism of 1 in order to not have to worry about this.

    Given the limitations on broadcast state (it has to be kept on the heap, and every parallel instance will checkpoint its own copy), it's wise to keep the size of broadcast state reasonably small anyway, so restricting the broadcast source to a single instance has never felt very limiting.

    If you really must broadcast from multiple parallel sources, there's no way to guarantee anything; there's a race between the different source instances.