Search code examples
apache-flinkflink-streaming

Flink Broadcast State Pattern Implementation: Performance considerations


I am implementing the Flink Broadcast State in my application and I am considering several implementation option and wonder which is preferable from performance perspective:

  1. When updating keys in the BroadcastState map, would it be beneficial to check if the existing value is identical to the 'new' one? Is it beneficial to try to minimize the number of updated keys?

  2. A BroadcastState with fewer keys, but larger values, or a BroadcastState with many keys, but smaller values -- any preferable option?


Solution

  • When updating keys in the BroadcastState map, would it be beneficial to check if the existing value is identical to the 'new' one? Is it beneficial to try to minimize the number of updated keys?

    It certainly couldn't hurt. Since updates to the Broadcast State are, by definition, broadcast out to all of the other operators that rely on them (which can include parallel instances of the same operator), the more that you can do to reduce these calls, the better.

    A BroadcastState with fewer keys, but larger values, or a BroadcastState with many keys, but smaller values -- any preferable option?

    Since Broadcast State is stored in memory, I don't think it makes too much of a difference (and if so, it seems like it would be a minor optimization). I'd recommend reducing what you store in state to be the most minimal object possible however and try to remove anything that you aren't currently or likely to need (this is especially true of very large state objects).