Search code examples
apache-flinkflink-streaming

Flink Broadcast State Pattern: Failure recovery


According to the documentation "No RocksDB state backend" to the broadcast state.

Does this mean that with every failure (of a task level, or of an entire JVM), the new restarted task will see an "empty" state (while all the other running tasks will continue having full state)?

If Flink takes care of rebuilding the broadcast state of the failed task, should we take into account the possibility of receiving new non-broadcast event while the broadcast state is not yet fully built?


Solution

  • Does this mean that with every failure (of a task level, or of an entire JVM), the new restarted task will see an "empty" state (while all the other running tasks will continue having full state)?

    No, broadcast state is still stored in state and will be available via mechanisms such as checkpoints and savepoints, it just will always be restored in-memory as opposed to other forms of state that can be backed by RocksDB on disk. As such, you typically will only want to use broadcast state in situations where you can comfortably store the entirety of the state in memory.

    If Flink takes care of rebuilding the broadcast state of the failed task, should we take into account the possibility of receiving new non-broadcast event while the broadcast state is not yet fully built?

    As far as failures go, when your specific job/task fails, it will generally restart the job and restore the state from the previous checkpoint, which will contain your items that were stored in broadcast state. This restoration should occur as part of the initialization of the operator, so you shouldn't begin receiving elements until after it is "ready" and has been initialized.