Search code examples
memory-managementapache-flink

Input data buffer for multiple downstream operators in parallel in Flink


I want to know how Flink handles branching out the result of one operator to multiple downstream operators. If I have an operator A that is connected to 3 other operators B, C and D in parallel, does Flink keep only one copy of the result and send it out to all 3 operators? Assuming that's the case, is the result removed once it has been sent to all 3 operators or is there a garbage collection process?

I haven't found anything concrete about this topic in the Flink documentation. Any relevant resource regarding this topic is highly appreciated.


Solution

  • Before sending record A to operators B, C, D, FLINK serialise record A into bytes. Then those bytes will be copied into buffer pools for sub-partitions for different operators, so there will be multiple copies of record A.

    Record A will be GC later.