apache-flink, flink-streaming

How to implement a LEFT OUTER JOIN with streams in Apache Flink


I have two streams, left and right. For the same time window, let's say that:

  • the left stream contains the elements L1, L2 (the number is the key)
  • the right stream contains the elements R1, R3

I wonder how to implement a LEFT OUTER JOIN in Apache Flink so that the result obtained when processing this window is the following:

(L1, R1), (L2, null)

L1 and R1 match by key (1), while L2 and R3 do not match. L2 is still included because it is on the left side.


Solution

  • You should be able to get the desired result with the coGroup operator and a suitably implemented CoGroupFunction. Its coGroup method gives you access to the whole group from each input for a given key and window. The documentation states that for CoGroupFunction one of the groups may be empty, which is what allows you to implement an outer join: when the right group is empty, emit the left element paired with null. The one caveat is that groups are currently built in memory, so you need to verify that your groups won't grow too big, since an oversized group can exhaust the heap and effectively kill the JVM.
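
To make the idea concrete, here is a minimal sketch of the left outer join logic in plain Java, without the Flink dependency so that it runs on its own. The names LeftOuterJoinSketch, Element and joinWindow are hypothetical and only serve the illustration. The static coGroup method has the same shape as Flink's CoGroupFunction.coGroup(Iterable<IN1> first, Iterable<IN2> second, Collector<O> out), with a plain List standing in for the Collector, and joinWindow is a stand-in for the per-key, per-window grouping that Flink would do for you.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class LeftOuterJoinSketch {

    /** Hypothetical element type: a stream tag ("L" or "R") plus the join key. */
    static final class Element {
        final String side;
        final int key;

        Element(String side, int key) {
            this.side = side;
            this.key = key;
        }

        @Override
        public String toString() {
            return side + key;
        }
    }

    /**
     * Same shape as the body of a CoGroupFunction: it receives all left and
     * all right elements that share one key within one window.
     * Assumption: "out" is a plain List standing in for Flink's Collector.
     */
    static void coGroup(Iterable<Element> left, Iterable<Element> right, List<String> out) {
        boolean rightIsEmpty = !right.iterator().hasNext();
        for (Element l : left) {
            if (rightIsEmpty) {
                // No match on the right: keep the left element, pad with null.
                out.add("(" + l + ", null)");
            } else {
                for (Element r : right) {
                    out.add("(" + l + ", " + r + ")");
                }
            }
        }
        // If only the right group has elements, the loop above never runs,
        // so right-only keys produce no output (left outer join semantics).
    }

    /** Stand-in for the keyed window: group both sides by key, then call coGroup once per key. */
    static List<String> joinWindow(List<Element> left, List<Element> right) {
        Map<Integer, List<Element>> leftByKey = new TreeMap<>();
        Map<Integer, List<Element>> rightByKey = new TreeMap<>();
        for (Element e : left) {
            leftByKey.computeIfAbsent(e.key, k -> new ArrayList<>()).add(e);
        }
        for (Element e : right) {
            rightByKey.computeIfAbsent(e.key, k -> new ArrayList<>()).add(e);
        }

        // Visit every key seen on either side, in ascending order.
        Set<Integer> keys = new TreeSet<>(leftByKey.keySet());
        keys.addAll(rightByKey.keySet());

        List<String> out = new ArrayList<>();
        for (Integer key : keys) {
            coGroup(
                leftByKey.getOrDefault(key, Collections.<Element>emptyList()),
                rightByKey.getOrDefault(key, Collections.<Element>emptyList()),
                out);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Element> left = new ArrayList<>();
        left.add(new Element("L", 1));
        left.add(new Element("L", 2));
        List<Element> right = new ArrayList<>();
        right.add(new Element("R", 1));
        right.add(new Element("R", 3));

        System.out.println(joinWindow(left, right)); // prints [(L1, R1), (L2, null)]
    }
}
```

In an actual Flink job, the grouping done by joinWindow would presumably be replaced by something along the lines of left.coGroup(right).where(...).equalTo(...).window(...).apply(...), with the coGroup body above moved into the CoGroupFunction. The key selectors and the window assigner depend on your data and are not shown here. Note that this sketch also buffers each group in memory, so the caveat about group size applies to it as well.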