Search code examples
scalaapache-kafkasbtapache-flinkflink-sql

Broadcast "JOIN" in Flink


Is there any way I can use Broadcast Join in FLINK the same way I used in SPARK. I'm working with JOINS but the data is large so I would require Broadcast Join.

Thank You


Solution

  • Flink does not provide a broadcast join like the one in Spark. It's pretty easy to implement one yourself using a BroadcastProcessFunction, but I wonder if it is really appropriate. A broadcast join only makes sense if one of the two streams is fairly small, otherwise a key-partitioned join makes a lot more sense.

    To implement this, broadcast the smaller pattern stream and connect it to the event stream. In the processBroadcastElement method of a BroadcastProcessFunction, store the new pattern, and in the processElement method lookup the relevant pattern and combine it with the event that is being processed.