Search code examples
mapreducebigdataapache-flinkflink-streaming

Flink Broadcast non-KeyedStream to a KeyedStream


I have two streams, stream1 and stream2. Streams are coming from Kafka Topics. Stream1 is a KeyedStream which contains the main data that I want to process in keyed-context. Stream2 is simply a "trigger" stream. What I mean is that Stream1 will print the results it has generated so far only when a trigger arrives from stream2. My problem is that I cant find a way to broadcast the trigger event of stream2 to all partitions of stream1, so they all start printing the results. Any ideas on how to do this? Note: I dont want to use Windows.

Moreover, in flink documentation I see an operator "broadcast" : Datastream -> DataStream but I haven't found any examples to understand how this is used. Can someone explain?

Thank you in advance


Solution

  • See The Broadcast State Pattern page, which has an example of doing something similar to your use case. To summarize...

    • When a stream1 element is received by processElement(), you save it in (keyed) state.
    • When a stream2 (control) element is received by processBroadcastElement, you get use the ctx.applyToKeyedState(StateDescriptor<S, VS> stateDescriptor, KeyedStateFunction<KS, S> function) method to access/emit all of the records you've saved in state for stream1.