Tags: java, stream, apache-flink, flink-streaming

How to connect two streams and operate on them in Flink?


I have two streams, DataStream<Tuple2<String, Double>> one and DataStream<Tuple2<String, Double>> two. The first has many more elements than the second, and the two streams have different keys. In fact, stream two holds essentially a single key-value pair. I want to connect the streams so that each value of the first stream is divided by the constant value from the second. How can this be done in Apache Flink? Should I use connected streams, or is there another way?


Solution

  • For the case described, the simplest approach is the broadcast state pattern. The second stream, which has few elements, becomes a broadcast stream, and the first stream, which has many more, is enriched with its contents. You would have something like:

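    // Define the broadcast state descriptor (a MapStateDescriptor) here.

    // Bracketed names are placeholders to fill in.
    // Because the first stream is keyed, [YourProcessFunction] must extend
    // KeyedBroadcastProcessFunction.
    firstStream.keyBy([someKey])
        .connect(secondStream.broadcast([mapStateDescriptor]))
        .process([YourProcessFunction]);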
    

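    In your process function, processBroadcastElement stores the value from the second stream in the broadcast state, and processElement reads that value to do the enrichment (here, the division) and emit the resulting tuple.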

    More on the broadcast pattern can be found here: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html
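
The enrichment logic described above can be modeled without any Flink dependency. The following is a minimal sketch in plain Java, not Flink code: DivisionEnricher, onBroadcastElement, onElement, and the "divisor" key are hypothetical names standing in for the process function, its processBroadcastElement and processElement callbacks, and the entry kept in the broadcast state. It also assumes that elements arriving before the divisor should be buffered rather than dropped, which the answer does not specify.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the two callbacks of a keyed broadcast process function.
// No Flink classes are used; every name in this sketch is hypothetical.
class DivisionEnricher {
    static final String DIVISOR = "divisor";

    // Stands in for the broadcast state, written only from the broadcast side.
    private final Map<String, Double> broadcastState = new HashMap<>();
    // Elements of the large stream that arrived before any divisor was known
    // (assumption: buffer them instead of dropping them).
    private final List<Map.Entry<String, Double>> pending = new ArrayList<>();
    // Stands in for the collector that the process function emits to.
    private final List<Map.Entry<String, Double>> output = new ArrayList<>();

    // Counterpart of processBroadcastElement: store the constant, flush the buffer.
    void onBroadcastElement(double divisor) {
        broadcastState.put(DIVISOR, divisor);
        for (Map.Entry<String, Double> e : pending) {
            emit(e.getKey(), e.getValue(), divisor);
        }
        pending.clear();
    }

    // Counterpart of processElement: divide by the stored constant if present.
    void onElement(String key, double value) {
        Double divisor = broadcastState.get(DIVISOR);
        if (divisor == null) {
            pending.add(new SimpleImmutableEntry<>(key, value));
        } else {
            emit(key, value, divisor);
        }
    }

    private void emit(String key, double value, double divisor) {
        output.add(new SimpleImmutableEntry<>(key, value / divisor));
    }

    List<Map.Entry<String, Double>> output() {
        return output;
    }

    public static void main(String[] args) {
        DivisionEnricher enricher = new DivisionEnricher();
        enricher.onElement("a", 10.0);     // arrives before the divisor: buffered
        enricher.onBroadcastElement(2.0);  // divisor arrives: buffer is flushed
        enricher.onElement("b", 7.0);      // divided immediately
        System.out.println(enricher.output()); // prints [a=5.0, b=3.5]
    }
}
```

In a real job the map would be Flink's broadcast state, and the buffer would need to live in Flink-managed state so that it survives failures. Flink does not guarantee the relative arrival order of the two connected streams, which is why the missing-divisor case needs some handling.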