apache-flink flink-streaming

How to merge two DataStreams in Apache Flink


I'm using Flink to process my streaming data.

I have two data sources: A and B.

// A
DataStream<String> dataA = env.addSource(sourceA);
// B
DataStream<String> dataB = env.addSource(sourceB);

I use map to process the data coming from A and B.

DataStream<String> res = mergeDataAAndDataB();   // how to merge dataA and dataB?

Suppose sourceA is sending "aaa", "bbb", "ccc", ... and sourceB is sending "A", "B", "C", ...

What I'm trying to do is merge them pairwise into "Aaaa", "Bbbb", "Cccc", ... to produce a new DataStream<String> object.

How can I achieve this?


Solution

  • There are two kinds of stream merging in Flink.

    dataA.union(dataB)
    

    creates a new stream containing the elements of both streams, interleaved in an arbitrary order, for example "aaa", "bbb", "A", "ccc", "B", "C". That isn't what you've asked for; I mention it only for completeness.

    What you do want is to create a connected stream, via

    dataA.connect(dataB)
    

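    which you can then process with a RichCoFlatMapFunction or a KeyedCoProcessFunction to compute a sort of join that glues the strings together. Either function receives the elements of the two streams through separate callbacks, so it can buffer elements from one stream until the matching element from the other arrives and then emit the combined string.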

    You'll find a tutorial on the topic of connected streams in the Flink documentation, and an example that's reasonably close in the training exercises that accompany the tutorials.
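
The buffering logic behind such a join can be sketched without any Flink dependency. The class below is a hypothetical illustration, not Flink API: `PairingBuffer` is an invented name, and a plain `Consumer<String>` stands in for Flink's `Collector<String>`. The method names mirror `flatMap1` and `flatMap2` from Flink's `CoFlatMapFunction`, which is where this logic would live in a real job. It assumes elements should be paired strictly by arrival position (first with first, second with second), which only holds if each input preserves its order, for example at parallelism 1 or within a single key.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

/**
 * Flink-free sketch of pairing elements from two inputs by arrival position.
 * Hypothetical helper: in a Flink job the same logic would sit inside a
 * two-input function applied to dataA.connect(dataB).
 */
final class PairingBuffer {
    // Elements that have arrived but whose partner has not yet been seen.
    // At any moment at most one of these two queues is non-empty.
    private final Deque<String> pendingA = new ArrayDeque<>();
    private final Deque<String> pendingB = new ArrayDeque<>();

    /** Handles one element from stream A ("aaa", "bbb", ...). */
    void flatMap1(String a, Consumer<String> out) {
        String b = pendingB.poll();   // oldest unmatched B, or null if none
        if (b == null) {
            pendingA.add(a);          // no partner yet: keep A for later
        } else {
            out.accept(b + a);        // "A" + "aaa" -> "Aaaa"
        }
    }

    /** Handles one element from stream B ("A", "B", ...). */
    void flatMap2(String b, Consumer<String> out) {
        String a = pendingA.poll();   // oldest unmatched A, or null if none
        if (a == null) {
            pendingB.add(b);          // no partner yet: keep B for later
        } else {
            out.accept(b + a);
        }
    }
}
```

In a real job the two queues would need to be kept in Flink managed state rather than in plain fields, so that they survive failures and restarts, and the buffers can grow without bound if one source runs far ahead of the other.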