I have 4 sources of different event types. Each event has a String field holding a key that can be used to join these events, and each key is unique within its own stream. My goal is to produce a DataStream<Ev>, where Ev is composed of data from each of these 4 event types. I can create KeyedStreams from the sources quite easily, but then I want one giant connect() that joins all the streams at once, so that I have all the event objects in one process function/map etc. for a given window. If I connect the streams pair by pair, I'll need intermediate representations of the partial joins.
How can I achieve this, or is it only possible to join these streams in sequence?
For what it's worth, for use cases like this, the Table API is much easier to work with. With Tables this will just work, but with DataStreams you're going to have to work harder.
Your options are to either join them in sequence, or map all 4 streams onto some unified type, union those transformed streams into one stream, and then write a KeyedProcessFunction that pieces together the join result from the incoming events.
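To make the second option more concrete, here is a minimal, library-free sketch of the per-key logic such a KeyedProcessFunction might contain. It is not Flink code: the names `FourWayJoinSketch`, `Tagged`, and `onEvent` are hypothetical, the payloads are plain Strings for brevity, and a `HashMap` stands in for Flink's keyed state. It only illustrates the idea of buffering events per key and emitting once all four sources have arrived.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustration only: every name here is hypothetical, and the HashMap
// stands in for the keyed state a real KeyedProcessFunction would use.
final class FourWayJoinSketch {

    // Unified type that all four streams would be mapped onto before the union.
    // 'source' is 0..3 and says which of the four streams the event came from.
    static final class Tagged {
        final int source;
        final String key;
        final String payload;

        Tagged(int source, String key, String payload) {
            this.source = source;
            this.key = key;
            this.payload = payload;
        }
    }

    // The joined result: one payload from each of the four sources.
    static final class Ev {
        final String key;
        final String a, b, c, d;

        Ev(String key, String a, String b, String c, String d) {
            this.key = key;
            this.a = a;
            this.b = b;
            this.c = c;
            this.d = d;
        }
    }

    // One four-slot buffer per key; slot i holds the payload from source i.
    private final Map<String, String[]> pending = new HashMap<>();

    // Roughly what processElement would do: store the event in its slot,
    // then emit a joined Ev only when all four slots are filled.
    Optional<Ev> onEvent(Tagged e) {
        String[] slots = pending.computeIfAbsent(e.key, k -> new String[4]);
        slots[e.source] = e.payload;
        for (String s : slots) {
            if (s == null) {
                return Optional.empty(); // still waiting for another source
            }
        }
        pending.remove(e.key); // the join is complete, so drop the buffer
        return Optional.of(new Ev(e.key, slots[0], slots[1], slots[2], slots[3]));
    }

    // Number of keys that are still waiting for at least one source.
    int pendingKeys() {
        return pending.size();
    }
}
```

Feeding it events for the same key from sources 0, 1, and 2 returns an empty Optional each time; the event from source 3 then returns the assembled `Ev`. In an actual job you would presumably keep the buffer in Flink state and register a timer to clear keys that never complete, so that state doesn't grow without bound.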
You could instead convert the DataStreams to Tables, join them, and then convert the resulting Table back to a DataStream (if necessary). The conversion overhead is minor. The one issue to keep in mind, however, is that the state used by the join won't be something you can easily evolve. If you need to change the join in the future (e.g., to include another field in the result), you'll probably have to throw away the state and start over.