I would like to run a Flink Streaming application that follows a read-once-write-many pattern. Basically, I would like to read from a firehose, apply different filters in parallel to each record read from the source, and send the records to different sinks based on configuration.
How can this be done in Flink? I believe Kafka Streams has a concept that supports this. Below is an illustration of what I want my Flink DAG to look like:
The simplest way to accomplish this would be to use the filter() transformation, as demonstrated below:
DataStream<X> stream = ...;

// Filter 1: only records with property == 1 reach sink1
stream
    .filter(x -> x.property == 1)
    .sinkTo(sink1);

// Filter 2: only records with property == 2 reach sink2
stream
    .filter(x -> x.property == 2)
    .sinkTo(sink2);

// Repeat ad nauseam for each additional filter/sink pair
Alternatively, you could consider using Side Outputs, so that a single routing function separates the stream into your filtered outputs, each of which you can then act upon independently.
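A minimal sketch of that approach, assuming the same X type with an integer property field and the same sink1/sink2 as above (the rest is Flink's standard OutputTag/ProcessFunction API):

import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// One tag per filtered stream; the anonymous subclass preserves the type information
final OutputTag<X> tag1 = new OutputTag<X>("property-1") {};
final OutputTag<X> tag2 = new OutputTag<X>("property-2") {};

SingleOutputStreamOperator<X> routed = stream
    .process(new ProcessFunction<X, X>() {
        @Override
        public void processElement(X x, Context ctx, Collector<X> out) throws Exception {
            // Each record is examined once and routed to the matching side output
            if (x.property == 1) {
                ctx.output(tag1, x);
            } else if (x.property == 2) {
                ctx.output(tag2, x);
            }
            // Records matching neither predicate are simply dropped here
        }
    });

routed.getSideOutput(tag1).sinkTo(sink1);
routed.getSideOutput(tag2).sinkTo(sink2);

The advantage over chained filter() calls is that each record is inspected once, rather than once per sink.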
If you absolutely require n different streams, you might have to look at the type of sink that you intend to write to and consider handling the filtering at that level.
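For example, if you happen to be writing to Kafka, the KafkaSink lets you pick the destination topic per record via a topic selector, so a single sink can fan records out without any upstream filters. A sketch under that assumption; the broker address, topic names, and the toString()-based serializer are placeholders:

import java.nio.charset.StandardCharsets;

import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;

KafkaSink<X> routingSink = KafkaSink.<X>builder()
    .setBootstrapServers("broker:9092") // placeholder address
    .setRecordSerializer(
        KafkaRecordSerializationSchema.<X>builder()
            // Choose the topic per record instead of filtering upstream
            .setTopicSelector(x -> x.property == 1 ? "topic-1" : "topic-2")
            // Placeholder serializer; substitute your real schema
            .setValueSerializationSchema(x -> x.toString().getBytes(StandardCharsets.UTF_8))
            .build())
    .build();

stream.sinkTo(routingSink);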