I have a DataStream from Kafka which has 2 possible value for a field in MyModel. MyModel is a pojo with domain-specific fields parsed from a message from Kafka.
DataStream<MyModel> stream = env.addSource(myKafkaConsumer);
I want to apply window and operators on each key a1, a2 separately. What is a good way to separate them? I have 2 options filter and select in mind but don't know which one is faster.
Filter approach
stream
.filter(<MyModel.a == a1>)
.keyBy()
.window()
.apply()
.addSink()
stream
.filter(<MyModel.a == a2>)
.keyBy()
.window()
.apply()
.addSink()
Split and select approach
SplitStream<MyModel> split = stream.split(…)
split
.select(<MyModel.a == a1>)
…
.addSink()
split
.select<MyModel.a == a2>()
…
.addSink()
If split and select are better, how to implement them if I want to split based on the value of a field in MyModel?
Both methods behave pretty much the same. Internally, the split()
operator forks the stream and applies filters as well.
There is a third option, Side Outputs . Side outputs might have some benefits, such as different output data types. Moreover, the filter condition is just evaluated once for side outputs.