Spark's RDD.persist(...) can help avoid re-evaluating an RDD that is consumed by multiple downstream computations.
Is there an equivalent feature in Flink?
Concretely, if I write the following, will Flink evaluate dataStream
once or twice?
val dataStream = env.addSource(...).filter(...).flatMap(...)
val s1 = dataStream.keyBy(key1).timeWindow(...).aggregate(...)
val s2 = dataStream.keyBy(key2).timeWindow(...).reduce(...)
There is no need for persist in Flink: a DataStream to which multiple operators are applied is evaluated only once, and every outgoing record is replicated to each downstream operator. Your program is executed as:
                               /-hash-> keyBy(key1) -> ...
Source -> Filter -> FlatMap ->-<
                               \-hash-> keyBy(key2) -> ...
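You can observe the single evaluation directly by putting a side effect in the shared part of the pipeline: it fires once per input record, no matter how many consumers branch off afterwards. A minimal sketch under assumed names (the element source, the split logic, and the key selectors are placeholders, not your actual program; note that `timeWindow` was deprecated in Flink 1.12 in favor of `window(TumblingEventTimeWindows.of(...))`):

```scala
import org.apache.flink.streaming.api.scala._

object SharedStreamDemo {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Shared upstream pipeline: evaluated once per record.
    val dataStream = env
      .fromElements("a 1", "b 2", "a 3") // placeholder source
      .filter(_.nonEmpty)
      .map { s =>
        // This prints once per record, even though two branches consume it.
        println(s"evaluating: $s")
        s.split(" ")
      }

    // Two branches: each receives a full copy of dataStream's output,
    // hash-partitioned by its own key.
    val s1 = dataStream.keyBy(_(0)).map(a => s"by-letter: ${a.mkString(" ")}")
    val s2 = dataStream.keyBy(_(1)).map(a => s"by-number: ${a.mkString(" ")}")

    s1.print()
    s2.print()

    env.execute("shared-stream-demo")
  }
}
```

Running this prints each "evaluating: ..." line exactly once per input element, confirming that the filter/map stage is not duplicated for the two keyBy branches.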