Description:
The beauty of process functions is the fact they give us the ability to access keyed state and timers which empowers the developer complete control each event received in the input stream. They work great and address pretty much all of my use cases. So why am I here bothering you? Fair question. I'm curious if there are any underlying Flink performance optimizations that come from using a static implementation of a processor vs using the new key word?
Examples:
Applying Static Process Function to a Keyed Stream
private static final CountWithTimeoutFunction TIMEOUT_COUNT_PROCESSOR = new CountWithTimeoutFunction();
// apply the process function onto a keyed stream
DataStream<Tuple2<String, Long>> result = stream
.keyBy(0)
.process(TIMEOUT_COUNT_PROCESSOR);
Applying Non Static Process Function to a Keyed Stream
// apply the process function onto a keyed stream
DataStream<Tuple2<String, Long>> result = stream
.keyBy(0)
.process(new CountWithTimeoutFunction());
No, there's no optimization. In both cases your workflow graph is first built and then serialized/distributed to Task Managers, before it gets deserialized and execution starts. So there's no win from using a singleton for the function, as only one of these gets created in either case, when building the workflow graph.