Search code examples
apache-flink

Does anyone have a good example of a ProcessFunction that sums or aggregates data at some frequency


I am looking mimic the behaviour of a window().reduce() operation but without a key at the task manager level. Sort of like a .windowAll().reduce() does for a stream, but I am looking to get individual results from each task manager.

I tried searching for "flink processFunction examples" but not finding anything useful to look at.


Solution

  • For ProcessFunction examples, I suggest the examples in the Flink docs and in the Flink training materials.

    Another approach would be to use windows with a random key selector. That's not as easy as it sounds: you can't just select by a random number, as the value of the key must be deterministic for each stream element. So you could add a field that you set to a random value, and then keyBy that field. Compared to the ProcessFunction approach this will force a shuffle, but be simpler.