Search code examples
apache-flinkflink-streaming

flink when to use timewindowAll


I have a pipeline that consumes data with the following shape : case class Foo(source: String, destination: String){def key=source+destination} I want to remove all source+destination duplicates that arrive in the same hour and then I want to count all calls that arrives for a destination in the same hour. I created a pipeline with a src ~> timewindow1(1 hour, keyBy:key) ~> timewindow2(1 hour, keyBy:destination) ~> ... should I use timewindowAll in timewindow2 ? enter image description here


Solution

  • You should only use timeWindowAll in cases where you don't want to have key-partitioned windowing. Since you are keying by destination, you should use timeWindow, not timeWindowAll.