Search code examples
apache-flinkflink-streaming

Flink: the result of SocketWindowCount example isn't what I expected


I am new to flink. I follow the quickstart on the flink website and deploy flink on a single machine. After I do "./bin/flink run examples/streaming/SocketWindowWordCount.jar --port 9000" and enter the words as the website described, I get the result:final result

It seems that the program didn't do the reduce,I just want to know why? Thanks.


Solution

  • The program did do a reduce, but not fully, because your input must have fallen into two different 5 second windows. That's why the 4 instances of ipsum where reported as 1 + 3 -- the first one fell into one window, and the other 3 into another window (along with the "bye").

    Flink's window boundaries are based on alignment with the clock. So if your input events occurred between 14:00:04 and 14:00:08, for example, they would fall into two 5 second windows -- one for 14:00:00 - 14:00:04.999 and another for 14:00:05 - 14:00:09.999 -- even though all of your events fit into a single interval that's only 4 seconds long.

    If you try again, you can expect to see similar, but probably different results. That's a consequence of doing windowed analytics based on "processing time". If you want your applications to get repeatable results, you should plan on using "event time" analytics instead (where the timestamps are based on when the events occurred, rather than when they were processed).