I'm learning Flink with simple toy examples.
I have adapted the WindowWordCount
example from here and run it on this simple data file
cat data.txt
a a b c c
I ignored the slideSize
and only do countWindow(windowSize)
(hence it's called tumbling window according to this post for multiple window sizes. I'm confused by the results.
When windowSize = 1
, the output is
(a,1)
(a,1)
(b,1)
(c,1)
(c,1)
which makes total sense. But when windowSize = 2
, the output is
(a,2)
(c,2)
where is count for b
?
when windowSize = 3
, the output is empty, which I don't understand either.
Can anyone help me understand how the outputs are produced in the cases of windowSize = 2
and windowSize = 3
, please?
My code is on https://gist.github.com/zyxue/bc566180d2b01f1e2e77c1bbe3a7c5e5
And I run it with command like
./bin/flink run /path/to/target/myflinkapp-1.0-SNAPSHOT.jar --input data.txt --output /tmp/out --window 1
The Trigger
for CountWindow
only triggers the window function for complete windows -- in other words, after processing windowSize
events for a given key, then the window will fire.
For example, with windowSize = 2
, only for a
and c
are there enough events. Since there is only one b
, the job ends with a partially filled window for b
.
You can use a custom trigger that also reacts to a timeout if you want to generate reports for partial count windows.