Search code examples
apache-storm

Expired Tuples in Apache Storm Tumbling Window


I have implemented a Tumbling Window (Count based) of size 100. On running the topology, I see that the count of new tuples (inputWindow.get) and the count of expired tuples (inputWindow.getExpired) are both 100. I have set message time out of 600seconds. With this time timeout, I had expected no tuple to expire. What could be the reason for tuples expiring? I have set the bolt as bolt.withTumblingWindow(Count.of(100)) The bolt has parallelism_hint of 120

builder.setBolt("bolt", bolt.withTumblingWindow(Count.of(100)), 120).shuffleGrouping("spout")


Solution

  • I think maybe you're misunderstanding what expired tuples are. Maybe it would have been more friendly to call them "evicted tuples".

    They are tuples that have been evicted from the current window, but were present in the last window. They are not tuples whose message timeouts have expired, though of course they may have also expired in this sense.

    So let's say you receive 200 tuples. You first window will be tuple 0-99, with no expired tuples. Your second window will be tuple 100-199, where tuple 0-99 are expired.

    The reason this is useful is in the case of sliding windows, where the windows are not disjoint. In that case you may get e.g. a window that is 0-99, then 50-149, then 99-199. There it can be helpful if you get told "tuples 0-49 are no longer in the window" rather than having to compute this yourself.

    For more information on this, take a look at the class controlling windows at https://github.com/apache/storm/blob/925422a5b5ad1c3329a2c2b44db460ae94f70806/storm-client/src/jvm/org/apache/storm/windowing/WindowManager.java