apache-storm

Is there a particular reason that the pendingEmits queue is limited to 1024 elements


I am working with Storm v1.2.1 on a cluster of 5 r4.xlarge EC2 nodes. Currently, I am crunching a network dataset that involves queries over time-based sliding windows. After numerous trial-and-error cycles to figure out a good-enough configuration for my use case, I came across the Executor class, which maintains a member named pendingEmits of type MpscChunkedArrayQueue<AddressedTuple> (line 119 in the storm-client module, class: org.apache.storm.executor.Executor). This queue has a hard-coded upper bound of 1024 elements.

Every time I tried a configuration with my dataset, I would receive an IllegalStateException when Storm attempted to add an acknowledgement tuple to pendingEmits while the queue was at full capacity. To avoid the exception, I increased the hard-coded size of pendingEmits to 16534. This seems to be working (for now).
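For context, the exception is the standard `java.util.Queue` contract: `add()` throws `IllegalStateException` when a capacity-bounded queue is full, while `offer()` simply returns false. The sketch below reproduces that behavior with a stdlib `ArrayBlockingQueue` as a stand-in for Storm's JCTools-based pendingEmits queue; the capacity of 4 and the element names are illustrative only.

```java
import java.util.concurrent.ArrayBlockingQueue;

public class BoundedQueueDemo {
    public static void main(String[] args) {
        // Illustrative stand-in for pendingEmits: Storm uses JCTools'
        // MpscChunkedArrayQueue, but any bounded java.util.Queue exhibits
        // the same add()/offer() contract. Capacity 4 keeps the demo small
        // (Storm's hard-coded bound was 1024).
        ArrayBlockingQueue<String> pendingEmits = new ArrayBlockingQueue<>(4);

        for (int i = 0; i < 4; i++) {
            pendingEmits.add("tuple-" + i); // add() succeeds while space remains
        }

        // offer() signals a full queue by returning false...
        System.out.println("offer on full queue: " + pendingEmits.offer("tuple-4"));

        // ...whereas add() throws, which matches the exception described above.
        try {
            pendingEmits.add("tuple-4");
        } catch (IllegalStateException e) {
            System.out.println("caught IllegalStateException");
        }
    }
}
```

This is why a window larger than the queue bound fails hard rather than degrading gracefully: the code path uses the throwing `add()` variant, not the non-throwing `offer()`.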

Why is the maximum size of pendingEmits set to 1024? Was it chosen for performance reasons, or was it an arbitrary decision?

I am skeptical about this decision, because if a window consists of more than 1024 tuples (in my case, each window is about 2700 tuples), the queue will fill up and the IllegalStateException will be thrown.

By increasing the maximum size of pendingEmits, do I jeopardize other components of Storm?

Thank you!


Solution

  • I'm not sure why exactly 1024 was picked (likely for performance, as you mention), but if you pull the latest version of Storm, it should be fixed: https://github.com/apache/storm/pull/2676.