I am working with Storm v1.2.1 on a cluster of 5 r4.xlarge EC2 nodes. I am currently processing a network dataset that involves queries over time-based sliding windows. After numerous trial-and-error cycles trying to find a good-enough configuration for my use case, I came across the Executor class, which maintains a member named pendingEmits of type MpscChunkedArrayQueue<AddressedTuple> (line 119 of org.apache.storm.executor.Executor in the storm-client module). This queue has a hard-coded upper bound of 1024 elements.
Every configuration I tried with my dataset would eventually throw an IllegalStateException when Storm attempted to add an acknowledgement tuple to pendingEmits while it was already at full capacity. To avoid the exception, I increased the hard-coded size of pendingEmits to 16534. This seems to be working (for now).
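For reference, the exception itself comes from the standard bounded-queue contract rather than anything Storm-specific: JCTools' MpscChunkedArrayQueue is capacity-bounded, and java.util.Queue#add() throws IllegalStateException when a bounded queue is full, whereas offer() simply returns false. A minimal standalone sketch (not Storm source; the capacity just mirrors the 1024 bound described above):

    import org.jctools.queues.MpscChunkedArrayQueue;

    public class PendingEmitsDemo {
        public static void main(String[] args) {
            // Bounded MPSC queue with the same 1024 cap as pendingEmits.
            MpscChunkedArrayQueue<String> pendingEmits = new MpscChunkedArrayQueue<>(1024);

            // Fill the queue until offer() reports that it is full.
            int accepted = 0;
            while (pendingEmits.offer("tuple-" + accepted)) {
                accepted++;
            }
            System.out.println("Queue accepted " + accepted + " elements before filling up");

            // add() on a full bounded queue throws, which is the same kind of
            // IllegalStateException reported above.
            pendingEmits.add("ack-tuple"); // java.lang.IllegalStateException: Queue full
        }
    }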
Why is the maximum size of pendingEmits set to 1024? Was it chosen for performance reasons, or was it an arbitrary decision? I am skeptical about this limit, because if a window consists of more than 1024 tuples (in my case each window holds about 2700 tuples), the queue fills up and the IllegalStateException is thrown.
By increasing the maximum size of pendingEmits, do I jeopardize other aspects (components) of Storm?
Thank you!
I'm not sure why exactly 1024 was picked (likely for performance, as you mention), but if you pull the latest version of Storm, this should be fixed: https://github.com/apache/storm/pull/2676.
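I haven't gone through the details of that PR, but in general the way to avoid this failure mode with a bounded MPSC queue is to treat a full queue as back-pressure, retrying offer() instead of calling add(). A rough sketch of that pattern (illustrative only, not the actual Storm change):

    import org.jctools.queues.MpscChunkedArrayQueue;

    public class RetryingProducer {
        private final MpscChunkedArrayQueue<Object> pendingEmits =
                new MpscChunkedArrayQueue<>(1024);

        // Block the producer (here by yielding) until the consumer drains
        // enough space, instead of throwing when the queue is full.
        void publish(Object tuple) {
            while (!pendingEmits.offer(tuple)) {
                Thread.yield();
            }
        }
    }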