How does Apache Storm parallelism work?


I am new to Apache Storm and wondering how the parallelism hint works.

For example, suppose we have one stream containing two tuples, <4> and <6>, one spout with a single task per executor, and one bolt that performs some operation on the tuples and has a parallelism hint of 2, so there are two executors of this bolt, A and B. I have three questions about this setup.

  1. In the above scenario, is it possible that the tuple containing the value 4 is processed by A and the tuple containing the value 6 is processed by B?
  2. If processing is done in the manner described in question (1), won't it affect operations in which sequence matters?
  3. If processing is not done in this manner, meaning both tuples go to the same executor, then what is the benefit of parallelism?

Solution

    1. In the above scenario, is it possible that the tuple containing the value 4 is processed by A and the tuple containing the value 6 is processed by B?

    Yes.

    2. If processing is done in the manner described in question (1), won't it affect operations in which sequence matters?

    It depends. You most likely control the order in which your spout emits tuples. If sequence matters, either reduce the parallelism or use fields grouping, so that tuples that depend on each other (those with the same value in the grouping field) go to the same executor. If sequence does not matter, use shuffleGrouping or localOrShuffleGrouping to benefit from parallel processing.
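
To make the difference between the two groupings concrete, here is a minimal Java sketch that models how a target executor could be chosen. It deliberately does not use the Storm API: `GroupingSketch`, `ShuffleModel`, `fieldsGroupingTask`, and the `user-1`/`user-2` keys are hypothetical names, and the round-robin shuffle and hash-modulo routing are simplifying assumptions, not Storm's actual implementation.

```java
import java.util.Objects;

/** Simplified model of stream groupings; an illustration only, not Storm code. */
final class GroupingSketch {

    /** Fields-grouping model: the same key always maps to the same task index. */
    static int fieldsGroupingTask(Object key, int numTasks) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(Objects.hashCode(key), numTasks);
    }

    /** Shuffle-grouping model: spreads tuples evenly, ignoring their content. */
    static final class ShuffleModel {
        private final int numTasks;
        private int next = 0;

        ShuffleModel(int numTasks) {
            this.numTasks = numTasks;
        }

        /** Round-robin is assumed here so the sketch is deterministic. */
        int chooseTask() {
            int task = next;
            next = (next + 1) % numTasks;
            return task;
        }
    }

    public static void main(String[] args) {
        int numTasks = 2; // bolt with parallelism hint 2: index 0 = A, index 1 = B

        // Hypothetical (key, value) tuples; the first two depend on each other.
        String[][] tuples = { {"user-1", "4"}, {"user-1", "6"}, {"user-2", "5"} };

        ShuffleModel shuffle = new ShuffleModel(numTasks);
        for (String[] t : tuples) {
            System.out.printf("key=%s value=%s shuffle->%d fields->%d%n",
                    t[0], t[1], shuffle.chooseTask(),
                    fieldsGroupingTask(t[0], numTasks));
        }
    }
}
```

In this model the two `user-1` tuples land on different executors under the shuffle model, so their relative order is no longer guaranteed, while the fields model sends both to the same executor and preserves it.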

    3. If processing is not done in this manner, meaning both tuples go to the same executor, then what is the benefit of parallelism?

    If both tuples go to the same executor, there is no benefit, obviously.