Can I increase parallelism of running Storm topology dynamically


Storm Version: 1.2.1

As per the link here, the following is the syntax to rebalance a Storm topology:

storm rebalance topology-name [-w wait-time-secs] [-n new-num-workers] [-e component=parallelism]*

I have a simple topology with a bolt, BoltB, running with parallelism 5 and numTasks = 1. I ran the following command, keeping the number of workers at 5 as before:

storm rebalance myTopo -n 5 -w 20 -e BoltB=10

It deactivated the topology, but reactivated it with the same parallelism for BoltB. Am I missing something? Is this how it is supposed to work? Do I need a higher numTasks for BoltB to achieve this?


Solution

  • Please read http://storm.apache.org/releases/2.0.0-SNAPSHOT/Understanding-the-parallelism-of-a-Storm-topology.html.

    The short of it is that the number of tasks for a component in a Storm topology is static once you've submitted the topology. So if you do setNumTasks(1) for boltB in your topology setup, then there will only ever be 1 instance of boltB, which means there will only be 1 thread running boltB at a time.

    You can think of tasks as the cap on how many threads you can spread the boltB work across without redeploying. Storm creates an instance of your bolt for each task, and then spreads them across however many threads you've told it to use via the parallelism_hint parameter during setup.

    The parallelism_hint sets the initial number of executors (threads) for the bolt. The number of executors can be changed without redeploying the topology, via the rebalance command, but you can't raise the number of executors higher than the number of tasks. So to rebalance BoltB to 10 executors, the topology has to be submitted with numTasks of at least 10 for BoltB.
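
The relationship between tasks and executors described above can be illustrated with a small, self-contained model. This is only a sketch and not Storm's scheduler code: the class `RebalanceSketch` and the methods `effectiveExecutors` and `assignTasks` are hypothetical names invented for this example, and the even spreading of tasks is a simplifying assumption. It shows why a rebalance request for 10 executors has no effect when numTasks is 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the executors <= tasks rule; not part of the Storm API.
public class RebalanceSketch {

    // The number of executors that can actually be used is capped by the task count.
    static int effectiveExecutors(int numTasks, int requestedExecutors) {
        if (numTasks < 1 || requestedExecutors < 1) {
            throw new IllegalArgumentException("numTasks and requestedExecutors must be >= 1");
        }
        return Math.min(numTasks, requestedExecutors);
    }

    // Spread task ids 0..numTasks-1 over the executors as evenly as possible
    // (assumed distribution, for illustration only).
    static List<List<Integer>> assignTasks(int numTasks, int requestedExecutors) {
        int executors = effectiveExecutors(numTasks, requestedExecutors);
        int base = numTasks / executors;   // tasks every executor gets
        int extra = numTasks % executors;  // first 'extra' executors get one more
        List<List<Integer>> assignment = new ArrayList<>();
        int nextTask = 0;
        for (int e = 0; e < executors; e++) {
            int size = base + (e < extra ? 1 : 0);
            List<Integer> tasks = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                tasks.add(nextTask++);
            }
            assignment.add(tasks);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // The situation in the question: numTasks = 1, rebalance asks for 10.
        System.out.println(effectiveExecutors(1, 10));  // prints 1
        // With numTasks = 10 fixed at submit time, the same request is honored.
        System.out.println(effectiveExecutors(10, 10)); // prints 10
        // Ten tasks over four executors.
        System.out.println(assignTasks(10, 4));         // prints [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
    }
}
```

In this model the only way to leave room for a later `storm rebalance ... -e BoltB=10` is to choose a numTasks of 10 or more when the topology is first submitted, since that number cannot change afterwards.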