apache-storm

Storm: when to use setNumTasks?


I'm curious about the circumstances that would necessitate the use of the setNumTasks function. The docs say that the default is one task for each executor.

If I have an 'expensive' DB task (calls to external DBs that take time) to run in a bolt with 'fast' tasks on either side, would it behoove me to add extra tasks for this?

Or is this one of those 'try it and see what happens' sort of scenarios?


Solution

    • The number of tasks is always >= the number of executors, and the task count of a component is fixed for the lifetime of the topology.
    • The number of executors can be changed (without killing the topology), but the constraint num tasks >= num executors must be respected. That is, if you have more tasks than executors, you can rebalance your topology and give it more executors.
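To make the task/executor relationship concrete, here is a small plain-Java model. It is not Storm code: `ComponentParallelism` and its methods are hypothetical names, and the even, contiguous split of task ids across executors is a simplification chosen for illustration. In a real topology the equivalent declaration would look something like `builder.setBolt("db-bolt", new DbBolt(), 2).setNumTasks(8)` (with `DbBolt` as a placeholder), and raising the executor count later would be done with `storm rebalance <topology-name> -e db-bolt=8`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model (not part of Storm's API) of one component's parallelism:
 * the task count is fixed at submission, while the executor count can later be
 * changed by a rebalance, as long as executors <= tasks.
 */
public class ComponentParallelism {
    private final int numTasks;   // fixed for the topology's lifetime
    private int numExecutors;     // adjustable via rebalance

    public ComponentParallelism(int parallelismHint, Integer numTasks) {
        // Without setNumTasks, the default is one task per executor.
        this.numTasks = (numTasks == null) ? parallelismHint : numTasks;
        if (parallelismHint < 1 || parallelismHint > this.numTasks) {
            throw new IllegalArgumentException("need 1 <= executors <= tasks");
        }
        this.numExecutors = parallelismHint;
    }

    /** Models a rebalance: only the executor count may change. */
    public void rebalance(int newExecutors) {
        if (newExecutors < 1 || newExecutors > numTasks) {
            throw new IllegalArgumentException(
                "cannot run " + newExecutors + " executors with only " + numTasks + " tasks");
        }
        numExecutors = newExecutors;
    }

    /** Splits task ids 0..numTasks-1 into contiguous, near-equal groups, one per executor. */
    public List<List<Integer>> assignment() {
        List<List<Integer>> groups = new ArrayList<>();
        int base = numTasks / numExecutors;
        int extra = numTasks % numExecutors;
        int next = 0;
        for (int e = 0; e < numExecutors; e++) {
            int size = base + (e < extra ? 1 : 0);
            List<Integer> group = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                group.add(next++);
            }
            groups.add(group);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Roughly mirrors setBolt(..., 2).setNumTasks(8): 2 executors, 8 tasks
        ComponentParallelism db = new ComponentParallelism(2, 8);
        System.out.println(db.assignment()); // [[0, 1, 2, 3], [4, 5, 6, 7]]
        db.rebalance(8); // allowed: each "spare" task becomes its own executor
        System.out.println(db.assignment().size()); // 8

        // Default (no setNumTasks): 2 executors means only 2 tasks, so no headroom
        ComponentParallelism fixed = new ComponentParallelism(2, null);
        try {
            fixed.rebalance(4);
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage()); // cannot run 4 executors with only 2 tasks
        }
    }
}
```

The point the model captures is the one made in the answer: since the task count cannot change after submission, setting `setNumTasks` higher than the parallelism hint is what leaves room to scale a slow bolt up later without redeploying.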

    How do you decide how many executors/tasks you need?

    • Look for bottlenecks. The one you pointed out is a good one: the latency of accessing an external data source (look at the bolt's process latency in the Storm UI). In this case you can (and probably should) have more execution units on this bolt, and if you have "spare" tasks, you can promote them to executors.
    • Another bottleneck is CPU usage (look at the bolt's capacity in the Storm UI): bolts that are more CPU-intensive will require more execution units.
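The two signals in the list above can be turned into rough numbers. This is a sketch under stated assumptions: `BoltSizing` is a hypothetical helper, not a Storm class; `capacity` approximates the Storm UI capacity figure as executed count × execute latency ÷ measurement window; and `executorsNeeded` applies Little's law (busy executors ≈ arrival rate × per-tuple latency) with an assumed target utilization. The figures in `main` are made up for illustration.

```java
/**
 * Hypothetical back-of-the-envelope sizing helpers (not a Storm API).
 * Inputs are assumed to come from the Storm UI or from load estimates.
 */
public class BoltSizing {

    /**
     * Approximates "capacity": the fraction of the measurement window an
     * executor spent executing tuples. Values near 1.0 suggest saturation.
     */
    public static double capacity(long executed, double executeLatencyMs, double windowMs) {
        return executed * executeLatencyMs / windowMs;
    }

    /**
     * Executors needed so that each stays at or below targetUtilization,
     * given an incoming rate (tuples/s) and a per-tuple latency (ms).
     * Little's law: busy executors = arrival rate * service time.
     */
    public static int executorsNeeded(double tuplesPerSecond, double latencyMs, double targetUtilization) {
        double busy = tuplesPerSecond * (latencyMs / 1000.0);
        return Math.max(1, (int) Math.ceil(busy / targetUtilization));
    }

    public static void main(String[] args) {
        // 540,000 tuples at 1 ms each over a 10-minute (600,000 ms) window
        System.out.println(capacity(540_000, 1.0, 600_000)); // 0.9
        // DB bolt: 200 tuples/s, 50 ms per external call, keep executors ~80% busy
        System.out.println(executorsNeeded(200, 50, 0.8)); // 13
    }
}
```

For an I/O-bound bolt like the DB bolt in the question, executors spend most of their time waiting on the external system, so adding executors can help even when CPU is idle. If the estimate exceeds the bolt's current task count, the bolt would have to be redeployed with a larger `setNumTasks`, since a rebalance cannot add tasks.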

    I recommend you read this page