I was just reading more about Storm and came across its ability to do fields grouping. For example, if you were counting tweets per user and had two tasks with a fields grouping on user-id, tuples with the same user-id would always be sent to the same task.
So task 1 could have the following counts in memory: bob: 10, alice: 5
Task 2 could have the following counts in memory: jill: 10, joe: 4
If I added a new machine to the cluster to increase capacity and ran rebalance, what happens to my counts in memory? Will you start to get users with different counts?
With fields grouping, all tuples that share the same value of the grouping field are routed to one particular task.
Fields grouping: The stream is partitioned by the fields specified in the grouping. For example, if the stream is grouped by the "user-id" field, tuples with the same "user-id" will always go to the same task, but tuples with different "user-id"'s may go to different tasks.
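As a rough illustration of the quoted behaviour, here is a small self-contained sketch that routes user-ids to tasks and keeps per-task counts. It is a simplified model, not Storm's code: the class name, the method names, and the "hash of the field modulo the number of tasks" scheme are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of fields grouping; hypothetical names, not Storm's implementation.
class FieldsGroupingSketch {

    // Assumed routing scheme: hash of the grouping field, modulo the task count.
    static int chooseTask(String userId, int numTasks) {
        return Math.floorMod(userId.hashCode(), numTasks);
    }

    // Send each tweet's user-id to its task and keep one count map per task.
    static List<Map<String, Integer>> countPerTask(List<String> userIds, int numTasks) {
        List<Map<String, Integer>> counts = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            counts.add(new HashMap<>());
        }
        for (String userId : userIds) {
            counts.get(chooseTask(userId, numTasks)).merge(userId, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList("bob", "alice", "bob", "jill", "joe", "bob");
        List<Map<String, Integer>> counts = countPerTask(tweets, 2);
        for (int task = 0; task < counts.size(); task++) {
            System.out.println("task " + task + " -> " + counts.get(task));
        }
    }
}
```

Because the task index depends only on the user-id and the number of tasks, each user's count lives in exactly one task's map as long as the number of tasks does not change.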
The number of tasks is fixed for the lifetime of a topology. What you can change with rebalance is the number of executors (threads). After adding a new node to the cluster, you can reconfigure the number of executors without shutting down the topology, but the number of tasks stays the same no matter what, so fields grouping keeps sending a given user-id to the same task. Adding a node simply lets you improve performance by tuning the topology's parallelism.
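To make the distinction concrete, the sketch below models a component with a fixed number of tasks whose executor count changes, as it would after a rebalance. Everything here is illustrative: the class and method names are made up, the round-robin placement of tasks onto executors is an assumption, and so is the hash-modulo routing.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of "tasks stay fixed, executors change"; hypothetical names, not Storm's scheduler.
class RebalanceSketch {

    // Assumed routing: depends only on the key and the (fixed) task count.
    static int taskFor(String userId, int numTasks) {
        return Math.floorMod(userId.hashCode(), numTasks);
    }

    // Spread the fixed task ids over the executors round-robin (assumed placement policy).
    static List<List<Integer>> assignTasks(int numTasks, int numExecutors) {
        List<List<Integer>> executors = new ArrayList<>();
        for (int e = 0; e < numExecutors; e++) {
            executors.add(new ArrayList<>());
        }
        for (int t = 0; t < numTasks; t++) {
            executors.get(t % numExecutors).add(t);
        }
        return executors;
    }

    public static void main(String[] args) {
        int numTasks = 4; // fixed when the topology is submitted
        System.out.println("before rebalance: " + assignTasks(numTasks, 2));
        System.out.println("after rebalance:  " + assignTasks(numTasks, 4));
        // The key -> task mapping never looks at the executor count.
        for (String user : new String[] {"bob", "alice", "jill", "joe"}) {
            System.out.println(user + " -> task " + taskFor(user, numTasks));
        }
    }
}
```

In this model the executors only decide which thread runs a task; the user-id to task mapping is computed from the task count alone, which is why it is unaffected when the executor count goes from 2 to 4.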