
Share a ConcurrentHashMap Between Multiple Apache Storm Bolt Tasks


I have a Storm bolt that, for certain tuples, writes keys and values to a HashMap and, for other tuples, reads values from the HashMap using the keys those tuples carry. It works fine when the number of tasks is set to one, but as soon as I increase it, lookups start returning null for keys that should be present. I assume this is because each bolt task creates its own instance of the HashMap, so the data isn't shared. How do I get all bolt tasks to share a single HashMap?

I am currently creating the hashmap in the prepare method like so:

protected Map<String, JsonObject> hashMap;

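@Override
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector)
{
    _collector = collector;
    // ConcurrentHashMap is already thread-safe, so wrapping it in
    // Collections.synchronizedMap adds nothing. Note that prepare() runs once
    // per task, so every task still ends up with its own, separate map.
    hashMap = new ConcurrentHashMap<String, JsonObject>();
}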

I have also tried creating the HashMap at the topology level and passing it to my bolt as a variable, but that did not work either.
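Why passing the map in from the topology does not help: when a topology is submitted, Storm serializes the bolt object, and each task works with its own deserialized copy, so a map held in a bolt field is duplicated rather than shared. The sketch below only mimics that effect with plain Java serialization; it does not call Storm, and `FakeBolt`, `serialize` and `deserialize` are hypothetical names used for illustration.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

public class BoltCopyDemo {

    // Hypothetical stand-in for a bolt: like a Storm bolt it is Serializable
    // and holds a map handed to it through its constructor.
    static class FakeBolt implements Serializable {
        private static final long serialVersionUID = 1L;
        final Map<String, String> map;

        FakeBolt(Map<String, String> map) {
            this.map = map;
        }
    }

    // Serialize the object to bytes, roughly what happens on topology submission.
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Deserialize a fresh copy, roughly what each task gets to work with.
    static FakeBolt deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (FakeBolt) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> shared = new HashMap<>();
        byte[] submitted = serialize(new FakeBolt(shared));

        FakeBolt task1 = deserialize(submitted);
        FakeBolt task2 = deserialize(submitted);

        task1.map.put("k", "v");                    // "write" tuple handled by task 1
        System.out.println(task2.map.get("k"));     // "read" tuple on task 2: prints null
        System.out.println(task1.map == task2.map); // false: two independent maps
        System.out.println(shared.get("k"));        // null: the original map is untouched
    }
}
```

Each deserialized copy owns a separate `HashMap`, which matches the null lookups described in the question.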


Solution

  • Tasks can be deployed to different workers (i.e., different JVM processes, possibly on different machines), so you cannot share anything in memory between tasks.

    Think of tasks as independent units of work: each one must depend solely on its own input.
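One way to make "depend solely on its input" work here, assuming both the writing and the reading tuples carry the key in a field, is to route tuples by that key so that every tuple for a given key reaches the same task. In Storm this is what fields grouping does (for example `builder.setBolt("myBolt", new MyBolt(), 4).fieldsGrouping("mySpout", new Fields("key"))`, where the component ids and field name are hypothetical). The sketch below does not use Storm; it simulates per-task maps and compares key-based routing with round-robin routing (a rough stand-in for shuffle grouping). The modulo hash is an illustrative assumption, not Storm's internal implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldsGroupingDemo {

    // Each simulated task owns a private map, like each bolt task in Storm.
    static List<Map<String, String>> newTasks(int n) {
        List<Map<String, String>> tasks = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            tasks.add(new HashMap<>());
        }
        return tasks;
    }

    // Key-based routing in the spirit of fields grouping: the same key
    // always picks the same task index.
    static int routeByKey(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    // For each key, sends a "write" and then a "read" and counts reads that miss.
    static int countMisses(List<String> keys, int numTasks, boolean byKey) {
        List<Map<String, String>> tasks = newTasks(numTasks);
        int next = 0; // round-robin counter used when byKey is false
        int misses = 0;
        for (String key : keys) {
            int writeTask = byKey ? routeByKey(key, numTasks) : next++ % numTasks;
            tasks.get(writeTask).put(key, "value-of-" + key);
            int readTask = byKey ? routeByKey(key, numTasks) : next++ % numTasks;
            if (tasks.get(readTask).get(key) == null) {
                misses++;
            }
        }
        return misses;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "c", "d", "e", "f");
        System.out.println("round-robin misses: " + countMisses(keys, 4, false)); // 6
        System.out.println("by-key misses: " + countMisses(keys, 4, true));       // 0
    }
}
```

With round-robin routing the read for a key always lands on a different task than its write, so all six lookups miss; with key-based routing none do. Each task's local map is then sufficient, without sharing anything between tasks.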