
Distributed caching in Storm


How can I store temporary data in Apache Storm?

In my Storm topology, a bolt needs to access previously processed data.

For example, suppose the bolt processes variable1 with a result of 20 at 10:00 AM.

If variable1 is then received as 50 at 10:15 AM, the result should be 30 (50 - 20).

Later, if variable1 is received as 70 at 10:30 AM, the result should be 20 (70 - 50).

How can I achieve this?


Solution

  • In short, you want a stateful, per-key calculation across the tuples flowing through the topology. First, pick the field in the tuple that identifies the key (here, the variable name). Connect the bolts with a fields grouping on that key rather than a shuffle grouping; this guarantees that all tuples with the same key are sent to the same task of the downstream bolt. In that bolt, keep an instance-level Map (or List) of the previous values, compute the result against the stored value, and then store the new value. No synchronization is needed: each task has its own bolt instance, and a task's execute() is called from a single executor thread, so the collection is not shared between executors as long as it is not static.
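
The per-key bookkeeping described above can be sketched in plain Java. This is only a minimal illustration, not the answerer's actual code: the class name DeltaTracker and its update method are hypothetical, and it assumes that the first value seen for a key is returned unchanged, since there is nothing to subtract yet.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Storm): remembers the last value seen
 * for each key and returns the difference from the previous value.
 */
final class DeltaTracker {
    // Last value seen per key. A plain HashMap is used on the assumption
    // that one bolt task owns this object and calls it from a single thread.
    private final Map<String, Long> lastValues = new HashMap<>();

    /**
     * Stores the new value for the key and returns (value - previous).
     * Assumption: the first value for a key is returned as-is.
     */
    long update(String key, long value) {
        Long previous = lastValues.put(key, value); // put() returns the old value, or null
        return previous == null ? value : value - previous;
    }

    public static void main(String[] args) {
        DeltaTracker tracker = new DeltaTracker();
        System.out.println(tracker.update("variable1", 20)); // 20 (first value)
        System.out.println(tracker.update("variable1", 50)); // 30 (50 - 20)
        System.out.println(tracker.update("variable1", 70)); // 20 (70 - 50)
    }
}
```

In a topology, an object like this would typically live as a non-static field of the bolt, be created in prepare(), and be called from execute() with the key and value read from the incoming tuple. The bolt itself would be subscribed with fieldsGrouping on the key field so that every tuple for a given key reaches the same task. The map lives only in that task's memory, so it starts empty again if the worker restarts.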