java multithreading caching google-guava-cache

How to deal with dirty data in Guava Cache


I use Guava Cache to cache my data. An entry is evicted from the cache if it has not been used for several minutes.

When I modify my data, I update it in the cache and mark it as "dirty" (because it has been modified and now differs from the data in the database). Every 5 minutes I push the "dirty" data to the database (i.e., update the data in the database).

The problem is this: suppose there is a "dirty" entry A. If A is evicted from the cache before it has been pushed to the database, I lose the "dirty" data A.

So I added a RemovalListener to the Guava Cache. When an entry is evicted, the RemovalListener notifies me through a callback function, and in that function I attempt to put the data back into the cache. But in a multithreaded environment, this cannot guarantee that the data is correct.

For example:

1) Cache: evicts Data A.

2) Thread 1: gets Data A. Because Data A has been evicted, the cache loads it from the database. The copy in the database is not the newest, so Thread 1 gets an incorrect Data A.

3) Cache: runs the RemovalListener callback.

So how can I deal with the dirty data so that the data is always correct in a multithreaded environment? Thanks!


Solution

  • A possible solution is to write the dirty data to the database in the RemovalListener. If this is done synchronously, other operations on the same entry are blocked and no inconsistent state becomes visible. Depending on the latency of your database, this might affect other operations on the cache as well; see the warning in Guava's documentation.

    Generally speaking, what you want is a so-called "write-behind cache". There are cache products that have this functionality built in. Take a look at existing solutions.
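
One way to read the write-behind idea above, sketched in plain Java with no Guava dependency: keep every pending write in a separate map that is cleared only after the write has reached the database, and let a cache miss consult that map before the database. All names here (`WriteBehindSketch`, `update`, `get`, `load`, `flush`, the `dirty` map) are hypothetical, the `ConcurrentHashMap`s merely stand in for the real cache and database, and the sketch assumes a single thread calls `flush`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical write-behind sketch; none of these names come from Guava. */
class WriteBehindSketch {
    // Stand-in for the real database table.
    final Map<String, String> database = new ConcurrentHashMap<>();
    // Stand-in for the cache; entries here may disappear at any time.
    final Map<String, String> cache = new ConcurrentHashMap<>();
    // Pending writes, kept until flushed, independent of cache eviction.
    final Map<String, String> dirty = new ConcurrentHashMap<>();

    /** Record a modification: remember it as pending, then cache it. */
    public void update(String key, String value) {
        dirty.put(key, value);
        cache.put(key, value);
    }

    /** Read through the cache; a miss falls back to load(). */
    public String get(String key) {
        return cache.computeIfAbsent(key, this::load);
    }

    /** On a miss, prefer a pending write over the (possibly stale) database. */
    String load(String key) {
        String pending = dirty.get(key);
        return pending != null ? pending : database.get(key);
    }

    /** Push pending writes to the database. Assumes one flusher thread. */
    public void flush() {
        for (Map.Entry<String, String> e : dirty.entrySet()) {
            String key = e.getKey();
            String value = e.getValue();
            // Write first, then clear, so load() never sees a gap.
            database.put(key, value);
            // Conditional remove: a newer update made meanwhile stays dirty.
            dirty.remove(key, value);
        }
    }

    public static void main(String[] args) {
        WriteBehindSketch store = new WriteBehindSketch();
        store.database.put("A", "old");
        store.update("A", "new");
        store.cache.remove("A");                      // simulate eviction
        System.out.println(store.get("A"));           // prints "new"
        store.flush();
        System.out.println(store.database.get("A"));  // prints "new"
    }
}
```

With Guava, the body of `load` would presumably live in `CacheLoader.load`, and `flush` could be scheduled every 5 minutes with a `ScheduledExecutorService`. Because eviction never touches the `dirty` map, the relative order of eviction, reloading, and the RemovalListener callback no longer decides whether a modification survives.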