
Dealing with stale data in in-memory caches


Suppose the strategy for using an in-memory cache (such as Redis or Memcached) in front of a database is as follows (a sketch of both paths appears after the list):

  1. Reading: the client first tries to read from the cache. On a cache miss, it reads from the database and puts the data in the cache.

  2. Writing: the client updates the database first, then deletes the cache entry.
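
A minimal sketch of this read/write strategy, assuming a Python client using the redis-py library and two hypothetical database helpers, db_read and db_write:

    import json
    import redis

    r = redis.Redis()    # assumes a local Redis instance
    CACHE_TTL = 300      # optional expiry, in seconds

    def read_entry(key, db_read):
        # 1. Try the cache first.
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)
        # 2. Cache miss: read from the database and populate the cache.
        value = db_read(key)
        r.set(key, json.dumps(value), ex=CACHE_TTL)
        return value

    def write_entry(key, value, db_write):
        # 1. Update the database first.
        db_write(key, value)
        # 2. Then delete the cache entry.
        r.delete(key)

The race described next comes from different clients interleaving these steps.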

Suppose the following sequence happens:

  1. client A reads from the cache and gets a miss.
  2. client A reads from the database.
  3. client B updates the same entry in the database.
  4. client B deletes the (non-existent) cache entry.
  5. client A puts the (stale) entry in the cache.
  6. client C then reads the stale entry from the cache.

Is there any strategy for avoiding such a scenario? I know we could put an expiry time on each cache entry, but there is still a window for reading stale data, which can be undesirable in certain situations.


Solution

  • You could version the cached data and keep each version immutable. Every time the data in the database changes, increment an integer version column; the cache key must include that version number. Clients then first look up the current version number in the database and only afterwards talk to the cache (a sketch of this scheme follows below).

    Keeping caches consistent is very hard because they operate non-transactionally, and there is no general way to prevent the kind of problem you describe. Ideally you would make a write visible in the database and in the cache atomically, but that is only possible in special cases, such as the versioning scheme proposed here.
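
    A minimal sketch of that versioning scheme, again in Python with redis-py; db_read_version, db_read_row and db_update_row are hypothetical database helpers, and db_update_row is assumed to increment the integer version column in the same transaction as the update:

        import json
        import redis

        r = redis.Redis()

        def versioned_key(key, version):
            # Each version gets its own immutable cache entry.
            return f"{key}:v{version}"

        def read_entry(key, db_read_version, db_read_row):
            # 1. Ask the database for the current version number first.
            version = db_read_version(key)

            # 2. Only then consult the cache. The key embeds the version,
            #    so an entry for an older version is simply never looked up.
            cached = r.get(versioned_key(key, version))
            if cached is not None:
                return json.loads(cached)

            # 3. Cache miss: read the row together with its version in one
            #    query so the pair is consistent, then populate the cache.
            value, row_version = db_read_row(key)
            r.set(versioned_key(key, row_version), json.dumps(value), ex=3600)
            return value

        def write_entry(key, value, db_update_row):
            # Assumed to update the row and increment its version column in
            # a single transaction. No cache invalidation is needed, because
            # readers will look up the new version's key and simply miss.
            db_update_row(key, value)

    In the original race, client A would now populate the key for the old version, which client C never reads because it first sees the new version number in the database.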