I'm thinking about caching dynamic content in a web server. My goal is to bypass the whole processing by returning a cached HTTP response without bothering the DB (or Hibernate). This question is not about choosing between existing caching solutions; my current concern is the invalidation.
I'm sure that time-based invalidation makes no sense at all: whenever a user changes anything, they expect to see the effect immediately rather than in a few seconds or even minutes. And caching for a fraction of a second is useless, as there are no repeated requests for the same data in such a short period (since most of the data is user-specific).
For every data change, I get an event and can use it to invalidate everything depending on the changed data. As requests happen concurrently, there are two time-related problems:

- A client may send a request concurrently with their own change, so the response may be computed from data the change has not yet reached.
- A request started before a change may store its response after the invalidation, so the cache ends up holding stale data.
The two problems are sort of opposite to each other. I guess the former is easily solved by partially serializing requests from the same client using a ReadWriteLock per client. So let's forget it.
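Such per-client serialization could be sketched as follows. This is a minimal sketch under my own assumptions, not from the question: the `ClientLocks` class, its `read`/`write` helpers, and keying by a client id string are all made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// One ReadWriteLock per client: plain reads share the read lock,
// while a mutating request takes the write lock and thereby
// serializes itself against that client's concurrent reads.
class ClientLocks {
    private final Map<String, ReadWriteLock> locks = new ConcurrentHashMap<>();

    private ReadWriteLock lockFor(String clientId) {
        return locks.computeIfAbsent(clientId, id -> new ReentrantReadWriteLock());
    }

    <T> T read(String clientId, Supplier<T> action) {
        ReadWriteLock lock = lockFor(clientId);
        lock.readLock().lock();
        try {
            return action.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    <T> T write(String clientId, Supplier<T> action) {
        ReadWriteLock lock = lockFor(clientId);
        lock.writeLock().lock();
        try {
            return action.get();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Note that this only orders requests of the *same* client relative to each other; requests of different clients proceed fully in parallel.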
The latter is more serious, as it basically means a lost invalidation and serving the stale data forever (or for far too long).
I can imagine a solution like repeating the invalidation after every request that started before the change happened, but this sounds rather complicated and time-consuming. I wonder whether any existing caches support this, but I'm mainly interested in how this gets done in general.
The problem is a simple race condition:

1. A request reads the data from the DB.
2. The data changes and the corresponding cache entry gets invalidated.
3. The request finishes and writes its now-obsolete response to the cache.
To solve the race condition, add a timestamp (or a counter) to each request and check this timestamp when setting a new cache entry. This ensures that an obsolete response will not be cached.
Here's some pseudocode:
// set a new cache entry if resourceId is not cached
// or if the existing entry is stale
function setCache(resourceId, requestTimestamp, responseData) {
    if (cache[resourceId]) {
        if (cache[resourceId].timestamp > requestTimestamp) {
            // existing entry is newer
            return;
        } else if (cache[resourceId].timestamp == requestTimestamp) {
            // same timestamp: make sure the invalidation wins
            responseData = null;
        }
    }
    cache[resourceId] = {
        timestamp: requestTimestamp,
        response: responseData
    };
}
Let's say we get two calls for the same resource "foo", in this order:
setCache("foo", "00:00:00.001", null)
setCache("foo", "00:00:00.000", ...)
The first call is the invalidation triggered by the data change. The second call belongs to a request that started earlier; it tries to write the obsolete response to the cache but fails because the existing entry is newer.

This is just the basic mechanism, so there is room for improvements.
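For completeness, the mechanism could be made concrete in Java roughly like this. The class and method names are my own invention; I use `ConcurrentHashMap.merge` so the compare-and-set per key is atomic, which the pseudocode glosses over, and a `long` timestamp instead of the time strings above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Cache entries carry the timestamp of the request that produced them.
// A write only succeeds if it is not older than the stored entry, and a
// write with the very same timestamp degrades to an invalidation (null).
class TimestampedCache {
    static final class Entry {
        final long timestamp;
        final String response;   // null means "invalidated"
        Entry(long timestamp, String response) {
            this.timestamp = timestamp;
            this.response = response;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    void setCache(String resourceId, long requestTimestamp, String responseData) {
        // merge is atomic per key, so concurrent setCache calls cannot interleave
        cache.merge(resourceId, new Entry(requestTimestamp, responseData),
            (oldEntry, newEntry) -> {
                if (oldEntry.timestamp > newEntry.timestamp) {
                    return oldEntry;                              // existing entry is newer
                }
                if (oldEntry.timestamp == newEntry.timestamp) {
                    return new Entry(newEntry.timestamp, null);   // ensure invalidation
                }
                return newEntry;
            });
    }

    String get(String resourceId) {
        Entry e = cache.get(resourceId);
        return e == null ? null : e.response;
    }
}
```

With this sketch, replaying the "foo" example (invalidation at timestamp 1, then a stale write at timestamp 0) leaves the entry invalidated rather than stale.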