Tags: http-headers, etag, http-caching

Is it a bad idea to construct http ETag headers using database primary keys?


Can I use a database key (from an immutable object) as an ETag?

I am trying to get browser and/or proxy caching to work for my web application (it happens to be Python/Flask, but I don't think that's particularly relevant). In everything I've read about ETags, they are usually described as a hash of the (presumably static) resource.

In my case, I have a class of objects in my database that are not editable. Several different views can be generated for each one of these objects. Generating the views, at least some of them, requires some work on the server, but in general the resulting output is lightweight. So doing the work to generate the whole page, then taking the hash, would be inefficient, and I may as well just send the response at that point.

My thinking is, because each view is built on an immutable database object, that object's key (plus the URL of the request) is enough to know whether the client's cache is good or not. But that would mean using the same ETag for lots of different resources. As far as I can tell it seems like this should work, but

  • Is it against an official spec for how ETags are supposed to be used? E.g. is there always supposed to be a unique ETag for different resources?
  • Is it going to be a bad idea for some other reason?

A bit more context

My application has URLs of the form:

example.com/view/<name>/<version>/<view>/<additional view args>

The DB has a unique index on the combination of <name> and <version>. There is also a special keyword, latest, that can be used in place of a version; it causes the server to find the most recent entry for <name>. No matter which view is requested, it is fully determined by the object found by name and version. So if a client sends a request with the header If-None-Match: <key>, I would always return 304 regardless of the view requested, unless (a) they requested the latest version and (b) the primary key of the latest version in the DB does not match the If-None-Match header.
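A minimal sketch of the check described above, in plain Python so it does not depend on Flask. The function names (parse_if_none_match, is_not_modified) are hypothetical, and the sketch assumes the ETag is simply the primary key as a string. It compares on every request rather than short-circuiting for pinned versions, which is a simplification. The header may carry several comma-separated tags, each optionally prefixed with W/, or a bare *.

```python
def parse_if_none_match(header_value):
    """Return the set of opaque tags in an If-None-Match header value.

    Quotes and any W/ (weak) prefix are stripped, since If-None-Match
    uses weak comparison. Splitting on commas is a simplification that
    assumes the tags themselves contain no commas (true for numeric keys).
    """
    tags = set()
    for part in header_value.split(","):
        part = part.strip()
        if part == "*":
            tags.add("*")
            continue
        if part.startswith("W/"):
            part = part[2:]
        if len(part) >= 2 and part[0] == '"' and part[-1] == '"':
            tags.add(part[1:-1])
    return tags


def is_not_modified(if_none_match, current_key):
    """True if the server may answer 304 for the object with current_key."""
    if not if_none_match:
        return False  # no validator sent, so send the full response
    tags = parse_if_none_match(if_none_match)
    return "*" in tags or str(current_key) in tags
```

For a request to .../latest/..., current_key would be the primary key of the most recent entry, so the comparison fails as soon as a newer version is added.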


Solution

  • I suggest reading RFC 7232 (since superseded by RFC 9110, which covers the same material). It is fairly straightforward and will give you an excellent understanding of conditional validation.

    Your desire to avoid computing the response before knowing if there's a match is both sensible and allowable. As the standard makes clear, it's up to you to choose the opaque value. Hashes are just one special case of that. (In fact they require special mention because collisions are theoretically possible.) The standard specifically gives the example of using a version number for the ETag:

    For example, a resource that has implementation-specific versioning applied to all changes might use an internal revision number, perhaps combined with a variance identifier for content negotiation, to accurately differentiate between representations.

    You also ask if the ETag needs to be different for each resource. The answer is no:

    There is no implication of uniqueness across representations of different resources (i.e., the same strong validator might be in use for representations of multiple resources at the same time and does not imply that those representations are equivalent).

    Some would be concerned about exposing database IDs to the client. I don't have a strong feeling about that, but of course that's easily avoided by hashing or otherwise obscuring the ID.

    Looking at your specific design, though, it appears that simply using the version for your ETag would be sufficient. In fact, for all resources other than latest it appears that there's only one possible representation. If so, you should mark those responses as cacheable for a very long time (for example, Cache-Control with a large max-age), and it doesn't really matter what the ETag is. Then for latest, use a short cache time and use the version (or primary key, if you prefer) for the ETag.
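As a rough illustration of that recommendation, here is a sketch that builds the response headers for one request. Everything named here is an assumption for illustration: the SECRET value, the make_etag and cache_headers helpers, the 60-second lifetime for latest, and the one-year lifetime for pinned versions. The ETag is an HMAC of the primary key, which is one way of obscuring the ID if exposing it is a concern.

```python
import hashlib
import hmac

# Hypothetical server-side secret; in a real app this would come from config.
SECRET = b"replace-with-a-real-secret"


def make_etag(primary_key):
    """Return a quoted, opaque ETag that does not reveal the primary key."""
    digest = hmac.new(SECRET, str(primary_key).encode(), hashlib.sha256)
    return '"' + digest.hexdigest()[:16] + '"'


def cache_headers(requested_version, primary_key):
    """Return caching headers for a view of the object with primary_key.

    requested_version is the <version> URL segment as the client sent it,
    so it is either a concrete version or the keyword "latest".
    """
    headers = {"ETag": make_etag(primary_key)}
    if requested_version == "latest":
        # "latest" can point at a new object later: cache briefly, then revalidate.
        headers["Cache-Control"] = "max-age=60"
    else:
        # A pinned version never changes, so let caches keep it for a year.
        headers["Cache-Control"] = "max-age=31536000, immutable"
    return headers
```

In Flask, the returned dictionary could be copied onto the response's headers before it is sent. The same make_etag value is what an incoming If-None-Match header would be compared against.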