The GitHub API specifies two headers that can be used in conditional requests, Last-Modified and ETag. Which is the more reliable when querying the API?
For context: when using the API endpoint GET /repos/:owner/:repo/git/trees/:sha on each subdir of a large repo, every response contains the same Last-Modified value (even though the repo on GitHub shows different authored dates), while the ETag value for each is different. I'm wondering if the ETag is a more granular representation of repo content state change (for caching purposes).
Reading "ETags: a pretty sweet feature of HTTP 1.1", it says:
"ETags allow dynamic content to be cached using an app-specific "opaque token""
An ETag, or entity tag, is an opaque token that identifies a version of the component served by a particular URL. The token can be anything enclosed in quotes; often it's an md5 hash of the content, or the content's VCS version number.
If the content of the answer is the same, the ETag should be identical every time.
I just tested it with https://api.github.com/repos/VonC/gopanic/git/trees/master, and indeed its ETag remains W/"34a03ea1d4dc0b5d533ecf8d36492879" even when called repeatedly.
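Note that the W/ prefix on that value marks it as a weak validator in HTTP terms. If you compare ETags yourself rather than letting the server do it, a minimal sketch of weak versus strong comparison could look like this (the helper names opaque_tag, weak_match and strong_match are hypothetical, made up for illustration):

```python
def opaque_tag(etag):
    """Strip surrounding whitespace and the optional weak prefix W/."""
    etag = etag.strip()
    if etag.startswith("W/"):
        etag = etag[2:]
    return etag


def weak_match(a, b):
    """Weak comparison: the quoted tags match, ignoring any W/ prefix."""
    return opaque_tag(a) == opaque_tag(b)


def strong_match(a, b):
    """Strong comparison: neither tag is weak and both are identical."""
    a, b = a.strip(), b.strip()
    return not a.startswith("W/") and not b.startswith("W/") and a == b
```

For caching a GET response, weak comparison is what the If-None-Match header uses, so a weak ETag is still good enough to detect that a tree has not changed.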
But if I fetch the tree for each subfolder, the ETag varies, because it represents a signature of each response's different content.
The advantage of ETag is that it doesn't depend on a date (whose clock might vary for various reasons), but on the content of the answer: if unchanged, it stays the same.
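To make use of that for caching, you send the stored ETag back in an If-None-Match header and treat a 304 Not Modified as "my copy is still good". Here is a rough sketch using only the Python standard library. The function names and the shape of the cache (a dict mapping URL to an (etag, body) pair) are my own assumptions, not anything GitHub prescribes:

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None):
    """Build request headers, adding If-None-Match when an ETag is cached."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    return headers


def fetch_tree(url, cache):
    """Return (body, changed). `cache` is a dict: URL -> (etag, body)."""
    etag, body = cache.get(url, (None, None))
    request = urllib.request.Request(url, headers=conditional_headers(etag))
    try:
        with urllib.request.urlopen(request) as response:
            body = response.read()
            # Remember the new ETag so the next call can be conditional.
            cache[url] = (response.headers.get("ETag"), body)
            return body, True
    except urllib.error.HTTPError as error:
        if error.code == 304:  # Not Modified: cached body is still current
            return body, False
        raise
```

Calling fetch_tree twice with the same cache dict and the same tree URL should send the stored ETag on the second call. Since each subfolder's tree URL gets its own cache entry, each one is validated against its own ETag, which is the granularity the question is after.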
Warning: Brice notes in the comments:
The etag value is specific to the server; for example, GitHub might use the hash of the blob, but maybe not always.
Other providers may not even do that, e.g. Apache was using the inode for etags in the past.