Which is more reliable for GitHub API conditional requests, ETag or Last-Modified?


The GitHub API specifies two headers that can be used in conditional requests, Last-Modified and ETag. Which is more reliable when querying the API?

For context: when I call the endpoint GET /repos/:owner/:repo/git/trees/:sha on each subdirectory of a large repo, every response carries the same Last-Modified value (even though the repo on GitHub shows different authored dates), while the ETag value is different for each. I'm wondering whether the ETag is a more granular representation of changes in the repo's content state (for caching purposes).


Solution

  • The article "ETags: a pretty sweet feature of HTTP 1.1" says:

    "ETags allow dynamic content to be cached using an app-specific "opaque token""

    An ETag, or entity tag, is an opaque token that identifies a version of the component served by a particular URL. The token can be anything enclosed in quotes; often it's an md5 hash of the content, or the content's VCS version number.

    If the content of the response is the same, the ETag should be identical every time.

    I just tested it with https://api.github.com/repos/VonC/gopanic/git/trees/master, and indeed its ETag remains W/"34a03ea1d4dc0b5d533ecf8d36492879" even when called repeatedly.

    But if I requested the tree for each subfolder, the ETag would vary, because it represents a signature of each different response's content.

    The advantage of ETag is that it doesn't depend on a date (whose clock might vary for various reasons), but on the content of the response: if the content is unchanged, the ETag stays the same.


    Warning: Brice notes in the comments:

    The ETag value is specific to the server; for example, GitHub might use the hash of the blob, but maybe not always.
    Other providers may not even do that; e.g., Apache used the inode for ETags in the past.
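
To make the caching use case concrete, here is a minimal sketch of a conditional request in Python using only the standard library. The helper names (conditional_headers, fetch_if_changed) and the User-Agent string are assumptions made for illustration; they are not part of the GitHub API or of the original answer. The idea is to store the ETag from a first response and send it back in If-None-Match, so the server can reply 304 Not Modified with no body when nothing has changed.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build validator headers from a previously cached response.

    Both arguments are optional; only the validators that are
    actually known are sent.
    """
    headers = {"User-Agent": "etag-sketch"}  # hypothetical client name
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (status, body, etag); body is None when the server answers 304."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as error:
        # urllib reports 304 Not Modified as an HTTPError.
        if error.code == 304:
            return 304, None, etag
        raise


if __name__ == "__main__":
    url = "https://api.github.com/repos/VonC/gopanic/git/trees/master"
    status, body, etag = fetch_if_changed(url)
    print(status, etag)
    # Second call with the cached ETag; a 304 is expected only if the
    # tree has not changed in between.
    status, body, etag = fetch_if_changed(url, etag=etag)
    print(status, etag)
```

The ETag is passed back exactly as received, including the W/ prefix and the quotes, since it is an opaque token. Sending both validators is allowed: per the HTTP specification, a server that receives If-None-Match evaluates it and ignores If-Modified-Since.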