
Caching GitHub API calls


I have a general question about caching API calls, in this case calls to the GitHub API.

Let's say I have a page in my app that shows the filenames of a repo and the content of its README. This means I have to make a few API calls to retrieve that data.

Now, let's say I want to add something like memcached in between, so I'm not repeating these calls when I don't need to.

How would you normally go about this? If I don't enable a webhook on GitHub, I have no way of knowing when the cache should expire. I could always make a single call to get the current SHA of HEAD and, if it hasn't changed, use the cache instead. But that works at the repo level, not at the file level.

I can imagine doing something similar with the object SHAs, but if I need to call the API anyway to get those, it defeats the purpose of caching.

How would you go about it? I know a service like prose.io has no caching right now, but if it should, what would the approach be?

Thanks


Solution

  • Would just using HTTP caching be good enough for your use case? HTTP caching is not only a way to avoid making a request when you already have a fresh response. It also lets you quickly validate whether the response you have in cache is still valid, without the server sending the complete response again if nothing has changed.

    Looking at GitHub API responses, I can see that GitHub sets the relevant HTTP headers correctly (ETag, Last-Modified, Cache-Control).

    So, you just do a GET, e.g. for:

    GET https://api.github.com/users/izuzak/repos
    

    and this returns:

    200 OK
    ...
    ETag: "df739f00c5053d12ef3c625ad6b0fd08"
    Last-Modified: Thu, 14 Feb 2013 22:31:14 GMT
    ...
    

    Next time, you do a GET for the same resource but also supply the relevant HTTP caching headers, which makes it a conditional GET:

    GET https://api.github.com/users/izuzak/repos
    ...
    If-Modified-Since: Thu, 14 Feb 2013 22:31:14 GMT
    If-None-Match: "df739f00c5053d12ef3c625ad6b0fd08"
    ...
    

    And lo and behold, the server returns a 304 Not Modified response with no body, and your HTTP client pulls the response from its cache:

    304 Not Modified
    

    So the GitHub API does HTTP caching right, and you should use it. Granted, you also need an HTTP client that supports HTTP caching. The best part is that a 304 Not Modified response does not count against your API rate limit. See: https://docs.github.com/en/rest/overview/resources-in-the-rest-api#conditional-requests

    The GitHub API also sets the Cache-Control: private, max-age=60 header, so each response stays fresh for 60 seconds. A caching HTTP client can therefore answer repeated requests for the same resource made less than 60 seconds apart straight from its cache, without contacting the server at all.

    Your idea of making a single conditional GET to a resource that is sure to change whenever anything in the repo changes (a resource showing the SHA of HEAD, for example) sounds reasonable. If that resource hasn't changed, you don't have to check the individual files, since they surely haven't changed either.
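
If your HTTP client does not handle caching for you, the conditional GET flow described above can be approximated by hand. The following is a minimal sketch in Python using only the standard library, not a definitive implementation. The `ConditionalCache` class, its in-memory dictionary (standing in for something like memcached), and the injectable `opener` parameter are all assumptions made for illustration. It revalidates with `If-None-Match` only and ignores `Cache-Control` freshness, so every call still makes one (cheap) request.

```python
import urllib.error
import urllib.request


class ConditionalCache:
    """Hypothetical helper: caches response bodies and revalidates them by ETag."""

    def __init__(self, opener=urllib.request.urlopen):
        # The opener is injectable so the sketch can be exercised without a network.
        self._opener = opener
        # url -> (etag, body); a plain dict stands in for memcached here.
        self._entries = {}

    def get(self, url):
        headers = {"Accept": "application/vnd.github+json"}
        cached = self._entries.get(url)
        if cached is not None:
            # Send the stored ETag so the server can answer 304 Not Modified.
            headers["If-None-Match"] = cached[0]

        request = urllib.request.Request(url, headers=headers)
        try:
            with self._opener(request) as response:
                body = response.read()
                etag = response.headers.get("ETag")
        except urllib.error.HTTPError as err:
            # urllib reports 304 as an HTTPError; treat it as a cache hit.
            if err.code == 304 and cached is not None:
                return cached[1]
            raise

        if etag:
            # Fresh 200 response: remember the body together with its validator.
            self._entries[url] = (etag, body)
        return body
```

Calling `ConditionalCache().get(url)` twice for the same URL should transfer the full body only once as long as the resource is unchanged. Pointing it at a resource that exposes the SHA of HEAD gives the repo-level check from the question: only when that call returns a new body do the per-file entries need to be refetched.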