git github curl github-api du

Is there a way to get the size of a mirror clone using the GitHub API?


I can get a repository's size from the GitHub API by calling it with curl and extracting the size field, as below.

curl -s -H "Authorization: token <token>" "https://api.github.com/repos/<org>/<repo>" | jq '.size'

But when I make a mirror clone of that repo and run du -sh . in the repo directory, I see a different value.

What am I missing here?

Is there a way to get the mirror clone size using the GitHub API, instead of cloning the repo locally and running du on it?


Solution

  • No, there isn't. GitHub doesn't garbage-collect objects by default, so the repository it has on disk may contain many objects that are no longer used. As a result, until GitHub serves a request, it can't know which of the data on disk will be used to satisfy that request, or how that data will be deltified and compressed over the connection (and hence in the resulting packfile in the clone).

    In addition, because GitHub may store multiple packs at once, it may store multiple copies of the same objects to ensure the packs are complete. Thus, what's on disk may be larger than what the clone has. A future repack may end up making the repository smaller (or possibly larger) than it is now.

    The API will provide an approximation of how big the repo is on disk at GitHub, but it isn't a guarantee of what size you'll get in any given situation.
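
If you want to see how far apart the numbers are for a particular repository, here is a rough sketch that prints the API's figure next to what du and git report for a local mirror clone. It assumes curl, jq, git and du are installed, and that the API's size field is in kilobytes (which is how it is generally documented). GITHUB_TOKEN and the function names are hypothetical, chosen only for this example. Given the reasons above, don't expect the three values to agree.

```shell
#!/bin/sh
# Sketch only: compare GitHub's reported size with a local mirror clone.
# All function names and GITHUB_TOKEN are illustrative assumptions.

# Read repository JSON on stdin and print its "size" field (assumed KB).
api_size_kb() {
    jq -r '.size'
}

# Disk usage of a directory in 1024-byte units, as du sees it.
local_size_kb() {
    du -sk "$1" | cut -f1
}

# Space taken by packfiles alone, in KiB, as git itself reports it.
pack_size_kb() {
    git -C "$1" count-objects -v | awk '/^size-pack:/ {print $2}'
}

# Usage: compare_sizes <org> <repo> <path-to-mirror-clone>
compare_sizes() {
    api=$(curl -s -H "Authorization: token $GITHUB_TOKEN" \
        "https://api.github.com/repos/$1/$2" | api_size_kb)
    echo "API size (KB):       $api"
    echo "du size (KB):        $(local_size_kb "$3")"
    echo "git pack size (KiB): $(pack_size_kb "$3")"
}
```

For example, after git clone --mirror into repo.git, you could run compare_sizes <org> <repo> repo.git. The du figure also includes loose objects, hooks and other files in the directory, so the pack size is usually the closer comparison to what was actually transferred.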