Search code examples
gitgithubgithub-enterprise

Impact on storage when forking a repository


Scenario: Master repository with 100+ developers working off it

Is there a significant impact on Github storage space in the scenario where 100+ developers are forking a parent repo or is it a valid strategy for each developer to have their own fork of the repo, and then make PRs to the parent repo?

I looked through several other threads that may have some relevance to this question, but was only able to find that forks share objects to minimize storage usage. However, I was not able to figure out the level of impact on a large scale (hundreds of forks) and if this will significantly take up available storage.


Solution

  • A fork, on GitHub, would not duplicate (on theGitHub server side) the full repository, as explained in "Counting Objects" by Vicent Martí in 2015.

    Very early on we figured out that actually forking people’s repositories was not sustainable.

    For instance, there are almost 11,000 forks of Rails hosted on GitHub: if each one of them were its own copy of the repository, that would imply an incredible amount of redundant disk space, requiring several times more fileservers than the ones we have in our infrastructure.

    That’s why we decided to use a feature of Git called alternates.

    When you fork a repository on GitHub, we create a shallow copy of it.
    This copy has no objects of its own, but it has access to all the objects of an alternate, a root repository we call network.git and which contains the objects for all the forks in the network.