Tags: git, size, commit, git-squash

Which factors contribute to the size of a Git repo?


I would like to know which factors contribute to the size of a Git repo, apart from the data itself, of course.

Does a long history mean a big repo? Does having many branches have any effect on it?

Also, how do you handle your commits? I read that each commit should have at least one logical unit of change added to it. I know that commits can be squashed by rebasing before pushing (never rebase published history, of course).

So I don't know whether I should squash them, because I don't know whether it makes any difference to the size.

Thanks


Solution

  • A repo's size varies mainly with the nature of the data put in it: binary data is stored less efficiently than non-binary data (it typically compresses and deltifies poorly), and is generally bigger anyway.

    A repo in use (a local clone) can also see its size vary depending on when it was last garbage-collected and repacked: see "git gc --aggressive vs git repack". Packfiles are where deltification (storing an object as a difference against a similar object) is done.
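The effect of gc on a local clone can be observed on a throwaway repository. This is only a minimal sketch, assuming git is installed; the temp directory, the file name notes.txt, the demo identity, and the number of commits are all made up for illustration. `git count-objects -v` reports loose objects under `count` and packed objects under `in-pack`, so comparing its output before and after `git gc` shows the objects moving into a packfile.

```shell
#!/bin/sh
set -eu

# Throwaway repository in a temp directory (hypothetical location).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

# Commit ten revisions of one growing text file (made-up content).
i=1
while [ "$i" -le 10 ]; do
    echo "line $i" >> notes.txt
    git add notes.txt
    git commit -q -m "Revision $i"
    i=$((i + 1))
done

# Before gc: each blob, tree and commit is stored as a separate loose object.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

# gc packs the loose objects into a packfile, where deltification is done.
git gc --quiet

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')

echo "loose objects before gc: $loose_before"
echo "loose objects after gc:  $loose_after"
echo "packed objects after gc: $packed_after"
```

Adding `-H` (`git count-objects -vH`) prints the sizes in human-readable units, which is handy for comparing a real clone before and after `git gc` or `git repack`.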

    As for the commits, read "Utter Disregard for Git Commit History".

    These are two extremes of viewing what the core unit of change is for the respective project.

    • From Git’s perspective — likely because of the ease of use inside a mailing list approach — a single atomic commit makes most sense.
    • From GitHub’s perspective, individual commits become less valuable because the atomic unit is the pull request

    In both cases, more historical context on a change can be easily found by going back to the mailing list discussion or the pull request conversation.
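The two views of the unit of change can be tried side by side. This is a hedged sketch on a throwaway repository, not a recommended workflow; the branch name feature, the file app.txt, and the commit messages are made up for illustration. The feature branch keeps one commit per step, while `git merge --squash` followed by a commit records the whole branch as a single commit on the default branch, which is roughly how a squash-merged pull request ends up in history.

```shell
#!/bin/sh
set -eu

# Throwaway repository (hypothetical location and identity).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Remember the default branch name, whatever it is on this machine.
main=$(git symbolic-ref --short HEAD)

# Three small commits on a feature branch (made-up changes).
git checkout -q -b feature
for step in one two three; do
    echo "step $step" >> app.txt
    git commit -q -a -m "Feature step $step"
done

# Squash the whole branch into a single commit on the default branch.
git checkout -q "$main"
git merge --squash -q feature
git commit -q -m "Add feature (squashed)"

main_commits=$(git rev-list --count HEAD)
feature_commits=$(git rev-list --count feature)

echo "commits on $main:   $main_commits"
echo "commits on feature: $feature_commits"
```

The default branch ends up with the initial commit plus one squashed commit, while the feature branch still holds every individual step for as long as it is kept.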