Tags: git, tags, branch, git-branch

How are Git branches and tags stored on disk?


I recently checked one of my Git repositories at work, which had more than 10,000 branches and more than 30,000 tags. The total size of the repo after a fresh clone is 12 GB. I am sure there is no reason to have 10,000 branches, and I believe they occupy a considerable amount of disk space. So, my questions are as follows:

  1. How are branches and tags stored on disk? What data structure is used, and what information is stored for each branch?
  2. How do I get metadata about branches, such as when a branch was created and what its size is?

Solution

  • All Git references (branches, tags, notes, stashes, etc.) use the same storage system, which consists of:

    • the references themselves, and
    • "reflogs"

    Reflogs are stored in .git/logs/refs/ based on the reference-name, with one exception: reflogs for HEAD are stored in .git/logs/HEAD rather than .git/logs/refs/HEAD.

    References are either "loose" or "packed". Packed refs live in .git/packed-refs, a flat text file of (SHA-1, refname) pairs for simple refs; an annotated tag's entry may be followed by an extra line starting with ^ that gives the SHA-1 of the object the tag ultimately points to (its "peeled" value). Loose refs are individual files named after the reference, e.g. .git/refs/heads/master for the branch master. Each such file contains either a 40-character hex SHA-1 (probably the most common case) or, for a symbolic ref, the literal string ref: followed by the name of another reference (usually only HEAD is symbolic, but you can create others). Symbolic refs are not packed (or at least, I can't seem to make that happen :-) ).

    Packing tags and "idle" branch heads (those that are not being actively updated) saves space and time: Git reads one file instead of thousands of tiny files, each of which typically occupies at least one filesystem block. You can run git pack-refs to do this, but git gc invokes git pack-refs for you, so you generally don't need to do it yourself.
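The reflog layout described in the answer can be read with a few lines of Python. This is a minimal sketch, not Git's own code: it assumes each reflog line has the form `<old-sha> <new-sha> Name <email> <unix-time> <tz>` followed by a tab and a message, and the names `ReflogEntry`, `reflog_path`, `parse_reflog_line` and `read_reflog` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class ReflogEntry:
    old_oid: str    # object ID before the update (all zeros when the ref was created)
    new_oid: str    # object ID after the update
    identity: str   # "Name <email>" of whoever made the update
    timestamp: int  # seconds since the Unix epoch
    tz_offset: str  # e.g. "+0100"
    message: str    # e.g. "branch: Created from HEAD"; may be empty


def reflog_path(git_dir: Path, refname: str) -> Path:
    """Map a full refname to its reflog file.

    "refs/heads/main" -> <git_dir>/logs/refs/heads/main
    "HEAD"            -> <git_dir>/logs/HEAD (not logs/refs/HEAD)
    """
    return git_dir / "logs" / refname


def parse_reflog_line(line: str) -> ReflogEntry:
    """Parse '<old> <new> <name> <<email>> <time> <tz>\\t<message>'."""
    header, _, message = line.rstrip("\n").partition("\t")
    old_oid, new_oid, rest = header.split(" ", 2)
    # The identity can contain spaces, so peel timestamp and tz off the right end.
    identity, timestamp, tz_offset = rest.rsplit(" ", 2)
    return ReflogEntry(old_oid, new_oid, identity, int(timestamp), tz_offset, message)


def read_reflog(git_dir: Path, refname: str) -> List[ReflogEntry]:
    """Return all entries for one ref in file order (oldest first); [] if none."""
    path = reflog_path(git_dir, refname)
    if not path.is_file():
        return []
    with path.open(encoding="utf-8", errors="replace") as f:
        return [parse_reflog_line(line) for line in f if line.strip()]
```

For the second question, the earliest entry of a branch's reflog (the one whose old_oid is all zeros) hints at when the branch was created in this particular repository. Treat it only as a hint: reflogs are local, are not copied by clone or fetch, and are pruned by git gc after a while.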
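The loose/packed lookup described in the answer can be sketched as follows. It assumes the files-based layout above (a loose file under the .git directory, otherwise an entry in .git/packed-refs) and that a loose file takes precedence over a packed entry for the same name. The helper names and the max_depth limit of 5 are hypothetical choices for this illustration; for real work, git rev-parse is the authoritative tool.

```python
from pathlib import Path
from typing import Dict, Optional


def read_packed_refs(git_dir: Path) -> Dict[str, str]:
    """Parse <git_dir>/packed-refs into {refname: object_id}.

    Body lines are '<object_id> <refname>'. A line starting with '^' carries
    the peeled object ID of the annotated tag on the line before it, and a
    leading '#' line is the header; this sketch skips both.
    """
    refs: Dict[str, str] = {}
    path = git_dir / "packed-refs"
    if not path.is_file():
        return refs
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line or line[0] in "#^":
            continue
        oid, refname = line.split(" ", 1)
        refs[refname] = oid
    return refs


def resolve_ref(git_dir: Path, refname: str, max_depth: int = 5) -> Optional[str]:
    """Return the object ID that refname points to, or None if it does not exist.

    A loose file, if present, wins over a packed-refs entry, and symbolic refs
    ('ref: <other>') are followed at most max_depth times.
    """
    packed = read_packed_refs(git_dir)
    for _ in range(max_depth):
        loose = git_dir / refname  # e.g. .git/HEAD or .git/refs/heads/main
        if not loose.is_file():
            return packed.get(refname)  # fall back to packed-refs
        content = loose.read_text(encoding="utf-8").strip()
        if not content.startswith("ref: "):
            return content  # a plain object ID
        refname = content[len("ref: "):]  # symbolic ref: follow its target
    raise ValueError("symbolic ref chain is too deep (possible loop)")
```

Calling resolve_ref(Path(".git"), "HEAD") follows HEAD's ref: line to the current branch and then looks that branch up as a loose file or in packed-refs.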
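To make the space argument for packing concrete, here is a rough back-of-the-envelope estimate. It assumes a 4 KiB filesystem block, SHA-1 object IDs (40 hex characters), and that each loose ref file occupies at least one whole block; it ignores directory entries, inodes, reflogs and the peel lines of annotated tags. The branch names and helper functions are hypothetical.

```python
import math
from typing import Iterable


def loose_refs_disk_usage(num_refs: int, block_size: int = 4096,
                          oid_hex_len: int = 40) -> int:
    """Rough on-disk footprint of loose refs, one file per ref.

    Each file holds the hex object ID plus a newline (41 bytes for SHA-1),
    but the filesystem is assumed to allocate whole blocks, so each file is
    rounded up to a multiple of block_size.
    """
    file_size = oid_hex_len + 1
    return num_refs * math.ceil(file_size / block_size) * block_size


def packed_refs_size(refnames: Iterable[str], oid_hex_len: int = 40) -> int:
    """Byte size of a packed-refs body: one '<oid> <refname>\\n' line per ref.

    Ignores the header line and the '^' peel lines of annotated tags.
    """
    return sum(oid_hex_len + 1 + len(name.encode("utf-8")) + 1 for name in refnames)


# Hypothetical repository with 10,000 branches using made-up names.
branches = [f"refs/heads/feature/JIRA-{i:05d}" for i in range(10_000)]
print(loose_refs_disk_usage(len(branches)))  # 40960000 bytes, about 41 MB
print(packed_refs_size(branches))            # 710000 bytes, about 0.7 MB
```

Under these assumptions, 10,000 loose branch refs cost tens of megabytes and the packed form well under a megabyte, so the refs themselves are only a small part of a 12 GB clone; most of the space goes to the objects those refs keep reachable.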