Tags: git, storage, git-lfs

Git LFS is HUGE


I have a repo with lots of commits and lots of blobs. I want to migrate to LFS because cloning and fetching the repo takes significant time. However, when I run the git lfs migrate import command, the repo grows from ~40 GB to well over 200 GB. I don't know the final size because the command has been running for a whole day and is only 40% done. My question is: why is the storage so large when the files in the repo total only around 3 GB without any Git history? (It may help to mention that the Git history alone is currently around 17 GB.)

I tried git lfs migrate import, but I am asking less for a solution and more for an explanation of why this happens.

UPDATE: I am not asking how to solve the storage problem locally or remotely. I want to understand why, and how, LFS stores objects in a blob store in a way that racks up so much storage space.


Solution

  • When Git packs objects, it deltifies them, meaning that most objects are stored as deltas against other, similar objects. It then compresses the result. This is great if your objects are text files, but many large files (images or textures, for example) are already compressed, so the deltification step is slow and ineffective, and compression can actually expand the data slightly.

    With Git LFS, the large files are stored outside of Git's object database, and they are neither compressed nor deltified. Again, that's because for a lot of large files those steps simply waste CPU and are not effective. As a consequence, every version of every migrated file in the history is kept as a full, separate copy, which is why the migrated repository can take far more local space than the packed Git history did. Once you have pushed all of those large files to the server, you can run git lfs prune, and from that point on only the large files needed for a checkout are kept locally or downloaded. Thus, if you only need 500 MB of large files in your checkout, that checkout will download only that 500 MB, and the rest will stay on the server.

    Over time, you can accumulate a decent number of large files on your local system, and you may need to run git lfs prune again to remove the ones that are no longer needed.
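
The compression point above can be illustrated with a rough Python sketch. It is not how Git or Git LFS is implemented. It only uses zlib (the compression library Git uses for its objects) to compare ten revisions of a repetitive text-like file with ten revisions of an already-compressed asset. The random bytes standing in for images or textures, the file sizes, and the helper name stored_size are all assumptions made for the illustration, and deltification is not modelled at all.

```python
import os
import zlib


def stored_size(versions, compress):
    """Total bytes needed to keep every version as its own full copy."""
    return sum(len(zlib.compress(v)) if compress else len(v) for v in versions)


# Ten revisions of a text-like file: highly repetitive, so zlib shrinks it a lot.
text_versions = [(b"line of source code %d\n" % i) * 50_000 for i in range(10)]

# Ten revisions of an already-compressed asset, modelled with random bytes.
# Assumption: real images or textures behave similarly, but not identically.
binary_versions = [os.urandom(1_000_000) for _ in range(10)]

text_raw = stored_size(text_versions, compress=False)
text_zlib = stored_size(text_versions, compress=True)
binary_raw = stored_size(binary_versions, compress=False)
binary_zlib = stored_size(binary_versions, compress=True)

print(f"text:   {text_raw:>10} bytes raw, {text_zlib:>10} bytes after zlib")
print(f"binary: {binary_raw:>10} bytes raw, {binary_zlib:>10} bytes after zlib")
```

In this sketch the text revisions shrink to a small fraction of their raw size, while the incompressible revisions stay at roughly their full size, so ten revisions cost about ten times one revision. That is the situation LFS is in when it keeps every historical version of a large file as a separate, uncompressed object.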