Tags: git, memory, diff, mime-types, binaryfiles

Git's memory usage


After creating a repository containing some binary files (yes, Git doesn't handle binary files that well, but in this repository the binaries are mandatory), performing a commit becomes very resource-hungry.

When one performs a commit the memory usage of git reaches 2.7 GiB. Sometimes the process is even killed by the operating system because it uses all remaining system resources.

This is probably due to the diff algorithm used internally, which has to consider both the original and the new file and needs to load at least one of them into memory (the second can be handled as a stream).

Is it possible to mark a file as binary and specify that the repository doesn't need to calculate the difference, but only check whether there is a new version? (This can be done by handling both files as streams, and thus in constant memory.) After all, storing the difference is probably as inefficient as copying the new version.
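To make the "both files as streams" idea concrete, here is a minimal sketch of a constant-memory comparison. It only illustrates what is being asked for and is not a description of what Git does internally; the function name `same_content` and the default chunk size are assumptions chosen for the example.

```python
def same_content(path_a, path_b, chunk_size=1 << 20):
    """Return True if both files hold identical bytes.

    Only one chunk per file is held in memory at a time, so memory use
    does not grow with file size. (Hypothetical helper, not a Git API.)
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                return False  # contents differ (or one file is shorter)
            if not a:
                return True   # both streams ended at the same point
```

With a check like this, "is there a new version?" reduces to `not same_content(old_path, new_path)`.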

The Git repositories are maintained on the machine automatically. It would thus be nice if the process could be automated, for instance by using the MIME type of the files to mark all binary files automatically.


Solution

  • As mentioned in "Exclude a directory from git diff", you can exclude files/folders from diff with the .gitattributes directive '-diff':

    lib/* -diff
    dist/js/**/*.js -diff
    

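    To avoid out-of-memory issues due to git diff, since Git v2.2.0 (late 2014) you can also rely on the core.bigFileThreshold configuration: files larger than this threshold (512 MiB by default) are treated as binary and are not diffed.
    (And the default size for a pack file has been raised.)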

    Finally, additional features like GVFS (Git Virtual File System, 2017) aim to improve this kind of issue, and already allow Microsoft to manage the largest Git repository on the planet (the Windows codebase: approximately 3.5M files and about 300 GB, with 1,760 daily “lab builds” across 440 branches, in addition to thousands of pull request validation builds). This capability has yet to be fully integrated into Git, but it illustrates what is possible.
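Since the question asks for automation, the '-diff' entries could be generated by a script rather than written by hand. The following is only a sketch under stated assumptions: it uses a simple NUL-byte heuristic as a stand-in for a real MIME-type lookup, the helper names `looks_binary` and `binary_attribute_lines` are made up for this example, and paths containing spaces or glob characters would need extra quoting or escaping that is not handled here.

```python
import os

def looks_binary(path, sample_size=8000):
    """Heuristic (assumption): a NUL byte near the start suggests binary data."""
    with open(path, "rb") as f:
        return b"\0" in f.read(sample_size)

def binary_attribute_lines(root):
    """Yield one '<path> -diff' line per binary-looking file under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip repository metadata and visit directories in a stable order.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            if looks_binary(full):
                # .gitattributes patterns use forward slashes.
                rel = os.path.relpath(full, root).replace(os.sep, "/")
                yield f"{rel} -diff"
```

The resulting lines can be appended to the repository's .gitattributes file, for example with `"\n".join(binary_attribute_lines("."))`. Independently of that, the threshold mentioned above can be lowered per repository with `git config core.bigFileThreshold 50m`, where 50m is just an example value.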