Tags: git, rename, git-push

Why is a folder rename followed by git push to a new remote branch slow?


I just

  • created a new branch
  • renamed nearly all files in my repository by renaming a top folder
  • pushed the branch upstream as a new branch

I get

Writing objects:  26% (3337/12428), 270.49 MiB | 779.00 KiB/s 

which takes a long time.

Out of interest, why do these objects need to be written? I had expected git to just send a "rename" command upstream.


Solution

  • It should not be slow at all, unless you are doing tricky things with shallow repositories or using a dumb protocol. There is, however, no "rename" operation that gets sent upstream.

    Internally, Git stores everything as one of four Git object types: commits, trees, blobs (files), and (annotated) tags.
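
    To make the object model concrete, here is a minimal Python sketch of how an object ID is derived, assuming a repository that uses the default SHA-1 object format. The helper name git_object_id is made up for this illustration; it is not part of Git.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash an object the way a SHA-1 Git repository does: header, NUL, body."""
    # The header is "<type> <size in bytes>" followed by a NUL byte.
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


# The file name is not an input: identical contents always give the same
# blob ID, whatever path the file is stored under.
contents = b"hello\n"
print(git_object_id("blob", contents))
```

    The printed ID should match what `echo hello | git hash-object --stdin` reports, because the path never enters the hash.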

    A commit object is generally very small. Here is a real commit object from the source repository for Git itself (the sed command only masks the @ in the email addresses):

    $ git cat-file -p HEAD | sed 's/@/ /'
    tree 6fe777d97b5a6fb3176d47c5ccda454deb69a8f6
    parent cc00d9cfffbbeb34ee23731668656b2ebc165c85
    author Junio C Hamano <gitster pobox.com> 1461960207 -0700
    committer Junio C Hamano <gitster pobox.com> 1461964869 -0700
    
    Eighth batch for 2.9
    
    Signed-off-by: Junio C Hamano <gitster pobox.com>
    

    When you rename a directory or file, what you get, in the end, is a new "tree" object. Here's a bit of the top-level tree for that same commit:

    $ git cat-file -p 'HEAD^{tree}'
    100644 blob 5e98806c6cc246acef5f539ae191710a0c06ad3f    .gitattributes
    100644 blob 05cb58a3d4ef47295fa8ef02add44a0f0dd90d1f    .gitignore
    100644 blob e5b4126bec557db55924b7b60ed70349626ea2c4    .mailmap
    100644 blob 78e433ba718df00d112a5f57d523afb8db189c79    .travis.yml
    100644 blob 536e55524db72bd2acf175208aef4f3dfc148d42    COPYING
    040000 tree 1771d89504a0003add17bffd2170f39490bad1ff    Documentation
    

    If I were to rename COPYING or Documentation, I would get a new top-level tree object (with a different ID), but the existing blob objects for .gitattributes, .gitignore, and so on would all be unchanged. This is true for sub-trees and blobs within Documentation/ as well. Depending on which particular directory you renamed, you can expect Git to need one or more new "tree" objects to go with your (one) new "commit" object. None of these objects should be very large.
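
    The effect of a top-level rename can be sketched in Python, again assuming the SHA-1 object format. The helpers git_object_id and tree_id and the file names (README, main.py, src, lib) are hypothetical and exist only for this illustration. A tree entry stores the mode, the name, and the raw ID of the child, so renaming src to lib changes the parent tree only.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash an object the way a SHA-1 Git repository does: header, NUL, body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def tree_id(entries: dict) -> str:
    """Compute a tree ID from a mapping of name -> (mode, child object ID)."""
    def sort_key(item):
        name, (mode, _) = item
        # Git orders directory entries as if their name ended in "/".
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for name, (mode, oid) in sorted(entries.items(), key=sort_key):
        body += f"{mode} {name}\0".encode() + bytes.fromhex(oid)
    return git_object_id("tree", body)


# Hypothetical repository: one file at the top level, one inside a folder.
readme = git_object_id("blob", b"read me\n")
main = git_object_id("blob", b"print('hi')\n")
src_tree = tree_id({"main.py": ("100644", main)})

# Same folder contents, once under the name "src" and once under "lib".
root_before = tree_id({"README": ("100644", readme), "src": ("40000", src_tree)})
root_after = tree_id({"README": ("100644", readme), "lib": ("40000", src_tree)})

objects_before = {readme, main, src_tree, root_before}
objects_after = {readme, main, src_tree, root_after}
new_objects = objects_after - objects_before
print(len(new_objects))  # only the top-level tree is new
```

    In this toy case the renamed folder keeps its own tree ID, so the rename adds a single tree object, plus the commit that points at it.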

    A subsequent git push, over any reasonably smart protocol, should:

    • discover that it need only send one new commit object and however many tree objects
    • compress those objects against objects that your Git knows (because of shared hash values) exist on the remote Git (before "writing objects" there's an exchange phase where their Git tells you what they have)
    • write a "thin pack" that sends just two or five or however-many objects, which should take a few kilobytes and milliseconds

    and then be done with the transmission phase. (The remote must then "fix" the thin pack, which could take some time, verify that the push is allowed and, if it is, update the remote repository, before sending an acknowledgement or failure response.)
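
    The "discover what to send" step above can be approximated as a set difference over reachable objects. The sketch below uses a toy object store with made-up labels (commit1, root1, and so on) instead of real IDs, and a hypothetical helper named reachable. Real Git negotiates by commit and avoids walking everything, so treat this as an illustration of the idea only.

```python
def reachable(store: dict, root: str) -> set:
    """Return every object reachable from root in a toy object store."""
    seen = set()
    stack = [root]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(store[oid])  # children: parents, trees, blobs
    return seen


# Toy store: each object maps to the objects it refers to.
store = {
    "commit1": ["root1"],
    "root1": ["blob_readme", "tree_src"],
    "tree_src": ["blob_main", "blob_util"],
    "blob_readme": [],
    "blob_main": [],
    "blob_util": [],
    # After the rename: new commit, new top-level tree, same subtree and blobs.
    "commit2": ["root2", "commit1"],
    "root2": ["blob_readme", "tree_src"],
}

remote_has = reachable(store, "commit1")
to_send = reachable(store, "commit2") - remote_has
print(sorted(to_send))
```

    Here only the new commit and the new top-level tree end up in to_send, which is why a push after a pure rename is expected to be tiny.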