(Sorry if this is answered before -- I couldn't find this specific problem, everyone else is having it for unrelated reasons. Feel free to just point me to the one I missed.)
I'm working with a repo that is very large because numerous large files have been committed to it. I recently shallow-cloned it (depth=1), then later pulled a large change on master. Then I branched, made a small commit, and pushed to that new branch. The push takes forever, even though it only adds one small commit (15 kB, one file affected) on top of a commit that the remote (Bitbucket) already knows about. The message I see looks like this:
silasbarta$ git push origin HEAD
Counting objects: 21034, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (11817/11817), done.
Writing objects: 4% (967/21034), 406.50 MiB | 1.36 MiB/s
Running git gc didn't change this.
Why would it need to deal with all those objects? It should only need to send my small commit plus a reference to the parent commit hash, which is already on the origin, right?
There are some cases when using a shallow clone where the sending Git doesn't realize that the receiving Git already has most of the files. This is because the way one Git understands what another Git has is by hash IDs, and shallow clones mess with hash IDs: commit a123456 says it has b789abc as a parent, and then there's a special record that says "...but we don't have b789abc". Git then pretends that a123456 isn't a123456 after all.
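If you want to see this boundary in your own clone, the cut-off commits are recorded in the .git/shallow file (this inspection step is my addition, not something from the push output above):

# A shallow clone lists its cut-off commits, one hash per line,
# in .git/shallow; each one is a commit whose parents were omitted.
cat .git/shallow

# Recent Git versions also decorate such commits as "grafted" in the log:
git log --oneline -5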
Some cases could be handled better, but aren't. Using a slightly deeper clone (--depth 2) might have avoided this, but nothing is really guaranteed here. Git's code trades a bit of imperfection in determining what it can omit, in order to make the usual cases faster.
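As a practical workaround (assuming you can afford the extra download), deepening or fully un-shallowing the clone before pushing gives the sending Git enough history to recognize what the remote already has. The exact commands below are a sketch of that idea, not something guaranteed by the original answer:

# Deepen the clone by one more commit, per the --depth 2 suggestion above:
git fetch --depth=2 origin

# Or remove the shallow boundary entirely, turning this into a full clone:
git fetch --unshallow origin

# Then retry the push; with full ancestry known, far fewer objects
# should need to be counted and sent.
git push origin HEAD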