Tags: git, garbage-collection, git-push, git-reflog

Push to origin after reflog expire and gc prune


I removed some unreachable and dangling commits from my local repo using:

git fsck --unreachable --dangling --no-reflogs
git reflog expire --expire=now --all
git gc --prune=now

But the removed commits are still available on origin (GitHub, to be precise).

I tried git push --force, but it doesn't synchronize the changes to origin. How do I force-sync the changes to origin, so that the unreachable/dangling commits are removed from the remote as well?

This is a similar question with no answer:

Scope of gc prune and git reflog expire, and related config


Solution

    Short form

    You can't dictate how the remote stores its data from the client.

    Longer form

    First, the place to start is understanding that your local repository is not the same as the remote one. git fsck and git gc operate only on the local repository, which you already knew, since you're asking the question.

    Second, Git works by transferring objects. The trick here is that it only talks about reachable objects over the wire, meaning there must be a path from a reference (a branch or a tag) to the object somewhere in the history. If an object is not reachable, Git will refuse to transfer it to the client, even if it's in the object database. The flip side is that anything you do locally that doesn't involve modifying or updating a reference can't be communicated between the local and remote repositories. You can't say "sync my local object database layout to the remote". You can only say "make the reachable objects between my local and remote the same."
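
    As a rough illustration of the reachability point, here is a minimal sketch that uses a local bare repository as a stand-in for the remote. The directory names (remote.git, local, fresh), the branch name main, and the demo identity are assumptions made up for the example, and a hosting service such as GitHub may behave differently behind the scenes. The script pushes a commit, force-pushes the branch back to its parent, and then checks where the discarded commit still exists.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; every name below is illustrative.
work=$(mktemp -d)
cd "$work"

# A bare repository plays the role of "origin".
git init -q --bare remote.git
git -C remote.git symbolic-ref HEAD refs/heads/main

git init -q local
cd local
git config user.email "demo@example.com"
git config user.name "Demo"

echo one > file.txt
git add file.txt
git commit -q -m "first"
echo two >> file.txt
git commit -q -am "second (will be discarded)"
old=$(git rev-parse HEAD)

# Publish both commits, then rewind the branch and force-push.
git push -q ../remote.git HEAD:refs/heads/main
git reset -q --hard HEAD~1
git push -q --force ../remote.git HEAD:refs/heads/main
cd ..

# The discarded commit is still stored in the remote's object database...
if git -C remote.git cat-file -e "$old" 2>/dev/null; then
    in_remote=yes
else
    in_remote=no
fi

# ...but a fresh clone over the transport only receives reachable objects.
# (A file:// URL is used so Git does not simply copy the object directory.)
git clone -q "file://$work/remote.git" fresh
if git -C fresh cat-file -e "$old" 2>/dev/null; then
    in_clone=yes
else
    in_clone=no
fi

echo "stored on remote: $in_remote"
echo "received by fresh clone: $in_clone"
```

    The force push only moved the reference. Nothing in the push asked the remote to delete the old object; whether and when it is pruned is left to the remote's own garbage collection.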

    Finally, how things get represented in GitHub, and whether or not objects eventually get pruned, is entirely up to GitHub. Zach Holman has given a talk on some of the things happening behind the scenes. I imagine they run something in the background to prune dangling objects at some point, but from a remote access standpoint it really doesn't matter: people cannot access the unreferenced objects. The only issue left is size. I know they're doing some sort of pruning because I've trimmed repositories in the past and seen their size decrease. You can check this by looking at the size member returned by the API call; try this as an example: https://api.github.com/repos/jszakmeister/vimfiles.
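
    A small sketch of reading that size member from the shell. The helper name repo_size_kb is hypothetical, and the sed pattern assumes the API returns pretty-printed JSON with one field per line, which may not always hold; a real JSON parser would be more robust. As far as I know the value is reported in kilobytes. The sample document in the script is made up so the helper can be tried without network access.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print the first numeric "size" field found in a
# repository JSON document read from stdin (assumes one field per line).
repo_size_kb() {
    sed -n 's/^ *"size": *\([0-9][0-9]*\),*$/\1/p' | head -n 1
}

# Offline demonstration on a made-up document.
sample_size=$(printf '{\n  "name": "example",\n  "size": 2048,\n  "fork": false\n}\n' | repo_size_kb)
echo "sample size: $sample_size"

# Against the live API (needs network access), something like:
#   curl -s https://api.github.com/repos/jszakmeister/vimfiles | repo_size_kb
```

    Comparing the value before and after trimming a repository gives a rough idea of whether the host has pruned anything, though there is no guarantee about how quickly the number is updated.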

    If your goal is to shrink the repository because you checked in objects that are too large, take a look at the "Removing sensitive data" page in GitHub's help section. It applies equally to large files that you want to remove permanently (simply removing them via a commit doesn't remove them from history).
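
    For the local half of that process, here is a minimal sketch using git filter-branch, which ships with Git. This is one possible approach and not necessarily what GitHub's help page recommends today; other history-rewriting tools exist. The file name big.bin, the repository name, and the demo identity are invented for the example. The script commits a large file, rewrites history to drop it, and then uses the reflog expire and gc commands from the question to remove the now-unreachable blob locally.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; every name below is illustrative.
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit a large file alongside a legitimate one.
echo "readme" > README
dd if=/dev/zero of=big.bin bs=1024 count=100 2>/dev/null
git add README big.bin
git commit -q -m "add readme (and a big file by mistake)"
blob=$(git rev-parse HEAD:big.bin)

echo "notes" > notes.txt
git add notes.txt
git commit -q -m "add notes"

# Rewrite every commit on every ref, dropping big.bin from each tree.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
    --index-filter 'git rm -q --cached --ignore-unmatch big.bin' \
    -- --all >/dev/null 2>&1

# filter-branch keeps backup refs under refs/original/; delete them,
# then expire the reflog so nothing references the old history.
git for-each-ref --format='%(refname)' refs/original/ |
while read -r ref; do
    git update-ref -d "$ref"
done
git reflog expire --expire=now --all
git gc -q --prune=now

# The blob should now be gone from the local object database.
if git cat-file -e "$blob" 2>/dev/null; then
    blob_present=yes
else
    blob_present=no
fi
history_hits=$(git log --all --oneline -- big.bin | wc -l | tr -d ' ')

echo "blob still present locally: $blob_present"
echo "commits touching big.bin: $history_hits"
```

    After a rewrite like this, git push --force updates the remote's references to the new history. When the old objects actually disappear from the remote's storage is still decided by the remote.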

    If the goal is to reduce repository size by compacting and removing dangling objects, GitHub is already doing their own thing, and you don't have much control over how that's done. They go to great lengths to keep it small, fast, and efficient, though.