Tags: git, git-commit, git-reflog, git-gc

Determine what prevents a commit from being pruned from git


How can I determine what prevents a commit from being pruned when I run the following commands?

git reflog expire --expire=now --all

git gc --prune=now

Details

I want to completely remove a commit (with, e.g., commit hash XYZ) from my clone. If the above is not the correct command to do so (or if any of my following commands / deductions are incorrect), please let me know.

I know that XYZ remains in my clone after running the above prune because the following returns a log listing:

git log XYZ

I know that XYZ isn't in any branch because the following outputs nothing:

git branch --contains XYZ

I thought that XYZ wasn't in any stash because the following outputs nothing:

git stash list

XYZ, however, actually was in a stash, but a git bug prevented the stash from being listed.


Solution

  • If there are no stashes and you've expired the reflogs, it seems reasonable to assume that the commit is reachable from some ref - but not all refs are branches.

    You could try this:

    git for-each-ref --format='%(refname)' |xargs -I {} git rev-list {} --format="%H {}" |grep '^<hash>'
    

    where <hash> is the ID of the commit you're looking to get rid of. In a simple test I ran

    git for-each-ref --format='%(refname)' |xargs -I {} git rev-list {} --format="%H {}" |grep ^80c0ab
    

    and got output like

    80c0ab39850d7b3ef4969ab934d834f22959a317 refs/original/refs/heads/master
    

    telling me that my target commit was kept alive by a ref under refs/original, in this case a pre-rewrite "backup ref" created by git filter-branch.
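
As a possible shortcut, assuming Git 2.7 or newer, `git for-each-ref --contains` can answer the same question without piping every ref through `git rev-list`. The sketch below is only an illustration: the throwaway repository, the identity, and the commit messages are made up, and the ref under refs/original is created by hand to imitate a filter-branch backup.

```shell
set -eu

# Throwaway repository; everything here is illustrative, not taken from the question.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.invalid"

git commit -q --allow-empty -m "base"
git commit -q --allow-empty -m "target"
target=$(git rev-parse HEAD)

# Imitate a filter-branch backup ref, then drop the commit from the branch.
git update-ref refs/original/refs/heads/master "$target"
git reset -q --hard HEAD~1

# No branch contains the commit any more...
branches=$(git branch --contains "$target")
# ...but for-each-ref inspects every ref, not only branches.
holding_refs=$(git for-each-ref --contains "$target" --format='%(refname)')

echo "branches: ${branches:-<none>}"
echo "refs:     $holding_refs"
```

In a real repository only the last `git for-each-ref` line is needed, with the commit ID in place of `$target`. Unlike the `rev-list` pipeline, it only accepts a full or unambiguous abbreviated commit ID rather than a grep prefix.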


    Update - Some follow-up based on comments.

    You note that the above command returns refs/stash, yet the stash list (per git stash list) is empty.

    The thing is, the stash list is stored in the reflog of refs/stash, which the stash commands manipulate heavily. And the command you used to clear reflogs

    git reflog expire --expire=now --all
    

    will have destroyed the key reflog. So now the stash commands don't know what to do and act like there are no stashes, but the stash ref does still exist, keeping the most recent stash (and the full commit history reachable from the commit on which that stash was created) alive locally[1].
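
The effect can be reproduced in a scratch repository. This is a minimal sketch under assumed names (the temporary directory, `file.txt`, and the identity are placeholders): it creates one stash, expires all reflogs, and then compares what `git stash list` reports with what `refs/stash` still points to.

```shell
set -eu

# Scratch repository with one commit and one stash (all names are placeholders).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.invalid"

echo one > file.txt
git add file.txt
git commit -q -m "base"

echo two > file.txt
git stash push -q
listed_before=$(git stash list)

# The same reflog expiry as in the question.
git reflog expire --expire=now --all

# The stash list is read from the reflog; the ref itself is looked up directly.
listed_after=$(git stash list)
stash_ref=$(git rev-parse --verify --quiet refs/stash)

echo "before expiry: $listed_before"
echo "after expiry:  ${listed_after:-<nothing listed>}"
echo "refs/stash:    $stash_ref"
```

If the list comes back empty while `refs/stash` still resolves to a commit, the repository is in the state described above.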

    IMO that could be considered a bug. Scheduled reflog expiry by default leaves stash alone (for... well... this reason). Perhaps the argument goes that you specifically said to expire all reflogs, but I would argue that "all reflogs except the stash" would be a more useful definition of --all in this instance.

    Well, whatever.

    If you're sure you don't care about whatever was stashed, run

    git update-ref -d refs/stash
    

    and then resume your clean-up.
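
To check that the clean-up actually worked, one option is to test for the object with `git cat-file -e` before and after removing the ref. The following is a sketch in a scratch repository, not a definitive procedure; `object_state` is a hypothetical helper, and the file name and identity are placeholders.

```shell
set -eu

# Scratch repository with a stash whose commit we want gone (names are placeholders).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.invalid"

echo one > file.txt
git add file.txt
git commit -q -m "base"
echo two > file.txt
git stash push -q
unwanted=$(git rev-parse refs/stash)

# Hypothetical helper: report whether a commit is still in the object database.
object_state() {
  if git cat-file -e "$1^{commit}" 2>/dev/null; then echo present; else echo gone; fi
}

# Pruning while refs/stash still exists leaves the commit in place.
git reflog expire --expire=now --all
git gc -q --prune=now
state_with_ref=$(object_state "$unwanted")

# Delete the ref, then repeat the clean-up.
git update-ref -d refs/stash
git reflog expire --expire=now --all
git gc -q --prune=now
state_without_ref=$(object_state "$unwanted")

echo "with refs/stash:    $state_with_ref"
echo "without refs/stash: $state_without_ref"
```

In the real clone, the same check is `git cat-file -e XYZ^{commit}`: a non-zero exit status after the final `git gc` suggests the commit is no longer stored locally.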


    [1] "alive locally" because, at least by default, stash isn't shared. It is likely that cloning the repo, or pushing its refs into an empty remote, would not carry the offending commit along. However, this depends on the assumption that git will send a minimal pack, and AFAIK it isn't guaranteed to do that. So if you need the commit gone, the safest thing is to reach a point where it doesn't exist locally, and then rebuild any remotes (etc.) from that clean local repo.