Tags: git, git-branch, git-reset, git-gc

After a git reset, unreachable commit not removed


I have a small repo that has a couple of commits:

* a0fc4f8 (HEAD -> testbranch) added file.txt  
* e6e6a8b (master) hello world now  
* f308f53 Made it echo  
* f705657 Added hello  
* 08a2de3 (tag: initial) initial  

Also:

$ git status  
On branch testbranch  
nothing to commit, working directory clean  

I cannot understand the following behavior. In this state I run:

$ git reset initial

and I now see:

* e6e6a8b (master) hello world now  
* f308f53 Made it echo  
* f705657 Added hello  
* 08a2de3 (HEAD -> testbranch, tag: initial) initial  

What I was expecting: Commit a0fc4f8 would be deleted since it is unreachable.
What happened:
1) Doing git show a0fc4f8 still shows the commit
2) Doing git status shows the file.txt that was added by commit a0fc4f8 as untracked and file hello that was added by commit f705657 also shows up as untracked.
3) Running git gc or git gc --prune=all does not delete a0fc4f8 although it is not reachable anymore and has no name/tag associated with it.
Why is this happening?

Update:

$ git fsck  
Checking object directories: 100% (256/256), done.  
Checking objects: 100% (15/15), done.    

Update 2:

$ git log --all --decorate --graph --oneline  
* e6e6a8b (master) hello world now  
* f308f53 Made it echo  
* f705657 Added hello  
* 08a2de3 (HEAD -> testbranch, tag: initial) initial  

$ git gc --force  
Counting objects: 15, done.  
Delta compression using up to 4 threads.  
Compressing objects: 100% (8/8), done.  
Writing objects: 100% (15/15), done.   
Total 15 (delta 1), reused 15 (delta 1)   

$ git log --all --decorate --graph --oneline  
* e6e6a8b (master) hello world now  
* f308f53 Made it echo  
* f705657 Added hello  
* 08a2de3 (HEAD -> testbranch, tag: initial) initial  

Running git show a0fc4f8 still shows the commit.

Update 3:

$ git reflog testbranch  
08a2de3 testbranch@{0}: reset: moving to initial  
a0fc4f8 testbranch@{1}: commit: added file.txt  
e6e6a8b testbranch@{2}: branch: Created from HEAD  

Solution

    1) Doing git show a0fc4f8 still shows the commit

    This is by design. The unreachable objects are not removed immediately for several reasons:

    • maybe you ran the last command by mistake (or provided wrong arguments to it), you realize the error and want to go back to the previous state;
    • the gain of removing an unreachable object (saving some amount of disk space) is too small compared to the amount of work required to complete the action.

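    Unreachable objects are pruned automatically from time to time. Commands that create objects (commit, merge and fetch are some of them) run git gc --auto when they finish, and it only does real work once the repository has accumulated enough loose objects or packs. Even then, unreachable objects are kept until they are older than a grace period (two weeks by default, see gc.pruneExpire).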

    2) Doing git status shows the file.txt that was added by commit a0fc4f8 as untracked and file hello that was added by commit f705657 also shows up as untracked.

    You ran git reset without specifying a mode. The default mode is --mixed and that means:

    • the branch is moved to the commit specified in the command (initial in this case);
    • the index is reset to match the new commit pointed to by the branch;
    • the working tree is not modified.

    This explains why the files are in the directory (the third bullet) and why they are untracked (the second bullet; the index matches the initial commit but these files didn't even exist when it was created).
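The effect of the default --mixed mode can be reproduced in a throwaway repository. The following is a minimal sketch, not the asker's actual repository: the temporary directory, the identity passed to git config and the file base.txt are placeholders chosen for illustration.

```shell
set -eu
# Throwaway repository; path, identity and base.txt are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo one > base.txt
git add base.txt
git commit -q -m "initial"
git tag initial
git checkout -q -b testbranch

echo two > file.txt
git add file.txt
git commit -q -m "added file.txt"

# No mode given, so this behaves like `git reset --mixed initial`:
# the branch and the index move, the working tree is left alone.
git reset -q initial

# file.txt is still on disk, but the index no longer knows about it.
status=$(git status --porcelain)
echo "$status"
```

git status --porcelain marks untracked paths with ??, so file.txt is listed that way here, just as file.txt and hello are in the question.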

    3) Running git gc or git gc --prune=all does not delete a0fc4f8 although it is not reachable anymore and has no name/tag associated with it.

    git gc also checks the branch reflogs for references. If your testbranch branch has the reflog enabled, then the reflog still contains an entry that points to commit a0fc4f8 (this is where the testbranch branch was before you ran git reset). You can check whether the reflog is enabled for branch testbranch by running git reflog testbranch. If it prints something, you'll find the commit a0fc4f8 on the second line, at position testbranch@{1}; the first line, testbranch@{0}, is the current value set by the reset. The notation name@{n} means the nth prior value of the branch name, that is, the commit it pointed to n moves in the past. The reflog of HEAD records the same commit in the same way.
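Because the reflog still names the old commit, the reset can be undone. The sketch below assumes the same kind of throwaway repository as in the question (placeholder directory, identity and base.txt) and uses the testbranch@{1} notation to move the branch back.

```shell
set -eu
# Throwaway repository; path, identity and base.txt are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo one > base.txt
git add base.txt
git commit -q -m "initial"
git tag initial
git checkout -q -b testbranch

echo two > file.txt
git add file.txt
git commit -q -m "added file.txt"
lost=$(git rev-parse HEAD)

git reset -q initial

# The branch reflog still records where testbranch pointed before the reset.
git reflog show testbranch
previous=$(git rev-parse 'testbranch@{1}')

# Move the branch (and the index) back to that commit. A mixed reset
# does not touch the working tree, where file.txt is still present.
git reset -q 'testbranch@{1}'
restored=$(git rev-parse HEAD)
```

After the second reset the branch points at the "added file.txt" commit again and file.txt is tracked, so git status should report a clean tree.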

    You can find more about the way git gc works in the documentation.

    In the Notes section it reads:

    git gc tries very hard to be safe about the garbage it collects. In particular, it will keep not only objects referenced by your current set of branches and tags, but also objects referenced by the index, remote-tracking branches, refs saved by git filter-branch in refs/original/, or reflogs (which may reference commits in branches that were later amended or rewound).

    If you are expecting some objects to be collected and they aren’t, check all of those locations and decide whether it makes sense in your case to remove those references.
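If the goal really is to get rid of the commit, the reflog entries have to go first. The following sketch, again in a throwaway repository with placeholder names, compares git gc before and after expiring the reflogs. Note that git reflog expire --expire=now --all discards the safety net described in point 1 for the whole repository, so it is something to try on a copy rather than on work you care about.

```shell
set -eu
# Throwaway repository; path, identity and base.txt are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo one > base.txt
git add base.txt
git commit -q -m "initial"
git tag initial
git checkout -q -b testbranch

echo two > file.txt
git add file.txt
git commit -q -m "added file.txt"
lost=$(git rev-parse HEAD)

git reset -q initial

# While the reflogs still mention the commit, gc keeps it.
git gc -q --prune=now
if git cat-file -e "$lost" 2>/dev/null; then before=kept; else before=gone; fi

# Drop every reflog entry (HEAD and all branches), then collect again.
git reflog expire --expire=now --all
git gc -q --prune=now
if git cat-file -e "$lost" 2>/dev/null; then after=kept; else after=gone; fi

echo "before: $before, after: $after"
```

The first git gc leaves the commit in place because the reflogs of HEAD and testbranch still reference it; once those entries are expired, nothing references the commit any more and the second run removes it.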