git fsck combining --lost-found and --unreachable


I read several interesting posts about git fsck before asking this question, and I wanted to experiment a little with what they describe.

I started with this repo:

* 9c7d1ea (HEAD -> test) f
* cd28884 e
| * 7b7bac0 (master) d
| * cab074f c
|/  
* d35af2c b
| * f907f39 r # unreferenced commit
|/
* 81d6675 a

Here r was created on a detached HEAD starting from a. I then wanted to rebase master onto test, but I had some unstaged changes, so I ran:

git rebase --autostash test

This gave the following (r is not shown, but it still exists):

* caee68c (HEAD -> master) d
* 2e1cb7d c
* 9c7d1ea (test) f
* cd28884 e
* d35af2c b
* 81d6675 a

Next I ran:

$ git fsck
#...
dangling commit 6387b70fe14f1ecb90e650faba5270128694613d # stash
#...
$ git fsck --unreachable
#...
unreachable commit 6387b70fe14f1ecb90e650faba5270128694613d # stash
unreachable commit d8bb677ce0f6602f4ccad46123ee50f2bf6b5819 # stash index
#...
$ git fsck --lost-found
#...
dangling commit 6387b70fe14f1ecb90e650faba5270128694613d # stash
dangling commit f907f39d41763accf6d64f4c736642c0120d5ae2 # r
#...

First question

Why does only the --lost-found version return the r commit? And why are not the c and d before the rebase shown among the unreachables? I thought I understood the difference reading the linked questions, but I am clearly missing something. I still have the complete reflog, but I guess you do not need it, since all commits (except those related to the stash) are referenced.


I know I should create another post but the second question is partially related. I tried out of curiosity:

$ git fsck --lost-found --unreachable
#...
unreachable commit 6387b70fe14f1ecb90e650faba5270128694613d # stash
unreachable commit d8bb677ce0f6602f4ccad46123ee50f2bf6b5819 # stash index
unreachable commit f907f39d41763accf6d64f4c736642c0120d5ae2 # r
unreachable commit 7b7bac0608936a0bcc29267f68091de3466de1cf # d before rebase
unreachable commit cab074f2c9d63919c3fa59a2dd63ec874b0f0891 # c before rebase
#...

Second question

Combining both options I get all the unreachable commits (and not just the union of --lost-found and --unreachable), this is very unexpected. Why does it behave like this?


Solution

  • Some of this is indeed puzzling, and appears not to be properly documented, but a quick look at builtin/fsck.c shows that using --lost-found:

    1. turns on --full;
    2. turns on --no-reflogs.

    Item 1 isn't particularly interesting since --full is now on by default anyway, but the documentation really should call out that --lost-found disables --no-full. Item 2 explains most of the rest; I have a guess at the last part [Edit: the rest].

    Note that when you ran:

    git checkout master && git rebase --autostash test
    

    this made Git run git stash push, which made a new stash consisting of two new commits. Git then did the rebase as usual, which copied the cab074f and 7b7bac0 commits, visible in the original git log --all --decorate --oneline --graph output, to the new 2e1cb7d and caee68c commits visible in the second output.

    Why does only the --lost-found version return the r commit? And why are not the c and d before the rebase shown among the unreachables?

    Presumably that commit is still in the HEAD reflog. That makes it reachable from a reference—but since --lost-found implies --no-reflogs, it becomes unreachable this time. The same goes for the originals of c and d: they're reachable via multiple reflog entries, from both HEAD's reflog and master's.
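The reflog effect described above can be reproduced in a throwaway repository. This is only a minimal sketch: the temporary directory, the demo identity, and the variable names are assumptions made for illustration, not part of the original setup. It creates a commit on a detached HEAD (like r), switches back to the branch, and then compares what the three fsck variants report.

```shell
#!/bin/sh
set -eu

# Hypothetical identity so that commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git commit -q --allow-empty -m a
branch=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

# Create a commit on a detached HEAD, then walk away from it (like r).
git checkout -q --detach
git commit -q --allow-empty -m r
r=$(git rev-parse HEAD)
git checkout -q "$branch"

# r is now referenced only by HEAD's reflog.
default_out=$(git fsck 2>/dev/null)
no_reflogs_out=$(git fsck --no-reflogs 2>/dev/null)
lost_found_out=$(git fsck --lost-found 2>/dev/null)

echo "r = $r"
echo "git fsck:              ${default_out:-<no dangling objects>}"
echo "git fsck --no-reflogs: $no_reflogs_out"
echo "git fsck --lost-found: $lost_found_out"
```

If the explanation holds, the plain git fsck run stays silent about r, while the --no-reflogs and --lost-found runs both report it as a dangling commit. The --lost-found run additionally writes an entry for it under .git/lost-found/commit/.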

    Combining both options I get all the unreachable commits (and not just the union of --lost-found and --unreachable), this is very unexpected. Why does it behave like this?

    That's more puzzling. [Edit: solved; see below.] Let's go through your git fsck commands in order:

    • fsck 1 and fsck 2: Both discover the autostash commits. That's because git stash push copied the original refs/stash to the stash reflog, so that refs/stash could point to the autostash w (working tree) commit. Then the implied git stash apply && git stash drop (git stash pop) applied the stash and dropped it, moving the stash@{1} entry back to refs/stash and deleting the stash reflog. So the w commit from the autostash is truly "dangling". It's not in refs/stash and it's not even in the stash reflog, because git stash (ab)uses this reflog as the "stash stack". It does, however, point to the i commit from the autostash.

      The first fsck, then, prints 6387b70fe14f1ecb90e650faba5270128694613d and calls it "dangling". That's the w commit that was dropped. The second fsck, with --unreachable, adds d8bb677ce0f6602f4ccad46123ee50f2bf6b5819: the corresponding i commit that was dropped.

    • fsck 3: The r and rebased commits remained invisible under git fsck --unreachable because they're referenced from the reflogs. But now, with --lost-found, fsck does not look at the reflogs. We should expect to see the autostash w commit, the r commit, and the pre-rebase d, all as dangling. [Edit: as per comment, this is wrong: w links back to i and d, so this will hide d.]

      We actually see the w and r commits but not the d commit. Why not? This is my guess but it's easy to test if you still have the setup around: when you use git rebase successfully, Git creates or updates the pseudo-ref named ORIG_HEAD to remember the hash ID of the tip commit before the rebase completes. Note that this same name is used to remember the previous value of a ref after a successful git reset that moves one, and after any other operation that might move a branch name some distance (fast-forward merge, for instance).

      It's pretty obvious that git fsck must consider all of the various *_HEAD pseudo-refs as starting points for reachability. This, too, is not documented (and it's not even completely clear it's intentional here—the ref code has been under some fairly heavy rework lately, to support alternative ref backends).

    • fsck 4, just before your SECOND QUESTION section: either --unreachable turned off the pseudoref inclusion, or—I think this is more likely—you did something in between that touched ORIG_HEAD so that it no longer selected the original, pre-rebase d commit. [edit] Since --unreachable lists all unreachable commits, the fact that d is reachable indirectly from the autostash w commit is irrelevant, and we see everything.
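The difference between "dangling" and "unreachable" discussed in the list above can be checked with a small sketch. It rests on several assumptions: it uses git stash create to build a w/i commit pair without touching refs/stash (this is not necessarily how the autostash was produced), it uses git reset --hard as a stand-in for the rebase moving the branch away from d, and it deletes ORIG_HEAD so that the pseudo-ref question cannot influence the result. All names and file contents are hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical identity so that commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

echo "one" > file.txt
git add file.txt
git commit -q -m a
echo "two two" > file.txt
git commit -q -a -m d
d=$(git rev-parse HEAD)

# Unstaged change, then build stash-style commits without updating refs/stash.
echo "three three three" > file.txt
w=$(git stash create)
i=$(git rev-parse "$w^2")                # index commit of the stash
w_first_parent=$(git rev-parse "$w^1")   # the commit that was HEAD at stash time

# Stand-in for the rebase: move the branch off d, then drop ORIG_HEAD.
git reset -q --hard HEAD~1
rm -f .git/ORIG_HEAD

dangling=$(git fsck --no-reflogs 2>/dev/null)
unreachable=$(git fsck --no-reflogs --unreachable 2>/dev/null)

printf 'w=%s\ni=%s\nd=%s\n' "$w" "$i" "$d"
printf -- '--- git fsck --no-reflogs\n%s\n' "$dangling"
printf -- '--- git fsck --no-reflogs --unreachable\n%s\n' "$unreachable"
```

In this setup only w should be reported as dangling, because i and d are both referenced by w even though no ref reaches them. With --unreachable added, w, i, and d should all be listed, along with the tree and blob objects that only they use.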

    If you would like to report this as a Git documentation bug (the fsck documentation does not note that --lost-found implies --no-reflogs), you should do so.