git data-recovery git-fsck git-plumbing

How can I recover HEAD^'s tree?


tl;dr: Is it possible to recover HEAD^'s tree if it was deleted before being pushed, assuming everything else is intact?

I accidentally deleted part of my .git. I'm not entirely sure what's missing.

Upon discovering that git push didn't work, I ran a git fsck:

Checking object directories: 100% (256/256), done.
Checking objects: 100% (1265/1265), done.
broken link from  commit f3419f630546ba02baf43f4ca760b02c0f4a0e6d
              to    tree 29616dfefd2bff59b7fb3177e99b4a1efc7132fa
broken link from  commit ccfe9502e24d2b5195008005d83155197a2dca25
              to    tree 0580c3675560cbfd3f989878a9524e35f53f08e9
broken link from  commit ccfe9502e24d2b5195008005d83155197a2dca25
              to  commit 0bca9b3a9f1dd9106922f5b4ec59cdc00dd6c049
broken link from    tree 6d33d35870281340c7c2f86c6d48c8f133b836bb
              to    blob 226d8a10a623acd943bb8eddd080a5929f3ccb2c
broken link from  commit db238d4a52ee8f18a04c038809bc6587d7643438
              to    tree 0b69ab3f6940a04684ee8c0c423ae7da89de749c
missing tree 0580c3675560cbfd3f989878a9524e35f53f08e9
dangling commit 05512f9ac09d932e7d9a11d490c8a2f117c0ca11
missing tree 29616dfefd2bff59b7fb3177e99b4a1efc7132fa
dangling commit 578464dde7d7b8628f77e536b4076cfa491d7602
missing blob 5d351b568abb734605ca4bf446e13cfd87ca9ce8
missing tree 0b69ab3f6940a04684ee8c0c423ae7da89de749c
missing commit 0bca9b3a9f1dd9106922f5b4ec59cdc00dd6c049
dangling blob d53a9d0f3364b648edbc4beede022e4594a84c35
missing blob 23db34f729a88c5f5f7fe6e281921f1334f493d1
dangling commit 8dcbde55462ca0c29e0ca339a49db95b43188ef1
dangling blob e59b25b9675625d0e6b8abfa37e955ab46493fd9
missing blob 226d8a10a623acd943bb8eddd080a5929f3ccb2c
dangling commit 85fdaaa579cf1ae2a8874e3e1f3c65d68b478179
dangling commit 075e9d72e90cc8bf3d960edd8376aaae0847f916
missing blob 83fec2ff8cfcaaa06c96917b6973ace96301e932
dangling commit a88e18e1c102d909361738fd70137b3f4a1c7496
dangling blob 9c6f61e0acffe2a1f5322cd2b72c181e95e9de75
dangling commit ca9fe0dd3123a731fc310b2a2285b00ef673de79

So my assumption is that I'm merely missing some information that can be recovered from GitHub. My knee-jerk reaction was to run git fetch, but that returns with no output, because it thinks there's nothing new to fetch.

I tried unpacking .git/objects/pack/pack-ea43d1db155e4502c2250ec1d4608843715c8b1f.pack in several ways, but none of them worked. For example:

% git clone --mirror git://github.com/strugee/dots.git # returns bare repo
Cloning into bare repository 'dots.git'...
remote: Counting objects: 1331, done.
remote: Compressing objects: 100% (23/23), done.
remote: Total 1331 (delta 12), reused 0 (delta 0)
Receiving objects: 100% (1331/1331), 402.31 KiB | 197.00 KiB/s, done.
Resolving deltas: 100% (454/454), done.
Checking connectivity... done.
% ls dots.git
config  description  HEAD  hooks  info  objects  packed-refs  refs
% mkdir git-tmp; cd git-tmp
% git init
% git unpack-objects < ../dots.git/objects/pack/pack-ea43d1db155e4502c2250ec1d4608843715c8b1f.pack
error: inflate: data stream error (incorrect data check)
error: inflate returned -3

I got this error every time. (Keep in mind: it's a --mirror, so it's an exact copy of what GitHub has - right? How could it be corrupt then?)

Eventually I realized that I didn't actually need to unpack the packfile. I could just copy it back into the original repo, and Git would pick it up just fine. So:

% cd ../configs
% cp ../dots.git/objects/pack/pack-ea43d1db155e4502c2250ec1d4608843715c8b1f.* .git/objects/pack/

And that seemed to do the trick. Mostly.

% git fsck
Checking object directories: 100% (256/256), done.
Checking objects: 100% (2596/2596), done.
broken link from  commit db238d4a52ee8f18a04c038809bc6587d7643438
              to    tree 0b69ab3f6940a04684ee8c0c423ae7da89de749c
dangling commit 05512f9ac09d932e7d9a11d490c8a2f117c0ca11
dangling commit 578464dde7d7b8628f77e536b4076cfa491d7602
missing blob 5d351b568abb734605ca4bf446e13cfd87ca9ce8
missing tree 0b69ab3f6940a04684ee8c0c423ae7da89de749c
dangling blob d53a9d0f3364b648edbc4beede022e4594a84c35
dangling commit 8dcbde55462ca0c29e0ca339a49db95b43188ef1
dangling commit 85fdaaa579cf1ae2a8874e3e1f3c65d68b478179
dangling commit 075e9d72e90cc8bf3d960edd8376aaae0847f916
missing blob 83fec2ff8cfcaaa06c96917b6973ace96301e932
dangling commit a88e18e1c102d909361738fd70137b3f4a1c7496
dangling commit ca9fe0dd3123a731fc310b2a2285b00ef673de79

As you can see, that repaired all but one missing link. As it turns out, db238d is the id of a commit (which happens to be HEAD^) that I had not yet pushed. Am I correct in assuming that the last two commits in this repository are unrecoverable, and I will need to recreate the contents of those commits? Did I make the right decisions in this scenario?


Solution

  • Try git fetch-pack to recover missing objects available from another repository. Instructions below.

    To recover unpushed commits, specifically HEAD^1, I would start with:

    git diff-tree -r HEAD~2^{tree} HEAD^{tree}
    

    You'll get a list of all trees/blobs that have changed, along with their SHAs (this includes the changes from both HEAD and HEAD^1). Depending on how much information is available, you may be able to recreate some or all of the missing tree. Missing blobs are more problematic, though.
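
    To see which of the listed objects still exist in the local object database, each entry can be checked with git cat-file -e. The following is only a sketch, assuming a POSIX shell; list_missing_from_diff is a made-up helper name, not a git command.

    ```shell
    # Hypothetical helper: reads `git diff-tree -r` raw output on stdin and
    # reports whether the "new side" object of each entry exists locally.
    list_missing_from_diff() {
        # Raw lines look like ":<oldmode> <newmode> <oldsha> <newsha> <status>\t<path>"
        while read -r _om _nm _old newsha _st entry; do
            # An all-zero (or empty) SHA means there is no new object to check.
            case "$newsha" in
                *[!0]*) ;;
                *) continue ;;
            esac
            if git cat-file -e "$newsha" 2>/dev/null </dev/null; then
                printf 'present %s %s\n' "$newsha" "$entry"
            else
                printf 'missing %s %s\n' "$newsha" "$entry"
            fi
        done
    }
    ```

    A possible invocation is git diff-tree -r 'HEAD~2^{tree}' 'HEAD^{tree}' | list_missing_from_diff. The quotes are harmless and avoid surprises in shells that treat ^ or {} specially.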

    Use of git fetch-pack

    Intentionally corrupt the repository

    me@myvm:/scratch/corrupt/.git  (GIT_DIR!)$ cd objects/
    me@myvm:/scratch/corrupt/.git/objects  (GIT_DIR!)$ ll
    total 20
    drwxrwxr-x 2 andrewc warp 4096 Oct  7 06:03 20
    drwxrwxr-x 2 andrewc warp 4096 Oct  7 06:03 22
    drwxrwxr-x 2 andrewc warp 4096 Oct  7 06:03 25
    drwxrwxr-x 2 andrewc warp 4096 Oct  7 06:03 info
    drwxrwxr-x 2 andrewc warp 4096 Oct  7 06:03 pack
    me@myvm:/scratch/corrupt/.git/objects  (GIT_DIR!)$ rm -rf 22
    

    Verify that HEAD is in a bad state

    me@myvm:/scratch/corrupt/.git/objects  (GIT_DIR!)$ cd ../../
    me@myvm:/scratch/corrupt  (master)$ git status
    fatal: bad object HEAD
    

    Recover missing objects

    me@myvm:/scratch/corrupt  (master)$ git fetch-pack --all $(git config --get remote.origin.url)
    error: refs/heads/master does not point to a valid object!
    error: refs/remotes/origin/HEAD does not point to a valid object!
    error: refs/remotes/origin/master does not point to a valid object!
    error: refs/heads/master does not point to a valid object!
    error: refs/remotes/origin/HEAD does not point to a valid object!
    error: refs/remotes/origin/master does not point to a valid object!
    remote: Counting objects: 3, done.
    remote: Total 3 (delta 0), reused 0 (delta 0)
    Unpacking objects: 100% (3/3), done.
    22ecde746be79c65b27a5cf1dc421764d8ff6e17 HEAD
    22ecde746be79c65b27a5cf1dc421764d8ff6e17 refs/heads/master
    me@myvm:/scratch/corrupt  (master)$ git status
    On branch master
    Your branch is up-to-date with 'origin/master'.
    nothing to commit, working directory clean
    

    Missing objects restored

    me@myvm:/scratch/corrupt  (master)$ ll .git/objects/
    total 20
    drwxrwxr-x 2 andrewc warp 4096 Oct  7 06:03 20
    drwxrwxr-x 2 andrewc warp 4096 Oct  7 06:05 22
    drwxrwxr-x 2 andrewc warp 4096 Oct  7 06:03 25
    drwxrwxr-x 2 andrewc warp 4096 Oct  7 06:03 info
    drwxrwxr-x 2 andrewc warp 4096 Oct  7 06:03 pack
    me@myvm:/scratch/corrupt  (master)$ 
    
    
    me@myvm:/scratch/corrupt  (master)$ git status
    On branch master
    Your branch is up-to-date with 'origin/master'.
    nothing to commit, working directory clean
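
    After a recovery attempt like this one, re-running git fsck shows what is still missing. As a rough sketch, the output can be filtered down to just the missing objects; missing_objects is a hypothetical helper name, and the filter assumes the "missing <type> <sha>" line format shown in the question.

    ```shell
    # Hypothetical helper: reads `git fsck` output on stdin and prints one
    # "<type> <sha>" line per missing object, sorted and without duplicates.
    missing_objects() {
        awk '$1 == "missing" { print $2, $3 }' | sort -u
    }
    ```

    It could be used as git fsck 2>&1 | missing_objects. Anything it still lists after fetching from the remote was probably never pushed.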
    

    If you end up with both a broken tree object and a broken blob object, you can try to recover them manually. git cat-file -p BLOB_SHA dumps the contents of any blob that is still present, and the contents may be enough to tell you which file it is. Likewise, git cat-file -p TREE_SHA dumps a tree, listing file names and blob SHAs. From there you would be manually constructing tree and commit objects from what is presumably partial data. If your HEAD commit is OK, you are only missing history and at least have the most recent state covered.
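
    The manual reconstruction described above can be sketched with plumbing commands: git hash-object -w writes a blob, git mktree builds a tree from ls-tree-style lines, and git commit-tree wraps a tree in a commit. This is a minimal illustration, not a complete recovery procedure. The helper name rebuild_commit and its arguments are invented for the example, and it only handles a single top-level regular file with a simple name.

    ```shell
    # Hypothetical helper: prints the SHA of a new commit whose tree is
    # <base-tree> with the top-level file <name> replaced by <file>'s contents.
    # Arguments: <base-tree-ish> <name> <file> <parent-commit>
    rebuild_commit() {
        base=$1 name=$2 file=$3 parent=$4
        # Write the recovered contents into the object database as a blob.
        blob=$(git hash-object -w "$file") || return 1
        tree=$(
            {
                # Keep every entry of the base tree except the one being replaced.
                git ls-tree "$base" | awk -F '\t' -v n="$name" '$2 != n'
                # mktree expects "<mode> <type> <sha>\t<name>" lines.
                printf '100644 blob %s\t%s\n' "$blob" "$name"
            } | git mktree
        ) || return 1
        # Wrap the tree in a commit on top of the given parent.
        git commit-tree "$tree" -p "$parent" -m "Reconstructed commit"
    }
    ```

    A rebuilt tree only gets the same SHA as the lost one if every entry (mode, name, and blob) is byte-for-byte identical, so comparing the result against the missing tree ID reported by git fsck is a useful check.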