
fatal: unable to read tree error on git checkout


Unfortunately my hard drive got damaged, and now I'm trying to restore my project files, which are in a repository on another branch.

Running git checkout currency-convertor outputs this:

fatal: unable to read tree 41d648e8fd1281a8cdb1fcadb10daac5f7be2d39

How can I restore the files of this branch? After going through different sites, I only realized that somehow I can restore them, through backups, etc. I sort of found how to do it, and maybe I did something wrong, but the methods I found did not help me. What surprised me was that, of all the branches, only currency-convertor was affected, even though a different branch was checked out at the time of the disk failure. Is it possible to recover at least some of the files?

In case it matters, here is the output of git fsck --full:

Checking object directories: 100% (256/256), done.
Checking objects: 100% (173/173), done.
broken link from    tree 5f9cd50454c698571ef96c2f037773890bde4a23
              to    blob 5e76c750eeaeb2b07542d07dc3a925999e8aaa2f
broken link from    tree 5f9cd50454c698571ef96c2f037773890bde4a23
              to    blob 63a23f89e10d31c3a9ed9ca582e83a8a05204b96
broken link from    tree 5f9cd50454c698571ef96c2f037773890bde4a23
              to    tree 41d648e8fd1281a8cdb1fcadb10daac5f7be2d39
broken link from    tree 5f9cd50454c698571ef96c2f037773890bde4a23
              to    blob 217069b72a199c7be5b4fd8ae45e0b93a1f36b85
broken link from    tree 8f79d8e655ff8fd4e6b4516f6fefb4bc8ba02c3c
              to    blob e50ee4159db60c92f144c4b33f45bd1a57ebd6ff
broken link from    tree d1281c46e3f1d36eea7d8e5be3b4cdd2a120a2fd
              to    blob fcaca083555ace75f54451698da96f98688a4709
broken link from    tree d1281c46e3f1d36eea7d8e5be3b4cdd2a120a2fd
              to    blob 7d1296959b427b97cf52812566b74fd38da0e0ae
broken link from    tree f8b71bc6088fb8f8422a40a5b212b3cabd8ddb66
              to    blob e8567005d36f967f7d852313b82767c04ab155a8
broken link from    tree f8b71bc6088fb8f8422a40a5b212b3cabd8ddb66
              to    blob f20377b3d16a0e580834e0438954cf9bba291fdf
broken link from    tree f70049a236deb0e215d5483fe9df3a0cbba88ceb
              to    blob e7280e8dac5c31d348f093a33172b2282149b064
broken link from    tree f70049a236deb0e215d5483fe9df3a0cbba88ceb
              to    blob 04f62a6068ea0cf89229a0c7579df0f2c0aa7ed1
broken link from    tree b5d32fbe8e6c2832ddd37625370e221837a176c8
              to    blob 55aa349b8516fb9a5b4d71a1c34ca8c1b1715e34
broken link from    tree b5d32fbe8e6c2832ddd37625370e221837a176c8
              to    blob 536346328a8d35ab7b62cf4b43db3bb307064f24
broken link from    tree a1b26b4af971808beec0e8c2fb4797d7ef6cca42
              to    blob 1f48f7a7a8c10175c8b0d0814f091ce4247cfd24
broken link from    tree a1b26b4af971808beec0e8c2fb4797d7ef6cca42
              to    blob df85d005e45adba5d9802b42d38ae07c00cf931a
broken link from    tree 3bdebb7bf8c1edbe618d7cc4eec1af2a35e89c31
              to    blob a82bcd60f583bdd9165b048bfc91df1b6a60eb22
broken link from    tree 3bdebb7bf8c1edbe618d7cc4eec1af2a35e89c31
              to    blob c55ee0ed8d8bcc7ad581fdfeca14e1daa5a86f8a
missing blob 04f62a6068ea0cf89229a0c7579df0f2c0aa7ed1
missing blob 1f48f7a7a8c10175c8b0d0814f091ce4247cfd24
missing blob 217069b72a199c7be5b4fd8ae45e0b93a1f36b85
dangling commit 317044625dfc11b9757ca4807e228c09eb5eb6e8
missing tree 41d648e8fd1281a8cdb1fcadb10daac5f7be2d39
missing blob 55aa349b8516fb9a5b4d71a1c34ca8c1b1715e34
missing blob 5e76c750eeaeb2b07542d07dc3a925999e8aaa2f
missing blob 63a23f89e10d31c3a9ed9ca582e83a8a05204b96
dangling commit 6bb4969cfc01bd1741b86c6f45c310fe766879ad
missing blob 7d1296959b427b97cf52812566b74fd38da0e0ae
missing blob c55ee0ed8d8bcc7ad581fdfeca14e1daa5a86f8a
missing blob e50ee4159db60c92f144c4b33f45bd1a57ebd6ff
missing blob e7280e8dac5c31d348f093a33172b2282149b064
missing blob e8567005d36f967f7d852313b82767c04ab155a8
missing blob fcaca083555ace75f54451698da96f98688a4709
missing blob 536346328a8d35ab7b62cf4b43db3bb307064f24
dangling commit 6313128d5b4b49f0b2900ebbe21dc17eea708c25
missing blob a82bcd60f583bdd9165b048bfc91df1b6a60eb22
missing blob df85d005e45adba5d9802b42d38ae07c00cf931a
missing blob f20377b3d16a0e580834e0438954cf9bba291fdf

Solution

  • You're getting deep into the implementation details of Git here. It's useful to know that:

    1. Git doesn't store files, at least not exactly. Git stores objects, which come in four types. You normally deal directly with only one of these types, the commit object, which you do when you run git log and see commit 8cd5a029c1ecd7523572d70f56f2aa93ad95eacd or whatever, and copy-paste that ID.

    2. The four types are blob, commit, tag (annotated tag), and tree.

    3. Each object has a unique hash ID, which is a cryptographic checksum of that object's content. Git can verify whether an object is valid by reading it back, recomputing the checksum, and comparing the result with the hash ID under which the object is stored.

    4. Commit, tag, and tree objects have a prescribed format, which git fsck can also check:

      • Tag objects contain the tag's name, the target object type, and the target object hash ID. (The target can be any one of the four object types, but it's most common to have a commit here, which makes this an annotated tag for that particular commit.)

      • Commits must have one tree line, which gives the hash ID of the top level tree object that represents the files that would be obtained, if you were to git checkout that particular commit. Commit objects may also have one or more parent lines, which give hash IDs of the parent commit(s).

      • Tree objects are lists of mode-hash-name tuples; the modes are from a specific (and relatively small) set of constants and the given hash IDs are those of further objects, while the name is a component name that's largely unrestricted. If a tree entry (one of the tuples in this list) is itself a tree, its name is joined to the names of the entries inside it with a forward slash.

      • Blob objects represent file content.

    When we put all of these together, we find that a commit might hold a top-level tree that lists a sub-tree named x, which in turn lists a blob named y. This commit, when extracted, will then have a file named x/y (that's the file's name, complete with forward slash) whose contents are the given blob object.
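As a rough illustration of how these object types fit together, the sketch below builds a throwaway repository and walks from the commit to its top-level tree, the sub-tree, and the blob. The temporary directory, the demo identity, and the x/y file are all made up for the example.

```shell
# Throwaway repository; the path, identity, and file name are illustrative only.
repo=$(mktemp -d)
git init -q "$repo"
mkdir "$repo/x"
echo hello > "$repo/x/y"
git -C "$repo" add x/y
git -C "$repo" -c user.name=demo -c user.email=demo@example.com commit -q -m "add x/y"

# A commit names exactly one top-level tree (plus author, committer, message).
git -C "$repo" cat-file -p HEAD

# The top-level tree lists a sub-tree named x ...
git -C "$repo" ls-tree HEAD

# ... and that sub-tree lists a blob named y.
git -C "$repo" ls-tree HEAD:x

# Ask Git for the type of each object along the way.
commit_type=$(git -C "$repo" cat-file -t HEAD)
tree_type=$(git -C "$repo" cat-file -t HEAD:x)
blob_type=$(git -C "$repo" cat-file -t HEAD:x/y)

# The full path name only appears once the tree names are joined.
paths=$(git -C "$repo" ls-tree -r --name-only HEAD)
echo "$commit_type $tree_type $blob_type $paths"
```

Running the same cat-file and ls-tree commands against the hash IDs from a real fsck report shows which directory or file each missing object belonged to, as long as the parent tree is still readable.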

    The git fsck command (especially if used with --full, which is supposedly the default but doesn't seem to actually be the default) will check all of these requirements, along with some more I haven't specifically listed. The output from your git fsck indicates that some objects have gone missing.
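To work with a report like this, it can help to reduce it to just the list of missing objects. The following is a minimal sketch under stated assumptions: it creates a small throwaway repository, deletes one loose object on purpose to imitate the damage, and then filters the fsck output with awk. The repository, the file name, and the variable names are hypothetical.

```shell
# Build a small repository, then delete one loose object to imitate disk damage.
repo=$(mktemp -d)
git init -q "$repo"
echo hello > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" -c user.name=demo -c user.email=demo@example.com commit -q -m "add file"

blob=$(git -C "$repo" rev-parse HEAD:file.txt)
dir=$(printf '%s' "$blob" | cut -c1-2)
rest=$(printf '%s' "$blob" | cut -c3-)
rm -f "$repo/.git/objects/$dir/$rest"

# fsck exits non-zero when it finds problems, so keep that from aborting the script.
report=$(git -C "$repo" fsck --full 2>&1 || true)
printf '%s\n' "$report"

# Keep only the "missing <type> <hash>" lines: these are the objects to look for elsewhere.
missing=$(printf '%s\n' "$report" | awk '$1 == "missing" { print $2, $3 }')
printf '%s\n' "$missing"
```

The "broken link" lines say which tree refers to each missing object, while the "missing" lines are the actual shopping list.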

    The good news is that nothing is reported as corrupt, meaning there are no objects that are invalid. The bad news is that the missing objects (nearly all of them blobs, plus the one tree 41d648e8fd1281a8cdb1fcadb10daac5f7be2d39 that git checkout complains about) are simply not present in this repository, so those file contents, and that one directory listing, cannot be produced from it. This is not nearly as bad as having corrupt tree or commit objects, or missing commits.

    I ... realized that somehow I can restore them, through backups, etc.

    That's correct. If you have any other copy of the repository—in a local backup, in another clone made somewhere else, or anything along those lines—you can use that copy to search for those particular missing objects.

    Simply take that repository (restored onto your local disk somewhere) and run:

    git cat-file -t 04f62a6068ea0cf89229a0c7579df0f2c0aa7ed1
    

    for instance, to see whether 04f62a6068ea0cf89229a0c7579df0f2c0aa7ed1 is found and has type blob. If so, that's the missing object you're looking for: the hash ID uniquely identifies the content, so an object with that ID holds exactly the file content that has gone missing. If that backup doesn't have that file content, that blob ID will not be valid in that repository either.
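Checking many IDs by hand gets tedious, so the lookup can be wrapped in a small loop. This is only a sketch: check_backup is a hypothetical helper name, and the "backup" here is a throwaway repository created for the demonstration.

```shell
# check_backup BACKUP_REPO HASH...
# Prints each hash followed by the object type found in the backup, or "absent".
check_backup() {
    backup=$1
    shift
    for hash in "$@"; do
        if kind=$(git -C "$backup" cat-file -t "$hash" 2>/dev/null); then
            printf '%s %s\n' "$hash" "$kind"
        else
            printf '%s absent\n' "$hash"
        fi
    done
}

# Demonstration against a throwaway "backup" that holds a single blob.
backup=$(mktemp -d)
git init -q "$backup"
present=$(echo hello | git -C "$backup" hash-object -w --stdin)
absent=04f62a6068ea0cf89229a0c7579df0f2c0aa7ed1   # one of the IDs from the question
result=$(check_backup "$backup" "$present" "$absent")
printf '%s\n' "$result"
```

In practice the hash arguments would be the IDs from the "missing" lines of the fsck report, and the first argument would be the path of the restored backup.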

    Now, actually recovering those contents, into your currently-broken repository, is a little tricky. The reason is that Git's objects can be stored in two ways: "loose" or "packed". A "loose" object is one stored on its own in a separate OS file. These are easy to copy from one repository to another directly. A "packed" object, however, is stored in a pack file and these are much harder to restore one-at-a-time.
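The difference between the two storage forms can be seen in a scratch repository. The sketch below assumes the default on-disk layout, in which a loose object lives at objects/<first two hex digits>/<remaining digits>. The repository and file name are invented for the example.

```shell
repo=$(mktemp -d)
git init -q "$repo"
echo hello > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" -c user.name=demo -c user.email=demo@example.com commit -q -m "add file"

# Where the blob sits while it is a loose object.
id=$(git -C "$repo" rev-parse HEAD:file.txt)
loose_path="$repo/.git/objects/$(printf '%s' "$id" | cut -c1-2)/$(printf '%s' "$id" | cut -c3-)"

if [ -f "$loose_path" ]; then before=present; else before=absent; fi
git -C "$repo" count-objects -v      # "count" is loose objects, "in-pack" is packed ones

# Pack everything reachable and drop the now-redundant loose files.
git -C "$repo" repack -a -d -q
if [ -f "$loose_path" ]; then after=present; else after=absent; fi
git -C "$repo" count-objects -v

# The object is still readable through Git, just no longer as its own file.
still_readable=no
git -C "$repo" cat-file -e "$id" && still_readable=yes
echo "loose file before: $before, after: $after, readable: $still_readable"
```

This is why copying files out of a backup's objects directory only works for objects that happen to be loose there.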

    There's a relatively simple method you can use to work around all these details: read the object from the repository that has it, and write it to the repository that lacks it. You can do that with the slightly magic sequence:

    (cd goodrepo && git cat-file -p <hash>) | (cd badrepo && git hash-object -w -t blob --stdin)
    

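    (in sh/bash syntax and making assumptions about the locations of the "good" and "bad" repositories). Repeat for all the "missing" blob objects. The one missing tree needs the same treatment with git cat-file tree <hash> on the reading side and -t tree on the writing side, because cat-file -p pretty-prints a tree rather than emitting its raw content. After that you'll have your repository restored, provided of course that you can in fact find all the objects in some backup(s).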
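The repetition can be scripted. What follows is a sketch, not a tested recovery tool: restore_missing is a hypothetical helper, and the "good" and "bad" repositories are throwaway copies built for the demonstration. It uses git cat-file <type> <hash>, which emits the raw object content, so the same loop handles blobs and trees.

```shell
# restore_missing GOOD_REPO BAD_REPO
# Copies every object that fsck reports as missing in BAD_REPO from GOOD_REPO, if present.
restore_missing() {
    good=$1
    bad=$2
    report=$(git -C "$bad" fsck --full 2>&1 || true)
    printf '%s\n' "$report" | awk '$1 == "missing" { print $2, $3 }' |
    while read -r kind hash; do
        if git -C "$good" cat-file -e "$hash" 2>/dev/null; then
            # Raw content in, same type out: hash-object recreates the same ID.
            git -C "$good" cat-file "$kind" "$hash" |
                git -C "$bad" hash-object -w -t "$kind" --stdin >/dev/null
            echo "restored $kind $hash"
        else
            echo "still missing $kind $hash"
        fi
    done
}

# Demonstration: a good repository, and a copy with a tree and a blob deleted.
good=$(mktemp -d)
git init -q "$good"
mkdir "$good/x"
echo hello > "$good/x/y"
git -C "$good" add x/y
git -C "$good" -c user.name=demo -c user.email=demo@example.com commit -q -m "add x/y"

bad=$(mktemp -d)
cp -R "$good/." "$bad"
for spec in HEAD:x HEAD:x/y; do
    id=$(git -C "$good" rev-parse "$spec")
    rm -f "$bad/.git/objects/$(printf '%s' "$id" | cut -c1-2)/$(printf '%s' "$id" | cut -c3-)"
done

# Restoring a tree can expose further missing objects beneath it, so run a second pass.
restore_missing "$good" "$bad"
restore_missing "$good" "$bad"
git -C "$bad" fsck --full
```

On a real repository, keep re-running the helper until fsck stops reporting missing objects or the remaining ones cannot be found in any backup.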

    Most people don't do this, because Git is distributed. That is, there is usually a "good" copy of the repository somewhere else (e.g., on GitHub or Bitbucket). You simply re-clone, throwing away the "bad" copy, maybe losing some hours or a day's worth of work or so but spending less time than you would restoring multiple backups and fishing through them. As a result, there are no built-in Git operations to recover these things.

    For more heavily damaged, but more valuable, repositories, there are a lot more tricks that can be used, but overall it would be nice if Git had a git fetch --repair mode or something along these lines. It doesn't.