Tags: git, git-filter-branch, atlassian-fisheye

How to remove an entry with null sha1 in a Git tree


I have inherited a Git repository in which a tree contains a commit entry with a null sha1, which prevents FishEye from indexing the repository.

$ git fsck
Checking object directories: 100% (256/256), done.
warning in tree db22a67df70dc4ff90ec4cd666da91e9c2cb0d9:
    contains entries pointing to null sha1
Checking objects: 100% (416532/416532), done.
Checking connectivity: 416532, done.

Looking up the given tree gives me the following result:

$ git ls-tree db22a6
100644 blob e615f18b55a39f2719112ce209c2505dd92d8e75    .gitignore
100644 blob ac852f06c5a04420356c1d5efca44d9a864e78b0    .project
160000 commit 0000000000000000000000000000000000000000  SomeDirectory
100644 blob 631c17e28026261a2ccf6bc570842cf4af9f181c    GoDeploy.bat
100644 blob 40e992ab5c3868af2910135c3ac4610c3646e7f8    pom.xml

Looking through the history, I found that SomeDirectory was initially a git submodule, and that the commit that seems to cause the issue is the one that removed both .gitmodules and SomeDirectory. Now there is a real directory called SomeDirectory in the exact place where the culprit was.
I thought I could still try running git filter-branch to see what I would end up with, but it does not work:

$ git filter-branch --force --index-filter \
>   'git rm --cached --ignore-unmatch SomeDirectory' \
>   --prune-empty --tag-name-filter cat -- --all
[... stripped out for clarity]
Rewrite c571a3ec94e9f84471577bac41ac7375c729ef08 (76/18522)error:
    cache entry has null sha1: SomeDirectory
fatal: unable to write new index file
Could not initialize the index
[... stripped out for clarity]

What should I try next, given that, as far as I know, there is no backup from before the commit that causes the issue?


Solution

  • The message you get suggests that there was only a single tree with a bad submodule. In that case, there is very little you have to clean up. You can create a new fixed tree that doesn't have this problem:

    $ git ls-tree db22a67df70dc4ff90ec4cd666da91e9c2cb0d9 |
    > sed -e '/0\{40\}/d' |
    > git mktree
    (new tree SHA1 here)
    

    Your question shows the git ls-tree output already. The sed removes the line with the bad submodule, and git mktree creates a new tree object from the result.
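
    To see what the sed filter does in isolation, here is a small sketch that feeds it the git ls-tree listing from the question (typed in by hand rather than read from the repository) and prints what would be handed to git mktree. The variable names listing and filtered are only for illustration.

    ```shell
    # Listing copied from the question; real `git ls-tree` output separates
    # the object ID and the path with a tab, written here as \t.
    listing=$(printf '%s\t%s\n' \
        '100644 blob e615f18b55a39f2719112ce209c2505dd92d8e75' '.gitignore' \
        '100644 blob ac852f06c5a04420356c1d5efca44d9a864e78b0' '.project' \
        '160000 commit 0000000000000000000000000000000000000000' 'SomeDirectory' \
        '100644 blob 631c17e28026261a2ccf6bc570842cf4af9f181c' 'GoDeploy.bat' \
        '100644 blob 40e992ab5c3868af2910135c3ac4610c3646e7f8' 'pom.xml')

    # Same filter as above: delete every line that contains 40 zeros in a row.
    filtered=$(printf '%s\n' "$listing" | sed -e '/0\{40\}/d')

    # Four entries remain; this is the text that git mktree would read.
    printf '%s\n' "$filtered"
    ```

    If a path in the tree could itself contain forty zeros in a row, a stricter pattern such as '/^160000 commit 0\{40\}[[:space:]]/d' would limit the match to submodule entries with a null ID.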

    Once you have the fixed tree, you can create a fixed commit using this tree:

    $ git cat-file commit c571a3ec94e9f84471577bac41ac7375c729ef08 |
    > sed 's/db22a67df70dc4ff90ec4cd666da91e9c2cb0d9/(new tree SHA1 here)/' |
    > git hash-object -t commit -w --stdin
    (new commit SHA1 here)
    

    git cat-file commit c571a3ec94e9f84471577bac41ac7375c729ef08 prints the problematic commit object in textual form. The output starts with tree db22a67df70dc4ff90ec4cd666da91e9c2cb0d9 and continues with the rest of the commit info (parent, author, committer, commit message). The sed replaces the old tree ID on the tree line with the new one. git hash-object -t commit -w --stdin creates a new commit object from the result, writes it to the repository, and prints its ID.
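
    The effect of the sed on the commit text can be checked without touching the repository. In this sketch the commit text is a made-up stand-in with the same layout as git cat-file commit output (the parent ID, author and message are invented), and the ID of Git's empty tree stands in for the new tree SHA1, which only git mktree can tell you.

    ```shell
    old_tree=db22a67df70dc4ff90ec4cd666da91e9c2cb0d9   # as quoted in the question
    new_tree=4b825dc642cb6eb9a060e54bf8d69288fbee4904  # placeholder: the empty tree

    # Invented commit text in the layout printed by `git cat-file commit`.
    commit_text=$(printf '%s\n' \
        "tree $old_tree" \
        'parent 1111111111111111111111111111111111111111' \
        'author A U Thor <author@example.com> 1400000000 +0000' \
        'committer A U Thor <author@example.com> 1400000000 +0000' \
        '' \
        'Remove the SomeDirectory submodule')

    # Anchoring on "^tree " restricts the substitution to the header line,
    # in case the old ID is also quoted somewhere in the commit message.
    fixed_text=$(printf '%s\n' "$commit_text" |
        sed "s/^tree $old_tree\$/tree $new_tree/")

    printf '%s\n' "$fixed_text"
    ```

    Only the first line changes; everything from parent onwards is passed through untouched, which is why the new commit keeps the original author, dates and message.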

    Once you have the fixed commit, you can use git replace:

    $ git replace c571a3ec94e9f84471577bac41ac7375c729ef08 (new commit SHA1 here)
    

    This doesn't actually change anything yet, but tells Git that whenever it would read commit c571a3ec94e9f84471577bac41ac7375c729ef08, it should read the new commit object instead.
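
    If you want to watch git replace at work before trying it on the real repository, a throwaway repository is enough. This is only a sketch: the file, the identity and the commit messages are invented, and the replacement commit is built with git commit-tree purely to get a second commit object quickly.

    ```shell
    # Throwaway repository; every name and message below is invented.
    repo=$(mktemp -d)
    cd "$repo"
    git init -q
    export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
    export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

    echo one > file.txt
    git add file.txt
    git commit -q -m 'original commit'
    old=$(git rev-parse HEAD)

    # A second commit object with the same tree but a different message.
    tree=$(git rev-parse 'HEAD^{tree}')
    new=$(git commit-tree "$tree" -m 'replacement commit')

    git replace "$old" "$new"

    # Reads of $old are now answered with the contents of $new ...
    with_replace=$(git log -1 --format=%s "$old")
    # ... while the original object is still stored and can be read directly.
    without_replace=$(git --no-replace-objects log -1 --format=%s "$old")

    echo "with replace refs:    $with_replace"
    echo "without replace refs: $without_replace"
    ```

    git replace -l lists the active replacements, and git replace -d followed by the original ID removes one again, so this step is easy to undo.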

    And finally, use git filter-branch to make it permanent. This goes through all commits, reads them, and writes them back. Ordinarily, without any options to modify the commits, this wouldn't have much effect, but because of the earlier git replace, every commit that has c571a3ec94e9f84471577bac41ac7375c729ef08 as a parent is rewritten to refer to the new commit instead, then every commit that refers to one of those rewritten commits is rewritten in turn, and so on.
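
    The whole sequence can be rehearsed in a throwaway repository first. The sketch below makes several assumptions: unwanted.txt stands in for the bad SomeDirectory entry (so the sed pattern matches a file name instead of forty zeros), all names and messages are invented, filter-branch is run as plain git filter-branch -- --all, and the replacement ref is deleted afterwards on the assumption that it is no longer needed. On the real repository you would probably also want the --tag-name-filter cat option from your question so that tags follow the rewritten commits.

    ```shell
    # Throwaway repository; every file name and message below is invented.
    repo=$(mktemp -d)
    cd "$repo"
    git init -q
    export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
    export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
    # Skips the warning and pause that recent Git versions add to filter-branch.
    export FILTER_BRANCH_SQUELCH_WARNING=1

    echo one > keep.txt
    git add keep.txt
    git commit -q -m 'first'

    # unwanted.txt plays the role of the bad SomeDirectory entry.
    echo junk > unwanted.txt
    git add unwanted.txt
    git commit -q -m 'second, standing in for the broken commit'
    bad=$(git rev-parse HEAD)

    git rm -q unwanted.txt
    git commit -q -m 'third'

    # Fixed tree, fixed commit and replacement, following the steps above.
    fixed_tree=$(git ls-tree "$bad" | sed -e '/unwanted\.txt/d' | git mktree)
    fixed=$(git cat-file commit "$bad" |
        sed "1s/.*/tree $fixed_tree/" |
        git hash-object -t commit -w --stdin)
    git replace "$bad" "$fixed"

    # Rewrite every ref so that descendants point at the fixed commit for real.
    git filter-branch -- --all

    # Assumption: once history is rewritten the replacement ref is no longer
    # needed, so it is deleted here.
    git replace -d "$bad"

    # With replacements switched off, list the files in the parent of HEAD.
    git --no-replace-objects ls-tree --name-only 'HEAD^'
    ```

    git filter-branch keeps the pre-rewrite refs under refs/original/, so the old history stays reachable until you remove those refs yourself. Running git fsck on the real repository afterwards is a reasonable way to confirm that the null sha1 warning is gone from the rewritten history.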