
Repair the connection in a Git repo and merge it with the old repo to preserve history


PROBLEM

I have two code repositories, "REPO A" and "REPO B", which belong to the same project. In fact, REPO B is a continuation of REPO A and should therefore include REPO A. I cannot push from my local repository REPO B back to my original (bare) repository REPO A, as there seems to be a structural error. When I run git log in REPO B, I get:

error: Could not read 3c4168d
fatal: Failed to traverse parents of commit 3d8c67a

BACKGROUND

In 2016 I started pushing to "REPO A", a bare repository located on a Raspberry Pi running Linux. Since 2018 I haven't pushed to REPO A, as I was the only developer on the project and preferred to just commit to my local clone, "REPO B". Now, three years later, I want to resume pushing to REPO A, but I run into a problem when I try. Here is the structure of the two repos:

                                                        +-----  HEAD -> master of REPO B
                                                        |
                                                        |
                                                        v
                                                    +---------+
                                           d784821  |         |  latest commit
                                                    +---------+  20 Mar 2021
                                                         |
                                                    +---------+
                                           bcc1186  |         |  commit
                                                    +---------+  14 Dec 2020
                                                         |

                                                        ...

                                                         |
                                                    +---------+
REPO A                                     86dea25  |         |  commit
                                                    +---------+  8 Nov 2018
HEAD -> master                                           |
origin/master   -----+                              +---------+
origin/HEAD          |                     f5ea2e3  |         |  commit
                     |                              +---------+  7 Apr 2018
                     |                                   |
                     |                              +~~~~~~~~~+
                     |                              |    ?    |  -> error: Could not read 3c4168d
                     |                              +~~~~~~~~~+  -> fatal: Failed to traverse
                     v                                   |                 parents of commit 3d8c67a
                +---------+                         +~~~~~~~~~+
        commit  |         |  3c4168d       =        |         |
   13 Mar 2018  +---------+                         +~~~~~~~~~+
                     |                                   |

                    ...                    =            ...

                     |                                   |
                +---------+                         +~~~~~~~~~+
        commit  |         |  7ad262b       =        |         |
    2 Aug 2016  +---------+                         +~~~~~~~~~+
                     |                                   |
                +---------+                         +~~~~~~~~~+
initial commit  |         |  09b9c4d       =        |         |
    2 Aug 2016  +---------+                         +~~~~~~~~~+

                   REPO A                              REPO B

REASONING

The lower parts of the two repos should be identical, since REPO B is the continuation of REPO A. The hash 3c4168d, which belongs to the latest commit of REPO A, is the same hash that appears in the error when running git log in REPO B. So somehow the link below f5ea2e3, the oldest valid commit of REPO B, has been lost.
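The break described above can be modelled as a walk down the parent links that stops at the first parent missing from the object store. The sketch below is a simplified, hypothetical model (one parent per commit, the "..." commits omitted, abbreviated hashes taken from the diagram); it is not how Git itself is implemented, but it shows the roles in the two error messages: 3d8c67a is readable, its parent 3c4168d is not.

```python
from typing import Dict, Optional, Tuple

# Hypothetical, simplified model of REPO B's object store: commit -> parent.
# The "..." commits between bcc1186 and 86dea25 are omitted, and 3c4168d is
# deliberately absent, mimicking "error: Could not read 3c4168d".
REPO_B: Dict[str, Optional[str]] = {
    "d784821": "bcc1186",
    "bcc1186": "86dea25",
    "86dea25": "f5ea2e3",
    "f5ea2e3": "3d8c67a",
    "3d8c67a": "3c4168d",
}


def find_break(objects: Dict[str, Optional[str]], head: str) -> Optional[Tuple[str, str]]:
    """Follow parent links from head. Return (commit, missing_parent) for the
    first commit whose parent object is not in the store, or None if the walk
    reaches a root commit (parent None) without a gap."""
    current = head
    while True:
        parent = objects[current]
        if parent is None:
            return None  # root commit reached: history is complete
        if parent not in objects:
            return current, parent  # like "Failed to traverse parents of <commit>"
        current = parent


print(find_break(REPO_B, "d784821"))  # ('3d8c67a', '3c4168d')
```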

When I try to open REPO B in SourceTree, it refuses to set up the project and shows this error message:

error code 128: refs/remotes/GitPi/master does not point to a valid object!
error: Could not read 3c4168d...
fatal: revision walk setup failed

However, committing to REPO B still works from the command line.

QUESTIONS

How can I "repair" REPO B so that I get back my currently missing commit history (which is still intact in REPO A)? How can I restore my ability to log and to push from REPO B to REPO A?

I see there are plenty of posts about gluing two repos together and repairing commit history, but after several hours of reading, I could not figure out how to apply them to my specific case.

Thanks for any hints.


Solution

  • After a whole day of reading and experimenting, the problem is solved ;-). Here is how it was done. I hope it will be helpful to somebody:

    CONCEPT

    1. repair the missing link in the commit chain by creating a new "recovery commit" at the tip of REPO A, then
    2. create patches of each of the valid commits of REPO B, then
    3. apply the patches to REPO A. Patches don't depend on the blob objects of the broken history; they just contain the changed lines of code, which makes it easy to "replay" commits.
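Point 3 can be illustrated with Python's difflib, which produces the same unified-diff format that git format-patch embeds in each patch: only the changed lines plus a few lines of context are recorded, not whole files. The 20-line file below is made-up sample data.

```python
import difflib

# Made-up example: two versions of a 20-line file in which only line 10 changes.
old = [f"line {i}\n" for i in range(1, 21)]
new = list(old)
new[9] = "line 10 (edited)\n"

# A unified diff keeps the changed line plus 3 lines of context on each side.
patch = list(difflib.unified_diff(old, new, fromfile="a/file.txt", tofile="b/file.txt"))
print("".join(patch))

# Separate the actual changes from the "---"/"+++" file headers.
removed = [line for line in patch if line.startswith("-") and not line.startswith("---")]
added = [line for line in patch if line.startswith("+") and not line.startswith("+++")]
print(removed, added)  # ['-line 10\n'] ['+line 10 (edited)\n']
```

Because such a patch carries only context and changes, it can be applied on top of any base whose content matches that context, which is what allows replaying REPO B's commits onto REPO A.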

    REPAIR THE MISSING LINK BETWEEN REPO A AND B

    To find a link between the two valid commits f5ea2e3 (REPO B) and 3c4168d (REPO A), first inspect the broken commit in REPO B:

    cd RepoB
    git cat-file -p <sha-of-broken-commit>
    
    • commit 3d8c67a is broken: its parent 3c4168d cannot be found
    • but the objects it references (tree and blobs) might still be OK
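For reference, the content of a commit object is plain text: a tree line, zero or more parent lines, author and committer lines, a blank line, then the message; git cat-file -p <sha> prints exactly this text. The helper below is a minimal sketch (parse_commit is a made-up name, and the sample commit and its SHAs are placeholders) showing how the tree and parent SHAs can be read from it, for example after decompressing the object by hand as in the zlib step further down.

```python
def parse_commit(raw: bytes) -> dict:
    """Return the tree SHA, parent SHAs and message of a commit object's content
    (the part after the "commit <size>\\0" header)."""
    headers, _, message = raw.partition(b"\n\n")
    tree, parents = None, []
    for line in headers.split(b"\n"):
        key, _, value = line.partition(b" ")
        if key == b"tree":
            tree = value.decode("ascii")
        elif key == b"parent":
            parents.append(value.decode("ascii"))
        # other headers (author, committer, signature lines) are ignored
    return {"tree": tree, "parents": parents, "message": message.decode("utf-8", "replace")}


# Made-up sample commit content; the SHAs are placeholders, not real objects.
sample = b"\n".join([
    b"tree " + b"1" * 40,
    b"parent " + b"2" * 40,
    b"author A U Thor <author@example.com> 1520900000 +0100",
    b"committer A U Thor <author@example.com> 1520900000 +0100",
    b"",
    b"Example commit message",
]) + b"\n"

info = parse_commit(sample)
print(info["tree"], info["parents"])
```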

    The above command gives the SHA of the tree object (cf. folder), which can be used to look up the blob objects (files). With that, recover as much information as needed from the broken commit 3d8c67a to create a valid recovery commit on top of 3c4168d (the tip of REPO A) that fits smoothly with the changes introduced by commit f5ea2e3 of REPO B.
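To check that a file recovered for the recovery commit is byte-for-byte what the broken commit's tree refers to, its blob id can be recomputed: in a default (SHA-1) repository, Git hashes the header "blob <size>\0" followed by the file content. A minimal sketch; blob_id is a hypothetical helper, and it ignores any line-ending conversion or clean filters that git hash-object may apply to work-tree files.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """SHA-1 object id Git gives `content` when stored as a blob
    (SHA-1 repositories only; no line-ending conversion applied)."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
print(blob_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

The result can be compared with the SHA listed for that file in the tree object.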

    git cat-file -p <sha-of-tree-object>
    
    • the corresponding tree object exists and is valid
    • the corresponding blob objects of the files changed in f5ea2e3 exist and are valid
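git cat-file -p <sha-of-tree-object> already lists every entry with its mode, type, SHA and name. If a tree object is instead decompressed by hand, its content is binary: each entry is "<mode> <name>\0" followed by the 20-byte raw SHA-1. A minimal parser sketch (parse_tree is a made-up name), fed with a made-up two-entry tree:

```python
from typing import List, Tuple


def parse_tree(raw: bytes) -> List[Tuple[str, str, str]]:
    """Split a tree object's content (after the "tree <size>\\0" header) into
    (mode, name, hex_sha) entries."""
    entries = []
    pos = 0
    while pos < len(raw):
        nul = raw.index(b"\x00", pos)              # end of "<mode> <name>"
        mode, _, name = raw[pos:nul].partition(b" ")
        sha = raw[nul + 1 : nul + 21].hex()        # 20 raw bytes -> 40 hex chars
        entries.append((mode.decode("ascii"), name.decode("utf-8"), sha))
        pos = nul + 21
    return entries


# Sample data only: one file entry pointing at the empty blob and one
# directory entry pointing at the empty tree.
raw_tree = (
    b"100644 a.txt\x00" + bytes.fromhex("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")
    + b"40000 src\x00" + bytes.fromhex("4b825dc642cb6eb9a060e54bf8d69288fbee4904")
)
for mode, name, sha in parse_tree(raw_tree):
    print(mode, name, sha)
```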

    Now recover the files belonging to the broken commit 3d8c67a using Python and the zlib module:

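    import zlib

    # Loose objects are stored in .git/objects/<first 2 hex digits>/<remaining 38>
    fname = r'<path-to-object-within-object-folder-below-git-management-folder>'
    with open(fname, 'rb') as fh:
        decompressed = zlib.decompress(fh.read())

    # The decompressed data is "<type> <size>\0<content>"; keep only the content
    header, _, content = decompressed.partition(b'\x00')

    # Write raw bytes so binary files and line endings survive unchanged
    with open('./<filename>', 'wb') as fh:
        fh.write(content)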
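A variant of the snippet above that also checks that the object is intact: a loose object's id is the SHA-1 of its entire decompressed data (header included), so recomputing it detects corruption. This sketch assumes the object is stored loose under .git/objects/<first two hex digits>/<remaining 38> and that the full 40-digit SHA is known; objects inside pack files (.git/objects/pack) are not handled, whereas git cat-file -p reads both. read_loose_object is a made-up name.

```python
import hashlib
import os
import tempfile
import zlib
from typing import Tuple


def read_loose_object(git_dir: str, sha: str) -> Tuple[str, bytes]:
    """Read the loose object `sha` (full 40-hex id) from git_dir, verify it,
    and return (object_type, content)."""
    path = os.path.join(git_dir, "objects", sha[:2], sha[2:])
    with open(path, "rb") as fh:
        data = zlib.decompress(fh.read())
    # The object id is the SHA-1 of the header plus the content.
    if hashlib.sha1(data).hexdigest() != sha:
        raise ValueError(f"object {sha} is corrupt: hash mismatch")
    header, _, content = data.partition(b"\x00")
    obj_type, _, size = header.partition(b" ")
    if int(size) != len(content):
        raise ValueError(f"object {sha} is corrupt: size mismatch")
    return obj_type.decode("ascii"), content


# Demo on a throwaway directory: store "test content\n" as a loose blob, read it back.
with tempfile.TemporaryDirectory() as tmp:
    data = b"blob 13\x00test content\n"
    sha = hashlib.sha1(data).hexdigest()
    os.makedirs(os.path.join(tmp, "objects", sha[:2]))
    with open(os.path.join(tmp, "objects", sha[:2], sha[2:]), "wb") as fh:
        fh.write(zlib.compress(data))
    demo = read_loose_object(tmp, sha)

print(demo)  # ('blob', b'test content\n')
```

On a real repository the call would look like read_loose_object(".git", "<full-sha-of-blob>"), and the returned content can be written to disk in binary mode.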
    

    CREATE PATCHES FROM REPO B

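    Create patches in REPO B for every commit after f5ea2e3 up to the tip of the repo (git format-patch f5ea2e3 covers the range f5ea2e3..HEAD, so f5ea2e3 itself is not included).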

    cd ../RepoB
    git format-patch f5ea2e3 -o _patches/
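Before applying anything, it can help to check which commits ended up in _patches/. Each file written by git format-patch starts with a line "From <original-commit-sha> Mon Sep 17 00:00:00 2001", followed by mail headers including Subject:. The sketch below (list_patches is a hypothetical helper) reads those two fields in the same sorted order as the *.patch shell glob; it assumes each subject fits on one line, whereas long subjects may be folded onto continuation lines.

```python
import glob
import os
import tempfile
from typing import List, Optional, Tuple


def list_patches(patch_dir: str) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return (file name, original commit SHA, subject) for each *.patch file,
    sorted by file name (0001-..., 0002-..., like the *.patch glob)."""
    result = []
    for path in sorted(glob.glob(os.path.join(patch_dir, "*.patch"))):
        sha = subject = None
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                if sha is None and line.startswith("From "):
                    sha = line.split()[1]
                elif line.startswith("Subject: "):
                    subject = line[len("Subject: "):].strip()
                    break  # both fields found; skip the diff body
        result.append((os.path.basename(path), sha, subject))
    return result


# Demo with a made-up patch file that contains only the header lines read above.
with tempfile.TemporaryDirectory() as tmp:
    lines = [
        "From " + "a" * 40 + " Mon Sep 17 00:00:00 2001\n",
        "From: A U Thor <author@example.com>\n",
        "Subject: [PATCH] Example change\n",
        "\n",
    ]
    with open(os.path.join(tmp, "0001-Example-change.patch"), "w", encoding="utf-8") as fh:
        fh.write("".join(lines))
    patches = list_patches(tmp)

print(patches)
```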
    

    APPLY PATCHES IN REPO A

    Apply all patches to the tip of REPO A (on top of the new "recovery commit"):

    cd ../RepoA
    cat ../RepoB/_patches/*.patch | git am