Tags: git, git-rebase, git-cherry-pick, git-workflow

Recover git history from legacy version of project


Long story short:

  1. The project was migrated to a new git repository server.
  2. Someone copied only the project files and pushed the whole project to the new server as a single initial commit.
  3. Work continued from that new initial commit for quite some time.
  4. I managed to find an old local copy of the legacy project (from before the switch to the new server) and want to insert the old git history into the current version of the project, before the start of the current history. The old local project has a few extra, unwanted commits.

The branches essentially look like this:

                 old_master
                /
A--B--C--D--E--F

                  origin/new_master
                 /
init--G--H--I--J

where the init commit on new_master has the same content as commit D on old_master (E and F are the unwanted extra commits).

So the end result would be something like:

                       origin/new_master
                      /
A--B--C--D--G--H--I--J

The question "How to rebase commits from another repository with a different history?" describes a similar history problem, which was solved there with cherry-picking. In my situation there is a huge number of commits with complex branching, which would be difficult to cherry-pick. Is there an efficient way to do this using rebase or rebase --onto?


Solution

  • Is there an efficient way to do this using rebase or rebase --onto?

    Not in the general case, where there might be branching-and-merging. (If the history in the new repository is strictly linear, then you can do this with a simple git rebase --onto. It's not exactly efficient but it's just machine time, so who cares how efficient it is?)
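For the strictly linear case, a minimal sketch of the rebase --onto approach follows. Everything in it is a toy assumption: the temporary directories, the one-letter commit messages, the demo identity, and the local branch name new_master. It builds both histories from the question, fetches them into a third repository, and replays G..J on top of the old commit D.

```shell
set -eu
# Hypothetical identity so the demo does not depend on local git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

new_repo() {   # new_repo <dir>: empty repository whose first branch is "master"
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
}
commit() {     # commit <dir> <msg>: append <msg> to file.txt and commit it
    echo "$2" >> "$1/file.txt"
    git -C "$1" add file.txt
    git -C "$1" commit -q -m "$2"
}

new_repo "$work/old"
for m in A B C D E F; do commit "$work/old" "$m"; done

new_repo "$work/new"
printf 'A\nB\nC\nD\n' > "$work/new/file.txt"   # same snapshot as old commit D
git -C "$work/new" add file.txt
git -C "$work/new" commit -q -m init
for m in G H I J; do commit "$work/new" "$m"; done

new_repo "$work/third"
git -C "$work/third" remote add old "$work/old"
git -C "$work/third" remote add new "$work/new"
git -C "$work/third" fetch -q old
git -C "$work/third" fetch -q new

old_d=$(git -C "$work/third" rev-parse 'old/master~2')                  # D
new_init=$(git -C "$work/third" rev-list --max-parents=0 new/master)    # init

# Replay everything after init (G..J) onto D, on a local branch.
git -C "$work/third" checkout -q --no-track -b new_master new/master
git -C "$work/third" rebase -q --onto "$old_d" "$new_init" new_master

git -C "$work/third" log --format=%s new_master   # subjects: J I H G D C B A
```

Note that the replayed commits get new hash IDs, and that a plain rebase like this flattens merges, which is why it only suits linear history.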

    The general solution to this is a graft, via git replace.

    Let's look at what happens if you git fetch both the original and the new repositories into a third (otherwise totally empty) repository, using the drawings above. You end up with:

    A--B--C--D--E--F   <-- old/master
    
    D'--G--H--I--J   <-- new/master
    

    (note that the third repository does not yet have its own master). Rather than calling the first commit in the new/master chain init, I've called it D' because presumably it has the same snapshot as commit D in old/master, but it has a different hash.
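The fetch into a third repository described above can be sketched as follows. The two source repositories are built locally as toy stand-ins; the paths, the remote names old and new, the commit messages, and the demo identity are all assumptions made for illustration.

```shell
set -eu
# Hypothetical identity so the demo does not depend on local git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

new_repo() {   # new_repo <dir>: empty repository whose first branch is "master"
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
}
commit() {     # commit <dir> <msg>: append <msg> to file.txt and commit it
    echo "$2" >> "$1/file.txt"
    git -C "$1" add file.txt
    git -C "$1" commit -q -m "$2"
}

new_repo "$work/old"
for m in A B C D E F; do commit "$work/old" "$m"; done

new_repo "$work/new"
printf 'A\nB\nC\nD\n' > "$work/new/file.txt"   # same snapshot as old commit D
git -C "$work/new" add file.txt
git -C "$work/new" commit -q -m init
for m in G H I J; do commit "$work/new" "$m"; done

# The third, otherwise empty repository: it only gets remote-tracking names.
new_repo "$work/third"
git -C "$work/third" remote add old "$work/old"
git -C "$work/third" remote add new "$work/new"
git -C "$work/third" fetch -q old
git -C "$work/third" fetch -q new

# Two disconnected chains, reachable as old/master and new/master.
git -C "$work/third" log --oneline --graph --all

# D and D' (init) have different hashes but point at the same tree.
new_init=$(git -C "$work/third" rev-list --max-parents=0 new/master)
d_tree=$(git -C "$work/third" rev-parse 'old/master~2^{tree}')
init_tree=$(git -C "$work/third" rev-parse "$new_init^{tree}")
[ "$d_tree" = "$init_tree" ] && echo "D and D' share one snapshot"
```

Comparing the two tree IDs is a quick way to confirm which old commit the new initial commit was copied from before grafting anything.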

    Nothing—no power on Earth—can change any of these existing commits. But what if we copy commit G to a new commit G' whose parent is D? Then we get this:

    A--B--C--D--E--F   <-- old/master
              \
               G'
    
           D'--G--H--I--J   <-- new/master
    

    At the moment, new commit G' is just hanging out in the repository, with no way for us to find it. Let's add a name by which we can find G'. For now, let's call it graft:

    A--B--C--D--E--F   <-- old/master
              \
               G'  <-- graft
    
           D'--G--H--I--J   <-- new/master
    

    Now, what if we could somehow get Git, when it's walking backwards along the J-then-I-then-H-then-G-then-D' (and then stop) chain, to, at the very last moment it can, switch from G to its graft G'? That is, we'll make a dotted-line connection of some sort:

    A--B--C--D--E--F   <-- old/master
              \
               G'  <-- graft
               :
           D'--G--H--I--J   <-- new/master
    

    and convince Git to have git log show J, then I, then H, then G', then D, then C, then B, then A.

    The history will now look as if it reads this way, even though it doesn't, really.[1]

    This is precisely what git replace does. It makes replacement objects. In the case of a commit, the replacement can take the form of a graft, like G'. Rather than using the magic name graft, Git uses an even-more-magic name, refs/replace/hash, where hash is the hash ID of actual commit G. Some of the time, you don't need to know this, and some of the time, you do.
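A minimal sketch of the graft itself, using git replace --graft, might look like this. The repositories, paths, commit messages, and demo identity are toy assumptions; in a real repository you would look up the hash IDs of G and D with git log rather than by counting back from the branch tip.

```shell
set -eu
# Hypothetical identity so the demo does not depend on local git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

new_repo() {   # new_repo <dir>: empty repository whose first branch is "master"
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
}
commit() {     # commit <dir> <msg>: append <msg> to file.txt and commit it
    echo "$2" >> "$1/file.txt"
    git -C "$1" add file.txt
    git -C "$1" commit -q -m "$2"
}

new_repo "$work/old"
for m in A B C D E F; do commit "$work/old" "$m"; done

new_repo "$work/new"
printf 'A\nB\nC\nD\n' > "$work/new/file.txt"   # same snapshot as old commit D
git -C "$work/new" add file.txt
git -C "$work/new" commit -q -m init
for m in G H I J; do commit "$work/new" "$m"; done

new_repo "$work/third"
git -C "$work/third" remote add old "$work/old"
git -C "$work/third" remote add new "$work/new"
git -C "$work/third" fetch -q old
git -C "$work/third" fetch -q new

# In this toy history, G is three steps behind J and D two steps behind F.
g=$(git -C "$work/third" rev-parse 'new/master~3')
d=$(git -C "$work/third" rev-parse 'old/master~2')

# Create G' (a copy of G whose parent is D) and record it as refs/replace/<G>.
git -C "$work/third" replace --graft "$g" "$d"

git -C "$work/third" for-each-ref refs/replace       # the "even-more-magic" name
git -C "$work/third" log --format=%s new/master      # subjects: J I H G D C B A
```

The original commits are untouched: only the extra commit G' and the refs/replace name were added.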

    The problem with this kind of replacement-commit graft is that git clone doesn't clone replacements by default.[2] So your third repository behaves a little oddly when cloned. Sometimes that's exactly what you want, and if so, that's fine. If it's not, consider using git filter-branch or a similar tool to produce a fourth repository in which the graft is permanent: the commits are copied to new commits (with new and different hash IDs), so the real, but rewritten, history follows the grafted history rather than the original.
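One way to make the graft permanent, sketched under toy assumptions (hypothetical paths, commit messages, demo identity, and a local branch named new_master), is to run git filter-branch with no filters while the replacement is in place and then clone the result. The git filter-branch documentation notes that the command honors refs/replace and makes such replacements permanent; newer tools such as git filter-repo are often recommended instead, but are not shown here.

```shell
set -eu
# Hypothetical identity so the demo does not depend on local git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's startup warning
work=$(mktemp -d)

new_repo() {   # new_repo <dir>: empty repository whose first branch is "master"
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
}
commit() {     # commit <dir> <msg>: append <msg> to file.txt and commit it
    echo "$2" >> "$1/file.txt"
    git -C "$1" add file.txt
    git -C "$1" commit -q -m "$2"
}

new_repo "$work/old"
for m in A B C D E F; do commit "$work/old" "$m"; done

new_repo "$work/new"
printf 'A\nB\nC\nD\n' > "$work/new/file.txt"   # same snapshot as old commit D
git -C "$work/new" add file.txt
git -C "$work/new" commit -q -m init
for m in G H I J; do commit "$work/new" "$m"; done

new_repo "$work/third"
git -C "$work/third" remote add old "$work/old"
git -C "$work/third" remote add new "$work/new"
git -C "$work/third" fetch -q old
git -C "$work/third" fetch -q new

# Graft G onto D, then check out a local branch to rewrite.
g=$(git -C "$work/third" rev-parse 'new/master~3')
d=$(git -C "$work/third" rev-parse 'old/master~2')
git -C "$work/third" replace --graft "$g" "$d"
git -C "$work/third" checkout -q --no-track -b new_master new/master

# Copy the commits while the graft is in effect; the copies have D as a real ancestor.
git -C "$work/third" filter-branch -- new_master >/dev/null

# The "fourth repository": a plain clone, which carries no replacement refs.
git clone -q "$work/third" "$work/fourth"
git -C "$work/fourth" --no-replace-objects log --format=%s   # subjects: J I H G D C B A
```

Because the rewritten commits have new hash IDs, everyone working from the old new_master would need to switch to the rewritten branch.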


    [1] Philosophical question: Or does it? Does history read the way history is, or does history read the way Git reads it out to you?

    [2] This is, in effect, Git's answer to the philosophical question in footnote 1. History reads the way Git reads it to you, but is recorded as the real history. On cloning, Git clones the real history, and ignores the graft. You can, subsequent to cloning, ask Git to copy grafts too, but that's not the default.

    Besides this, you can run git --no-replace-objects log and see the real history, and when looking at grafted history, an optional decoration marks each graft, so that you can see it if you look closely.
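The cloning behaviour from footnote 2 and the --no-replace-objects switch can be tried out with a sketch like the one below. The repositories, paths, commit messages, demo identity, and the local branch name new_master are toy assumptions; the refspec that copies refs/replace into the clone is one common way to ask for the grafts, not the only one.

```shell
set -eu
# Hypothetical identity so the demo does not depend on local git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

new_repo() {   # new_repo <dir>: empty repository whose first branch is "master"
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
}
commit() {     # commit <dir> <msg>: append <msg> to file.txt and commit it
    echo "$2" >> "$1/file.txt"
    git -C "$1" add file.txt
    git -C "$1" commit -q -m "$2"
}

new_repo "$work/old"
for m in A B C D E F; do commit "$work/old" "$m"; done

new_repo "$work/new"
printf 'A\nB\nC\nD\n' > "$work/new/file.txt"   # same snapshot as old commit D
git -C "$work/new" add file.txt
git -C "$work/new" commit -q -m init
for m in G H I J; do commit "$work/new" "$m"; done

new_repo "$work/third"
git -C "$work/third" remote add old "$work/old"
git -C "$work/third" remote add new "$work/new"
git -C "$work/third" fetch -q old
git -C "$work/third" fetch -q new
git -C "$work/third" replace --graft \
    "$(git -C "$work/third" rev-parse 'new/master~3')" \
    "$(git -C "$work/third" rev-parse 'old/master~2')"
git -C "$work/third" checkout -q --no-track -b new_master new/master

# A plain clone sees only the real history: init, G, H, I, J.
git clone -q "$work/third" "$work/copy"
before=$(git -C "$work/copy" rev-list --count HEAD)

# Ask explicitly for the replacement refs; the grafted history then appears.
git -C "$work/copy" fetch -q origin 'refs/replace/*:refs/replace/*'
after=$(git -C "$work/copy" rev-list --count HEAD)
real=$(git -C "$work/copy" --no-replace-objects rev-list --count HEAD)
echo "before=$before after=$after real=$real"

# With decorations on, look for a marker next to the grafted commit G.
git -C "$work/copy" log --oneline --decorate
```

Here before and real count the five commits of the real history, while after counts the eight commits visible through the graft.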