
Combine two branches from two repos with disjoint contents, preserving history in a single timestamp-collated branch


I don't know git very well. :-/

Background

I have two unrelated git-based document repositories that I would like to combine into a single repository. I would like to preserve the original timestamps (dating back to 2005) and individual file histories. The two repos contain no branches, no folders, and there is no overlap in terms of file naming.

In ASCII-land, it looks like this:

REPO A    |-------------------------|
REPO B                    |===============|

Where the overlap denotes time.

Goal

My goal is to "zipper up" the overlapping timestamps so that the two repos look like a single, unbroken history:

REPO A+B  |-------------------==--=---============|

What I've Tried

Again, I don't know git very well, so I could have screwed something up.

First I tried adding the newer, smaller repo as a remote of the larger, older repo, fetching its changes, and committing the result. I ended up with all the new repo's changes lumped together in a branch that joins after the older repo's history:

MERGE  |-------------------------                 -|
                                 \===============/

Next I tried rebasing (with --committer-date-is-author-date), which I thought would work, but instead I ended up with one long commit history that just stacks the two repos on top of each other:

REBASE |-------------------------===============|

I haven't been able to find a way to "replay" the combined history. I was really hoping rebase would be the answer.

Answers I've Looked At


Solution

  • While @codeWizard's reply was helpful, that approach didn't retain the timestamps the way I wanted. It did lead me down a rabbit hole that helped me find a solution though...

    1. Create a new, blank repository

      git init
      
    2. Add and fetch the old repositories as remotes

      git remote add -f oldRepoA ../oldRepoA
      git remote add -f oldRepoB ../oldRepoB
      
    3. Export the combined commit history as "timestamp hash" pairs, sort the list chronologically, and discard the timestamps with cut. Then pipe the sorted hashes to xargs, which runs a small shell command for each hash: export that commit as a patch and immediately apply the patch to the new repo.

      git log --all --format="%at %H" | sort -n | cut -d' ' -f2 |
          xargs -I {} sh -c \
              'git format-patch -1 {} --stdout |
               git am --committer-date-is-author-date'
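
The ordering stage of that pipeline can be tried on its own, without touching a repository. The sketch below feeds three made-up "timestamp hash" lines (the hashes are placeholders, not real commits) through a numeric sort and a field-based cut. For ten-digit timestamps this should give the same result as cutting from character 12 onward.

```shell
# Stand-in for the output of: git log --all --format="%at %H"
# (timestamps are Unix epoch seconds; the hashes are hypothetical placeholders)
sample_log() {
    printf '1262304000 bbbb222\n1104537600 aaaa111\n1230768000 cccc333\n'
}

# Sort numerically on the timestamp, then keep only the hash column.
sorted_hashes=$(sample_log | sort -n | cut -d' ' -f2)

# Print the hashes oldest-first; this is the list xargs would receive.
printf '%s\n' "$sorted_hashes"
```

The oldest commit comes out first, which is the order the patches need to be applied in.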
      

    The --committer-date-is-author-date flag is key to keeping the original timestamps: without it, git am records the time the patch was applied as the committer date. There might be a better way of doing this, but this works well enough for my use case!
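
One way to sanity-check the result is to confirm that the timestamps in the new history never go backwards. This is only a sketch: the helper name is_chronological is made up for illustration, and it relies on sort's standard -c (check) option rather than anything git-specific.

```shell
# Succeeds (exit status 0) if the numbers on stdin, one per line,
# are already in non-decreasing order; fails otherwise.
is_chronological() {
    sort -n -c 2>/dev/null
}

# Self-contained demonstration with three ascending epoch timestamps.
printf '1104537600\n1230768000\n1262304000\n' | is_chronological &&
    echo "timestamps are in date order"
```

Inside the combined repo, the same check could be fed from git log --reverse --format=%at, which lists author timestamps starting from the oldest commit.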