
How to merge several Git repos into one and interleave histories


My situation is that I have two Git repositories that I need to merge into a single repository (there are actually more repos, but I can start with two).

The two repositories are:

  • The main repository, A.
  • The second repository, B.

The code in repository B depends on the code in repository A (but not vice versa), and the histories of the two repositories roughly track each other chronologically (i.e. a specific commit in repo B will typically require a commit from repo A with a very similar commit time).

There are conflicting branch and tag names in both repositories (there are no guarantees that they belong together), but only the refs from A need to be preserved.

The requirements for the new repository, C, are:

  1. All refs (branches and tags) from A need to be preserved.
  2. Only the master branch commits from B need to be preserved (i.e. the commits that are reported by git log --first-parent master).
  3. The files from each source repository should be put into subfolders of the new repository (i.e. the files from A shall go into A/, and the files from B shall go into B/).
  4. When checking out a specific commit (including commits made before the merge) in repository C (e.g. a release tag), compatible files from both source repositories should be found in the directories A/ and B/ (at least within a commit or two).

So far I have tried several approaches, including this and git-stitch-repo, without success (they did not fulfill the above requirements).

At this point, I have managed to:

  • Move all files in each repo to a subdirectory using git filter-branch. E.g. for repo A:
      mkdir A
      mv * .gitignore A/ 2> /dev/null
      git commit -a -m 'DROPME' > /dev/null
      # Rewrite every commit so that all paths in the index are prefixed with A/.
      git filter-branch --tag-name-filter cat --index-filter 'git ls-files -s | sed "s-\t\"*-&A/-" | GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info && mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE" ||:' -- --all
      git reset --hard origin/master
      # Delete the backup refs that filter-branch leaves under refs/original/.
      git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
  • Import repo B into A using git fast-export/fast-import.
  • Devise a method for generating a mapping such that, for a given SHA in A, there is a list of zero, one or more SHAs from B that should be inserted.
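
For illustration, the mapping in the last bullet could be built along these lines. This is only a sketch under assumptions, not the method I actually used: the function name build_insertion_map and the (timestamp, sha) input format are made up for the example, and both logs are assumed to be sorted oldest first (e.g. parsed from git log --first-parent --reverse --format='%ct %H' master). Each B commit is attached to the latest A commit that is not newer than it, since B depends on A.

```python
from bisect import bisect_right


def build_insertion_map(a_log, b_log):
    """Map each SHA in A to the list of B SHAs to insert right after it.

    a_log and b_log are lists of (commit_timestamp, sha) tuples, oldest
    first (hypothetical input format). a_log must not be empty.
    """
    a_times = [ts for ts, _ in a_log]
    mapping = {sha: [] for _, sha in a_log}
    for ts, b_sha in b_log:
        # Index of the latest A commit whose timestamp is <= the B commit's.
        idx = bisect_right(a_times, ts) - 1
        # Assumption: B commits older than all of A go after the first A commit.
        idx = max(idx, 0)
        mapping[a_log[idx][1]].append(b_sha)
    return mapping


if __name__ == "__main__":
    a_log = [(10, "a1"), (30, "a2"), (50, "a3")]
    b_log = [(12, "b1"), (29, "b2"), (30, "b3")]
    for a_sha, b_shas in build_insertion_map(a_log, b_log).items():
        print(a_sha, b_shas)
```

An A commit with no nearby B commit simply maps to an empty list, which covers the "zero" case.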

What I would expect now is that some clever use of git filter-branch would enable me to insert the selected commits from B into the master branch of A. But how?


Solution

  • The solution turned out to be much more involved than I had hoped for. It involves manipulating and combining the output of two (or more) git fast-export streams, and importing them into a new repository using git fast-import.

    In short, a new fast-import stream is generated by traversing two input streams, and switching back-and-forth between them based on a date-sorted log from the main branches.
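
To give an idea of the switching order only (this is not an excerpt from the script), here is a minimal sketch. The names LogEntry and interleave are hypothetical, and each input is assumed to be a list of (timestamp, sha) tuples already sorted oldest first. On equal timestamps the A commit is emitted first, because B depends on A.

```python
import heapq
from typing import List, NamedTuple, Tuple


class LogEntry(NamedTuple):
    timestamp: int
    repo: str  # "A" or "B": which input stream to read from next
    sha: str


def interleave(a_log: List[Tuple[int, str]],
               b_log: List[Tuple[int, str]]) -> List[LogEntry]:
    """Merge two date-sorted logs into one date-sorted switching order."""
    tagged_a = (LogEntry(ts, "A", sha) for ts, sha in a_log)
    tagged_b = (LogEntry(ts, "B", sha) for ts, sha in b_log)
    # heapq.merge expects each input to be sorted by the key already.
    # Including the repo name in the key makes A win ties against B.
    return list(heapq.merge(tagged_a, tagged_b,
                            key=lambda e: (e.timestamp, e.repo)))


if __name__ == "__main__":
    order = interleave([(10, "a1"), (30, "a2")], [(10, "b1"), (20, "b2")])
    for entry in order:
        print(entry.repo, entry.sha)
```

Real commit timestamps are not guaranteed to increase along a branch, so a real implementation would have to handle out-of-order dates rather than rely on the sorted-input assumption made here.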

    I have implemented the solution in a Python script called join-git-repos.py, which I put in a GitHub repository here.
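
As an example of the kind of stream manipulation involved: marks (:1, :2, ...) share one namespace within a single git fast-import run, so two exported streams cannot be combined as-is if both start numbering at :1. The sketch below shifts every mark in one stream by a fixed offset. It is an assumption-laden illustration, not code from join-git-repos.py: shift_marks is a hypothetical helper, it only handles the mark, from, merge and M lines, and it assumes the exact-byte-count "data <n>" form that git fast-export emits.

```python
import io
import re

_MARK_REF = re.compile(rb":(\d+)")


def shift_marks(stream: bytes, offset: int) -> bytes:
    """Return a copy of a fast-export stream with all marks shifted by offset."""
    src = io.BytesIO(stream)
    out = io.BytesIO()
    while True:
        line = src.readline()
        if not line:
            break
        if line.startswith(b"data "):
            # Copy the payload verbatim: file contents and commit messages
            # may contain text that merely looks like a command.
            size = int(line[5:].strip())
            out.write(line)
            out.write(src.read(size))
            continue
        if line.startswith((b"mark :", b"from :", b"merge :")):
            line = _MARK_REF.sub(
                lambda m: b":%d" % (int(m.group(1)) + offset), line, count=1)
        elif line.startswith(b"M "):
            # Format: M <mode> <dataref> <path>; only mark datarefs are shifted.
            parts = line.split(b" ", 3)
            if len(parts) == 4 and parts[2].startswith(b":"):
                parts[2] = b":%d" % (int(parts[2][1:]) + offset)
                line = b" ".join(parts)
        out.write(line)
    return out.getvalue()
```

A typical use would be to shift the marks of the stream exported from B by a value larger than the highest mark in the stream from A before the two are combined.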