Merge two similar Git repositories into new repo


I need to merge two Git repos, Repo1 and Repo2, into a new single repository MonoRepo. The two repos have almost identical folder structures, and many files are very similar or even the same between the repos (the ultimate goal here is to unify and de-duplicate code). If that matters, both were originally created as branches in SVN, with Repo2 initially a copy of Repo1.

To merge them keeping the commit history, I am following 2 examples: example 1 and example 2.

The basic steps I follow are:

  1. Prepare Repo1 and Repo2 by creating top-level folders ("Repo1" and "Repo2", respectively) and moving everything into them with git mv. This ensures that the repos have parallel folder structures without any overlap. Thus, Repo1's folder structure is transformed from:

    dir1/
        file1
    dir2/
        file2
    ...
    

    to:

    Repo1/
        dir1/
            file1
        dir2/
            file2
        ...
    

    and similarly Repo2 has "Repo2" as the top folder.

  2. Create an empty MonoRepo.

  3. Bring Repo1 and Repo2 in as remotes:

    git remote add -f Repo1 ../Repo1
    git remote add -f Repo2 ../Repo2
    
  4. Merge them in using the --allow-unrelated-histories flag:

    git merge Repo1/main --allow-unrelated-histories
    git merge Repo2/main --allow-unrelated-histories
    

The problem is that merging in Repo2 creates a ton of rename/delete, modify/delete and rename/rename conflicts, for example:

    CONFLICT (rename/rename): dir1/file1 renamed to Repo1/dir1/file1 in HEAD and to Repo2/dir1/file1 in Repo2/main.

Naively, I thought that since the folder structures of Repo1 and Repo2 are strictly non-overlapping, there should be no conflicts. However, it seems Git is keeping track of directory renaming, or is otherwise attempting to match directories that seem to have been renamed -- git merge output starts with:

    Performing inexact rename detection: 100% (1037700/1037700), done.

I am guessing this matching is particularly problematic in my case: the content of the repos is so similar that Git may think that many parts of Repo2 have been renamed from Repo1.

Based on a post "Renaming and Deep Directory Hierarchies in Git" I have tried 2 things to overcome the conflicts:

  1. Set merge.directoryRenames from its default ("conflict") to "true", which would seem to "just have such files moved to the new directory", using git config --local merge.directoryRenames true

  2. Create an empty top-level "Repo1" folder in the Repo2 repo, so that Git does not think that "Repo2" is a rename of "Repo1":

    Repo1/
        .gitkeep
    Repo2/
        dir1/
            file1
        dir2/
            file2
        ...
    

Neither of these workarounds worked -- I am still getting hundreds of CONFLICT messages.

Any suggestions for this seemingly straightforward merge of two repos with parallel folder structures and similar content?

Thank you


Solution

  • If your goal is to bring the history of two repositories (I assume their master branches) into a third, new repo, you could use a different approach that relies on merging with the -X subtree=<path> strategy option. This option prefixes (or strips) a subfolder path so that the two trees line up.

    Note that before proceeding, it is important to disable rename detection via the merge.renames config, as the second merge might still trigger it. This may occur if the second repository contains any file whose similarity to a file in the first repo is above the rename threshold (as of Git 2.48.0, the default similarity threshold is 50%). In this scenario it is acceptable to turn off rename detection, as we are just combining two independent histories: any conflict caused by an apparent rename would only be a false positive.
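To make the false-positive scenario concrete, here is a minimal sketch that tries to reproduce a rename/rename conflict like the one quoted in the question, then repeats the merge with rename detection turned off. It assumes the two repos share an ancestor commit (plausible given the common SVN origin, but not confirmed by the question). The repo names base, Repo1, Repo2 and MonoRepo and the file dir1/file1 are throwaway stand-ins created in a temporary directory.

```shell
set -eu
# Hypothetical throwaway repos; every name and path here is illustrative.
work=$(mktemp -d)
cd "$work"

# A shared ancestor commit, standing in for the assumed common SVN history.
git init -q base
cd base
git config user.email "demo@example.com"
git config user.name "Demo"
mkdir dir1
echo "shared content" > dir1/file1
git add .
git commit -q -m "shared ancestor"
cd ..

# Two descendants that each move the content under their own top-level folder.
for name in Repo1 Repo2; do
    git clone -q base "$name"
    (
        cd "$name"
        git config user.email "demo@example.com"
        git config user.name "Demo"
        mkdir "$name"
        git mv dir1 "$name/dir1"
        git commit -q -m "Move everything under $name/"
    )
done

# Start the combined repo from Repo1, then fetch Repo2's history.
git clone -q Repo1 MonoRepo
cd MonoRepo
git config user.email "demo@example.com"
git config user.name "Demo"
git fetch -q ../Repo2 HEAD

# First attempt: rename detection left at its default (enabled).
if git merge --no-edit FETCH_HEAD >/dev/null 2>&1; then
    default_result=clean
else
    default_result=conflict
    git merge --abort
fi

# Second attempt: same merge with rename detection disabled for this command.
git -c merge.renames=false merge -q --no-edit FETCH_HEAD
echo "default merge: $default_result"
git ls-files
```

If the assumption holds, the first attempt should stop on a rename/rename conflict, while the second should complete and keep both Repo1/dir1/file1 and Repo2/dir1/file1. Passing the setting with -c limits it to a single command, so nothing has to be unset afterwards.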

    In your case, you could enter:

    # create repository monorepo
    mkdir monorepo
    cd monorepo
    git init
    
    # disable rename detection during merges
    git config --local merge.renames false
    
    # create mock files to track and include the subfolder paths in the repo.
    # This step is important to let subtree accept the paths.
    mkdir repo1
    mkdir repo2
    echo "readme repo1" >> repo1/README.md
    echo "readme repo2" >> repo2/README.md
    git add .
    git commit -m "init commit to include repo1 and repo2 paths in the repo"
    
    # add remote for repo1 and repo2 and fetch their master branch
    git remote add origin_repo1 <repo1>
    git fetch origin_repo1 master
    
    git remote add origin_repo2 <repo2>
    git fetch origin_repo2 master
    
    # merging repo1 into the subfolder repo1
    git merge -s ort -X subtree="repo1/" --allow-unrelated-histories origin_repo1/master
    
    # merging repo2 into the subfolder repo2
    git merge -s ort -X subtree="repo2/" --allow-unrelated-histories origin_repo2/master
    
    # remove merge.renames, or alternatively reset it to true
    git config --local --unset merge.renames
    # git config --local merge.renames true
    
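As a rough way to try the subtree approach before running it on real repositories, the steps above can be replayed end to end on throwaway repos. This is only a sketch under stated assumptions: src1 and src2 are hypothetical, unrelated source repos created on the spot, their content is a single made-up file, and FETCH_HEAD is merged instead of a named remote branch so the example does not depend on the default branch name.

```shell
set -eu
# Hypothetical throwaway repos; names, paths and contents are illustrative.
work=$(mktemp -d)
cd "$work"

# Two unrelated source repos with the same layout and similar content.
for name in src1 src2; do
    git init -q "$name"
    (
        cd "$name"
        git config user.email "demo@example.com"
        git config user.name "Demo"
        mkdir dir1
        echo "content from $name" > dir1/file1
        git add .
        git commit -q -m "initial commit in $name"
    )
done

# Monorepo with placeholder files so that repo1/ and repo2/ exist in HEAD.
git init -q monorepo
cd monorepo
git config user.email "demo@example.com"
git config user.name "Demo"
git config --local merge.renames false
mkdir repo1 repo2
echo "readme repo1" > repo1/README.md
echo "readme repo2" > repo2/README.md
git add .
git commit -q -m "init commit to include repo1 and repo2 paths in the repo"

# Merge each source under its own prefix.
git fetch -q ../src1 HEAD
git merge -q --no-edit -X subtree=repo1/ --allow-unrelated-histories FETCH_HEAD
git fetch -q ../src2 HEAD
git merge -q --no-edit -X subtree=repo2/ --allow-unrelated-histories FETCH_HEAD

# Restore the default rename behaviour, then inspect the result.
git config --local --unset merge.renames
git ls-files
git rev-list --count HEAD
```

The final two commands list the combined tree and count the commits reachable from HEAD, which is a quick check that each source landed under its own folder and that both histories were carried over. A placeholder README.md would collide with a README.md at the root of a real source repo, so a different placeholder name may be safer there.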

    Another alternative, as suggested in the comments by @user2690051, would be to use the recursive strategy along with the strategy option no-renames to turn off rename detection.

    Before proceeding with this second approach, perform the following steps in both repositories: move the whole content of the working directory into a subfolder and record a commit (either on top of master or on a temporary branch). If you use a temporary branch, make sure to replace master with that branch when running the commands below.

    # within repo1 move everything into a new top-level folder "repo1"
    # record a commit and push to the remote
    # leave repo1
    
    # within repo2 move everything into a new top-level folder "repo2"
    # record a commit and push to the remote
    # leave repo2
    
    # create repository monorepo
    mkdir monorepo
    cd monorepo
    git init
    
    # create some files for an init commit
    # ... creating some files ...
    git add .
    git commit -m "init commit"
    
    # add remote for repo1 and repo2 and fetch their master branch
    git remote add origin_repo1 <repo1>
    git fetch origin_repo1 master
    
    git remote add origin_repo2 <repo2>
    git fetch origin_repo2 master
    
    # merging repo1 into monorepo
    git merge -s recursive -X no-renames --allow-unrelated-histories origin_repo1/master
    
    # merging repo2 into monorepo
    git merge -s recursive -X no-renames --allow-unrelated-histories origin_repo2/master
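
The preparation step that the comments above only describe (moving everything into a new top-level folder) could be scripted roughly as follows. This is a sketch on a hypothetical throwaway repo: the file names are made up, the push to the remote is left out, and the loop assumes top-level names without unusual characters, since git ls-tree quotes those unless -z is used.

```shell
set -eu
# Hypothetical throwaway repo standing in for repo1; file names are illustrative.
work=$(mktemp -d)
cd "$work"
git init -q repo1
cd repo1
git config user.email "demo@example.com"
git config user.name "Demo"
mkdir dir1 dir2
echo one > dir1/file1
echo two > dir2/file2
echo "*.log" > .gitignore
git add .
git commit -q -m "original layout"

# Move every tracked top-level entry (dotfiles included) into repo1/.
mkdir repo1
git ls-tree --name-only HEAD | while IFS= read -r entry; do
    git mv -- "$entry" "repo1/$entry"
done
git commit -q -m "Move everything under repo1/"

# Show the resulting layout.
git ls-files
```

Listing entries with git ls-tree rather than a shell glob picks up tracked dotfiles and skips untracked files, which a plain git mv * would not do.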