Intense Restructure of large Git repo into multiple new repos


I've found several simple examples using both filter-branch and subtree, but they only ever move a single directory around. I'd like to take the following repo:

/
  Project1.sln
  Project2.sln
  Source/
    CommonLib.Data/
    CommonLib.Web/
    Project1.Data/
    Project1.Web/
    Project1.Other/
    Project2.Data/
    Project2.Web/

And move things out to their own repos, with the following structure:

# CommonRepo
/
  CommonLib.Data/
  CommonLib.Web/

# Project1Repo
/
  Project1.sln
  Project1.Data/
  Project1.Web/
  Project1.Other/

# Project2Repo
/
  Project2.sln
  Project2.Data/
  Project2.Web/

All while maintaining the entire history. To complicate things, there are one or more branches of the original repo that correspond to each project, so the version of CommonLib each project refers to may be slightly different.

I'd like to use git subtree add to add a reference back to the CommonLib in each of the new repos at the correct tag/revision, but first I need a way to split several directories at once off into their own location.

git subtree split -P seems to only want one directory, and I haven't been able to get filter-branch to grab several at once, either. I'm on a Windows box, so I don't have all the scripting niceties set up to make this easier.

Any advice?


Solution

  • In the end, I recommend you keep the common lib included in your projects especially due to the divergence you spoke about, so your ideal structure should be:

    # CommonRepo
    /
      CommonLib.Data/
      CommonLib.Web/
    
    # Project1Repo
    /
      Project1.sln
      Project1.Data/
      Project1.Web/
      Project1.Other/
      CommonLib/         # I recommend that you do whatever restructuring needed to support this in a sub-directory
        CommonLib.Data/
        CommonLib.Web/
    
    # Project2Repo
    /
      Project2.sln
      Project2.Data/
      Project2.Web/
      CommonLib/         # I recommend that you do whatever restructuring needed to support this in a sub-directory
        CommonLib.Data/
        CommonLib.Web/
    

    Now to handle the splitting:

    When you split, the resulting commit ids stay the same as long as you pass the same settings each time (for example, the same --annotate value), so the split histories will be compatible and should play nicely with merge. So you can start by extracting the CommonLib by itself.

    1. I recommend you clone your whole repo before starting, just to be sure you don't lose anything.

      git clone <big-repo> <big-repo-clone>
      
    2. Prepare the old repo

      pushd <big-repo-clone>
      # split for the common lib
      git checkout master  # assuming you want your common lib at master
      git subtree split --prefix=Source --branch=temp-commonLib
      
      # split Project1 from its branch
      git checkout <branch-for-project1>
      git subtree split --prefix=Source --branch=temp-project1
      
      # split Project2 from its branch
      git checkout <branch-for-project2>
      git subtree split --prefix=Source --branch=temp-project2
      
    3. Now we need to clean out the parts of those projects that we don't want there. Since they're mixed in together you can't really use subtree for this, but you can use filter-branch to rewrite the history without the other parts.

      # strip unrelated parts from the CommonLib
      git checkout temp-commonLib
      git filter-branch --tag-name-filter cat --prune-empty --index-filter 'git rm -rf --cached --ignore-unmatch Project1* Project2*' HEAD
      
      # strip unrelated parts from the Project1
      git checkout temp-project1
      git filter-branch --tag-name-filter cat --prune-empty --index-filter 'git rm -rf --cached --ignore-unmatch CommonLib* Project2*' HEAD
      
      # strip unrelated parts from the Project2
      git checkout temp-project2
      git filter-branch --tag-name-filter cat --prune-empty --index-filter 'git rm -rf --cached --ignore-unmatch CommonLib* Project1*' HEAD
      

      The --prune-empty option drops the commits that become empty because they only contained changes in the folders you removed.
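
If you want to see the pruning in action before touching the real repo, here's a minimal sketch on a throwaway repository. The directory names mirror the question; the file names, commit messages, and the temp directory are made up for illustration, and it assumes a POSIX shell (Git Bash on Windows should do).

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice newer Git versions print
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)            # throwaway location, hypothetical
cd "$demo"
git init -q .

# three commits, each touching exactly one top-level directory
for dir in CommonLib.Data Project1.Web Project2.Data; do
  mkdir -p "$dir"
  echo "content" > "$dir/file.txt"
  git add "$dir"
  git commit -q -m "add $dir"
done

before=$(git rev-list --count HEAD)

# remove everything belonging to Project1 and Project2 in a single pass
git filter-branch --prune-empty --index-filter \
  'git rm -rf --cached --ignore-unmatch -q "Project1*" "Project2*"' HEAD

after=$(git rev-list --count HEAD)
remaining=$(git ls-tree --name-only HEAD)
echo "commits before: $before, after: $after"
echo "top-level entries left: $remaining"
```

The two Project commits only touched paths that were removed, so they end up empty and get pruned; only the CommonLib.Data commit should be left.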

      Note: All of these changes are at the /Source level so that it can be the new root for each project. You can later add your solution back in. Or you can use this prune technique with clones instead of subtrees, and when you're all done you can just move all the contents from '/Source' to '/'.
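
For the clone-based variant, one possible shape is sketched below. Note that it swaps in a different technique for the final move: instead of a single move commit at the end, it uses --tree-filter to lift Source/ to the root in every commit, so the .sln file stays at the top level throughout the history. Treat it as a sketch under assumptions: the sample files and temp directory are invented, --tree-filter is slow on large histories, and mv Source/* skips dotfiles.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice newer Git versions print
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)            # stands in for a fresh clone of the big repo
cd "$demo"
git init -q .

mkdir -p Source/Project1.Data Source/Project2.Data
echo "sln1" > Project1.sln
echo "sln2" > Project2.sln
echo "a" > Source/Project1.Data/a.txt
echo "b" > Source/Project2.Data/b.txt
git add .
git commit -q -m "initial layout"

echo "a2" >> Source/Project1.Data/a.txt
git commit -q -am "change Project1.Data"

# 1) prune what does not belong to Project1 (paths still carry the Source/ prefix)
git filter-branch --prune-empty --index-filter \
  'git rm -rf --cached --ignore-unmatch -q "Source/Project2*" Project2.sln' HEAD

# 2) lift Source/* to the root in every commit; -f overwrites the backup from step 1
git filter-branch -f --tree-filter \
  'if [ -d Source ]; then mv Source/* . && rmdir Source; fi' HEAD

layout=$(git ls-tree -r --name-only HEAD)
commits=$(git rev-list --count HEAD)
echo "$layout"
```

Both commits touch Project1 files, so both survive the pruning; only the paths change.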

      Now your repo is going to have extra branches and backups in refs/original/refs/heads/<branch-name>. If you get a fatal error from filter-branch partway through, you can re-create the branch and start again, or, if you're confident it didn't change anything yet, you can delete the backup with: git update-ref -d refs/original/refs/heads/<branch-name>.
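
If several branches were rewritten, deleting the backups one at a time gets tedious. Here is a small sketch that lists whatever sits under refs/original/ and removes each ref, shown on a throwaway repo with made-up file names. Only do this once you're happy with the rewritten history, since the backups are the easy way back.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice newer Git versions print
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)            # throwaway location, hypothetical
cd "$demo"
git init -q .
echo "one" > keep.txt
echo "two" > drop.txt
git add .
git commit -q -m "two files"

# any rewrite leaves a backup of the old branch tip under refs/original/
git filter-branch --index-filter \
  'git rm --cached --ignore-unmatch -q drop.txt' HEAD

backups_before=$(git for-each-ref --format='%(refname)' refs/original/ | wc -l | tr -d ' ')

# delete every backup ref, whatever the branch is called
git for-each-ref --format='%(refname)' refs/original/ |
while read -r ref; do
  git update-ref -d "$ref"
done

backups_after=$(git for-each-ref --format='%(refname)' refs/original/ | wc -l | tr -d ' ')
echo "backup refs before: $backups_before, after: $backups_after"
```

If you'd rather keep the backups and just re-run a filter, passing -f to filter-branch overwrites the existing backup instead.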

    4. Now just create new repos to store the projects created from those branches

      popd # to get out of <big-repo-clone>
      
      mkdir <new-repo>
      pushd <new-repo>
      
      git init
      git pull <big-repo-clone> <name-of-branch> # like temp-project1
      popd # to get out of the <new-repo>
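
To make step 4 concrete, here's a minimal sketch with a stand-in for the big repo. The repo paths, the Program.cs file, and the commit message are placeholders; the point is that pulling a single branch by path brings over only that branch's history and leaves no remote pointing back at the old repo.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)            # throwaway location, hypothetical

# stand-in for <big-repo-clone> with a branch named temp-project1
git init -q "$work/big-repo-clone"
cd "$work/big-repo-clone"
echo "code" > Program.cs
git add Program.cs
git commit -q -m "project1 history"
git branch temp-project1

# the new repository receives only what is reachable from that branch
git init -q "$work/Project1Repo"
cd "$work/Project1Repo"
git pull -q "$work/big-repo-clone" temp-project1

pulled=$(git rev-list --count HEAD)
subject=$(git log -1 --format=%s)
remotes=$(git remote | wc -l | tr -d ' ')
echo "commits pulled: $pulled, configured remotes: $remotes"
```

Because no remote is configured, you can add your real origin afterwards without having to clean anything up first.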
      
    5. One last thing, let's pull the CommonRepo into the projects.

      pushd <new-project-repo>
      # subtree add needs a ref to import, e.g. the tag or branch matching this project
      git subtree add --prefix=CommonLib <new-commonlib-repo> <tag-or-branch>
      

    You then just need to bring in the .sln files (I'll leave this last step up to you).