git, svn, subgit

Convert svn to git across repository restructuring using SubGit


I'm working on migrating a Subversion repository to Git using SubGit 2.0.3 while trying to maintain the full history across a restructuring. I've got a configuration that seems to maintain history across the restructuring for the branches, but not the trunk.

The restructuring itself was a bit...unusual...and involved an intermediate layout.

Initial layout:

  • trunk: /ProjectOldName
  • branches: /ProjectOldName/Releases
  • tags: N/A

Intermediate layout:

  • trunk: /trunk/ProjectNewName
  • branches: /releases/ProjectNewName
  • tags: N/A

Final layout:

  • trunk: /ProjectNewName/trunk
  • branches: /ProjectNewName/branches/releases
  • tags: /ProjectNewName/tags

So the subgit mappings I used for the conversion were:

trunk = ProjectNewName/trunk:refs/heads/master
branches = trunk/ProjectNewName:refs/heads/old-master-interim
branches = ProjectOldName:refs/heads/old-master
branches = ProjectNewName/branches/releases/*:refs/heads/releases/*
branches = releases/ProjectNewName/*:refs/heads/old-releases-interim/*
branches = ProjectOldName/Releases/*:refs/heads/old-releases/*
tags = ProjectNewName/tags/*:refs/tags/*
shelves = ProjectNewName/shelves/*:refs/shelves/*

This maintained history for the release branches: the log for a file goes back beyond the restructuring, although it seems to stop at the creation of the branch (which occurred before the restructuring). The history for the same file on master, however, stops at the final step of the restructuring, and the expected 'old-master-interim' and 'old-master' branches don't exist in the Git repository.

It looks like the restructuring was done using svn copies (i.e. the files weren't manually copied and re-committed), and history on the final layout was preserved correctly. The intermediate layout was created twice, though: the first attempt was deleted with a comment indicating that history wasn't preserved. As best I can tell, the chain of restructuring commits for trunk was:

  • Start off with /ProjectOldName
  • Add directory /trunk/ProjectNewName
  • Add multiple directories (most, but not all) to /trunk/ProjectNewName/ from /ProjectOldName/, with deletes for those directories not added (how this happened I'm not sure as the directories didn't exist in that branch yet)
  • Replace multiple directories (same set as added above) in /trunk/ProjectNewName/ from /ProjectOldName/ (with slightly different revisions, maybe an attempt to redo the previous add?)
  • Delete directory /trunk/ProjectNewName (with a comment about history not being saved)
  • Add directory /trunk/ProjectNewName (for the second time)
  • Add multiple directories to /trunk/ProjectNewName/ from /ProjectOldName/, same set of directories again but this time the deletes weren't present
  • Add directory /ProjectNewName/trunk
  • Add multiple directories to /ProjectNewName/trunk/ from /trunk/ProjectNewName/
  • Delete directory /trunk/ProjectNewName

It's similar, but slightly different for the release branches:

  • Start off with /ProjectOldName/Releases
  • Add directory /releases
  • Add multiple directories (one for each branch) to /releases/ from /ProjectOldName/Releases/
  • Delete directory /releases
  • Add directory /releases/ProjectNewName
  • Add multiple directories (one for each branch) to /releases/ProjectNewName/ from /ProjectOldName/Releases/
  • Add directory /ProjectNewName/branches/releases
  • Add multiple directories (one for each branch) to /ProjectNewName/branches/releases/ from /releases/ProjectNewName/
  • Delete directory /releases/ProjectNewName

The only real difference seems to be the 'Replace multiple directories' step that happened for trunk but not the branches.

So after all that:

  • Is there a way to get SubGit to convert the above while maintaining history across the restructuring for trunk?
  • Can SubGit handle having branches under the trunk as in the original repository layout (i.e. trunk at /ProjectOldName, branches at /ProjectOldName/Releases)?
  • Is there anything special about the 'trunk' mapping? Or is it actually no different than the 'branches' mapping? AFAIK for both svn and git there's nothing special about the 'trunk' directory and the 'master' branch respectively.
  • Even though history on the branches seems to cross the restructuring OK, it stops at the branch creation instead of continuing on to where the branch was created from. What would cause this and how can it be fixed (if it can)?

Solution

  • Is there a way to get SubGit to convert the above while maintaining history across the restructuring for trunk?

    SubGit is able to track branch history when the whole branch directory gets copied from one location to another:

    $ svn cp ^/trunk ^/branches/foo
    

    However, it can't track the history when the branch is assembled by copying its subdirectories one by one:

    $ svn mkdir ^/branches/foo
    $ svn cp ^/trunk/dir1 ^/branches/foo/dir1
    $ svn cp ^/trunk/dir2 ^/branches/foo/dir2
    ...
    $ svn cp ^/trunk/dirN ^/branches/foo/dirN
    

    Unfortunately, this is how the restructuring was performed for the ProjectOldName, /trunk/ProjectNewName and /ProjectNewName/trunk directories. As a result, SubGit is not able to preserve the history across them.

    One possible workaround in your case is to import those directories into separate branches and then graft the imported pieces into a single history with git replace.

    This workaround, however, leads to the next question:

    Can SubGit handle having branches under the trunk as in the original repository layout (i.e. trunk at /ProjectOldName, branches at /ProjectOldName/Releases)?

    No, SubGit ignores the ProjectOldName directory in this case.

    We did this intentionally: if SubGit tried to import the ProjectOldName directory, any revision that adds a branch to ProjectOldName/Releases would take a long time to process, because SubGit would treat it as a brand new directory.

    In order to graft the ProjectOldName history onto other branches, I'd recommend importing that branch separately:

    $ subgit configure --svn-url URL REPO
    $ git config -f REPO/subgit/config svn.trunk ProjectOldName:refs/heads/master
    $ subgit import REPO
    

    After that, you can fetch the imported changes into the Git repository that was imported with the settings you already mentioned, and then use git replace to join the histories of ProjectOldName, /trunk/ProjectNewName and /ProjectNewName/trunk.
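The fetch-and-graft step can be sketched on two throwaway repositories. This is only an illustration under assumptions: the directory names `old` and `main`, the branch name `old-master` and the empty demo commits are hypothetical stand-ins for the two SubGit imports, and in a real conversion you would have to pick the graft point by inspecting the imported history yourself.

```shell
#!/bin/sh
# Demo in a temporary directory; all repository and branch names are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

# Helper: create an empty commit with a fixed identity so the demo runs anywhere.
commit() {
    git -C "$1" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$2"
}

# "old" stands in for the separate import of ProjectOldName.
git init -q old
commit old "old: r1"
commit old "old: r2"

# "main" stands in for the import made with the original mappings,
# where the history of master starts at the restructuring.
git init -q main
commit main "new: restructuring"
commit main "new: later work"

cd main
# Bring the old history in under its own branch name.
git fetch -q ../old "HEAD:refs/heads/old-master"

# The first commit of the new history currently has no parent.
root=$(git rev-list --max-parents=0 HEAD)
old_tip=$(git rev-parse old-master)

# Record a replacement that gives that root commit the old tip as its parent.
git replace --graft "$root" "$old_tip"

# History-walking commands now traverse both pieces.
count=$(git rev-list --count HEAD)
echo "commits reachable from HEAD: $count"
```

The replacement is stored under refs/replace/ and leaves the original commits untouched, so it can be removed again with git replace -d if the graft point turns out to be wrong.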

    Even though history on the branches seems to cross the restructuring OK, it stops at the branch creation instead of continuing on to where the branch was created from. What would cause this and how can it be fixed (if it can)?

    I believe this is caused by the previous problem: since the ProjectOldName directory is ignored, SubGit is not able to preserve the history of branches copied as follows:

    $ svn cp ^/ProjectOldName ^/ProjectOldName/Releases/BRANCH
    

    Unfortunately, that means you can import either ProjectOldName or ProjectOldName/Releases/*, but not both in the same run. Again, git replace may help here by grafting the branches' history.
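Before grafting, it helps to know which imported branches are actually cut off. The sketch below lists the release branches whose first commit differs from the first commit of master. The demo repository, its branch names and its empty commits are hypothetical, and the check assumes each branch has a single root commit.

```shell
#!/bin/sh
# Demo in a temporary directory; branch names and commits are hypothetical.
set -e
cd "$(mktemp -d)"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master

# Helper: create an empty commit with a fixed identity.
commit() {
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}

commit "master: r1"
commit "master: r2"

# Two release branches with no link back to master, as after the import.
for name in 1.0 2.0; do
    git checkout -q --orphan "releases/$name"
    commit "release $name: branch created"
done

# One release branch that is properly connected, for contrast.
git checkout -q -b releases/3.0 master
commit "release 3.0: fix"

# Report every release branch whose root commit is not master's root commit.
master_root=$(git rev-list --max-parents=0 master)
disconnected=""
for ref in $(git for-each-ref --format='%(refname:short)' refs/heads/releases/); do
    root=$(git rev-list --max-parents=0 "$ref")
    if [ "$root" != "$master_root" ]; then
        disconnected="$disconnected $ref"
        echo "$ref starts at: $(git log -1 --format='%h %s' "$root")"
    fi
done
```

Each commit reported this way is a candidate for git replace --graft; the parent to graft it onto still has to be found by matching it against the separately imported ProjectOldName history.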

    Is there anything special about the 'trunk' mapping? Or is it actually no different than the 'branches' mapping? AFAIK for both svn and git there's nothing special about the 'trunk' directory and the 'master' branch respectively.

    The trunk and branches configuration options differ only during a Git-to-SVN import. While importing Git history to SVN, SubGit makes sure that the branch specified as trunk is created in the very first revisions and never gets deleted or replaced. Branches specified as branches tend to have a shorter lifetime in the imported SVN history.

    There is no difference between trunk and branches if you import SVN history to Git.

    Warning:
    Never use the git replace command if you are going to keep the Git and SVN repositories in sync rather than perform a one-time import.

    Thank you for providing all the necessary details in your question. Hopefully, my answer is helpful enough.