
Git-svn migration with non-standard layout doesn't show merges


After trying several options and a bunch of hints from this site and others, I'm stuck. My main question is the following: I'd like to migrate (part of) an SVN repository to Git, preserving history. The SVN layout is non-standard. After git svn clone I do see the right branches appear, but when I try to, e.g., merge master into a branch, I get conflicts saying that both sides added a set of files.

If I look in, e.g., gitg, I see the branches, but they never seem to branch from master/trunk (so the "both added" conflicts seem logical from that perspective). Nor do I see any of the merges (e.g. from trunk to a branch) in the graph: the commits are there, they just don't link to branches in gitg's graphical display. In fact, for some branches I even see two identical commits one after the other (one for master, one for the branch). I created the branches in SVN using svn copy.

Some more details:

Repository layout: A slightly simplified schematic of the SVN repo layout (the structure is the same, names are different, some directories have been omitted)

pkg
    Project1
    Project2
    Lib
branches
    Project1-feature1
        Project1
        Lib
    Project1-hotfix
        Project1
        Lib
    Lib-feature
tags
    Project1
        v0.1.0
        v0.2.0
            Project1
            Lib
    Project2
        v0.1.0

The Lib directory is closely associated with Project1, but is also used by others. That is why (starting with v0.2.0) I created the Project1 and Lib subdirectory structure in the branches and tags.

My git svn workflow: This is the most promising command I used to clone the SVN repo:

git svn clone \
    --prefix=svn/ \
    --trunk=pkg \
    --branches=branches \
    --tags=tags/Project1 \
    -A authors.txt \
    --ignore-paths='^pkg/(?!Project1|Lib)' \
    svn+ssh://user@svn.r-forge.r-project.org/svnroot/MyTool  SVN2GitMigration

The --ignore-paths option is there so that I keep only the two directories (Project1 and Lib) in which I'm interested. I do not filter on branches since there is only one branch not directly related to Project1.

After that I convert the remote branches to local branches (and remove the remote branches), then convert the tags to proper Git tags.

EDIT START: Closer inspection of the commits reveals that I have many empty commits. These turn out to be due to the --ignore-paths option: the empty commits correspond to SVN revisions that only touched ignored parts of the directory tree. So this option doesn't really behave as I expected. Back to the drawing board... EDIT END

EDIT2 Actually, using git filter-branch --tag-name-filter cat --prune-empty -- --all I managed to remove the empty commits EDIT2 END

Possible cause of my merge problems: Branches/Tags are not single SVN commits because they first consist of a commit in which I create the branches/Project1-featureX directory, followed by two svn copy lines in which I copy the Project1 and Lib directories from trunk.

Suggestions on how to properly convert this SVN repo are very welcome! If this somehow means losing Lib, that isn't a big deal. I'm planning to separate the two anyway once the migration has finished.


Solution

  • After a lot of trial and error I solved my problem in the following way:

    Preparation

    First I initialised a repository without any branches or tags:

    git svn init \
      --prefix=svn/ \
      --trunk=pkg/Project1 \
      svn+ssh://user@svn.r-forge.r-project.org/svnroot/MyTool \
      SVN2GitMigration
    

    Next I added the author information:

    cd SVN2GitMigration
    git config svn.authorsfile ../authors.txt
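
The authors file maps each SVN login to a Git identity, one "login = Name <email>" line per committer. A possible way to get a skeleton for it is to extract the logins from svn log --quiet output. This is only a sketch: the helper name authors_skeleton, the sample log, and the example.invalid addresses are placeholders, and the generated names and emails still need to be edited by hand.

```shell
# Hypothetical helper: read `svn log --quiet` output on stdin and print one
# skeleton authors-file line per distinct login.
authors_skeleton() {
    awk -F ' [|] ' '/^r[0-9]+ [|] / {
        print $2 " = " $2 " <" $2 "@example.invalid>"
    }' | sort -u
}

# Sample input in the format printed by `svn log --quiet` (made-up revisions).
authors_skeleton <<'EOF'
------------------------------------------------------------------------
r3 | bob | 2013-02-01 10:00:00 +0100 (Fri, 01 Feb 2013)
------------------------------------------------------------------------
r2 | alice | 2013-01-15 09:30:00 +0100 (Tue, 15 Jan 2013)
------------------------------------------------------------------------
r1 | alice | 2013-01-01 12:00:00 +0100 (Tue, 01 Jan 2013)
------------------------------------------------------------------------
EOF
```

Against a real repository the input would come from something like svn log --quiet <repository URL> | authors_skeleton > authors.txt.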
    

    After this, my .git/config file had the following contents:

    [core]
           repositoryformatversion = 0
           filemode = true
           bare = false
           logallrefupdates = true
    [svn-remote "svn"]
           url = svn+ssh://user@svn.r-forge.r-project.org/svnroot/MyTool
           fetch = pkg/Project1:refs/remotes/svn/trunk
    [svn]
           authorsfile = ../authors.txt
    

    In order to get the branches and tags I changed that file to:

    [core]
           repositoryformatversion = 0
           filemode = true
           bare = false
           logallrefupdates = true
    [svn-remote "svn"]
           url = svn+ssh://user@svn.r-forge.r-project.org/svnroot/MyTool
           fetch = pkg/Project1:refs/remotes/svn/trunk
           tags = tags/Project1/{v0.4.2,v0.4.1,v0.4.0,v0.3.0,v0.2.2,v0.2.0}/Project1:refs/remotes/svn/tags/*
           tags = tags/Project1/{v0.2.1,v0.1-9e,v0.1.3}:refs/remotes/svn/tags/*
           branches = branches/{Project1-v0.4.2-fixes,Project1-v0.4.1-fixes,Project1-refactor,Project1-feature1}/Project1:refs/remotes/svn/*
           branches = branches/{Project1-feature2}:refs/remotes/svn/*
    [svn]
           authorsfile = ../authors.txt
    

    Notice how each branches and tags line has a list of directory names in {}, even if it only contains one directory name. Without this, the fetching won't work.
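
Because the brace lists are easy to mistype, the values can also be generated. The following sketch builds one branches or tags value from a list of directory names. The helper svn_glob and its argument order are my own invention, not part of git svn.

```shell
# Hypothetical helper: build one "branches"/"tags" value for .git/config.
# Usage: svn_glob <svn-parent-dir> <subdir-or-empty> <ref-prefix> <name>...
svn_glob() {
    parent=$1
    subdir=$2
    ref_prefix=$3
    shift 3
    names=$(printf '%s,' "$@")   # join the names with commas
    names=${names%,}             # drop the trailing comma
    # The braces are always emitted, even for a single name.
    printf '%s/{%s}%s:%s/*\n' "$parent" "$names" "${subdir:+/$subdir}" "$ref_prefix"
}

svn_glob branches Project1 refs/remotes/svn Project1-refactor Project1-feature1
svn_glob branches '' refs/remotes/svn Project1-feature2
svn_glob tags/Project1 '' refs/remotes/svn/tags v0.2.1 v0.1-9e v0.1.3
```

Each printed value could then be added with git config --add svn-remote.svn.branches "<value>" (or svn-remote.svn.tags) instead of editing the file by hand.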

    Download the SVN data

    To download and convert the SVN repository run:

    git svn fetch
    

    Postprocessing

    After this, some post-processing is required. To convert the remote tags and branches to proper local tags and branches, and then delete the remote ones, run:

    for branch in `git branch -r |grep -v tags| grep -v trunk | sed 's/svn\///'`; do
         git branch $branch remotes/svn/$branch;
    done
    for tag in `git branch -r |grep tags| sed 's;svn/tags/;;'`; do
          git tag $tag remotes/svn/tags/$tag;
    done
    for br in `git branch -r`; do
          git branch -d -r $br
    done
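
To check what the first two loops would do before touching any refs, the name mapping can be dry-run on the output of git branch -r. This is a sketch only: plan_for_ref is a hypothetical helper and the ref names fed to it are examples.

```shell
# Hypothetical helper: print the git command that the loops above would run
# for one remote-tracking ref name as listed by `git branch -r`.
plan_for_ref() {
    case "$1" in
        svn/tags/*) echo "git tag ${1#svn/tags/} remotes/$1" ;;
        svn/trunk)  echo "# skip $1" ;;
        svn/*)      echo "git branch ${1#svn/} remotes/$1" ;;
    esac
}

# Example ref names; in the real repository they would come from `git branch -r`.
for ref in svn/trunk svn/Project1-refactor svn/tags/v0.2.0; do
    plan_for_ref "$ref"
done
```

Unlike grep -v tags, the case patterns only match at the start of the name, so a branch whose name merely contains "tags" would still be treated as a branch.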
    

    Convert the svn:ignore properties to a .gitignore file

    git svn show-ignore > .gitignore
    git add .gitignore
    git commit -m "Added .gitignore file based on the svn:ignore properties"
    

    After inspecting the git repo with gitg or gitk, it turned out that many merges were missing (not shown in the graph), so I had to graft those by hand by adding the parent commit hashes to the .git/info/grafts file (the file format is one line per commit: merge_hash parent1_hash parent2_hash). Note that gitk shows the grafts, whereas gitg doesn't until they are made permanent.
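
For readers on a reasonably recent Git, the same parent information can also be recorded with git replace --graft. This is not the grafts-file method I used above, just a sketch of an alternative. It runs in a throwaway repository, and the identity, branch name and commit messages are all made up.

```shell
# Build a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.invalid"

git commit -q --allow-empty -m "trunk: initial"
trunk_branch=$(git symbolic-ref --short HEAD)
git checkout -q -b feature
git commit -q --allow-empty -m "feature: work"
git checkout -q "$trunk_branch"
# Stands in for an SVN merge that was imported as an ordinary commit.
git commit -q --allow-empty -m "trunk: merge feature"

merge=$(git rev-parse HEAD)
parent1=$(git rev-parse HEAD~1)
parent2=$(git rev-parse feature)

# Same information as a grafts line "merge parent1 parent2".
git replace --graft "$merge" "$parent1" "$parent2"

# Prints the commit followed by its (now two) parents.
git rev-list --parents -n 1 HEAD
```

According to the git filter-branch documentation, refs in refs/replace/ are made permanent by a filter-branch run in the same way as grafts, so the step that follows should apply to either approach.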

    To make the grafts permanent, run:

    git filter-branch --tag-name-filter cat -- --all
    

    and to remove the backups created by git filter-branch run:

    git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
    

    Wrapping up

    Now that everything is converted, clone the repository into a bare one:

    git clone --bare SVN2GitMigration Project1.git
    

    and push that to GitHub:

    cd Project1.git
    git push --mirror https://github.com/mygithubuser/Project1.git
    
