Tags: git, svn, git-svn, reposurgeon

SVN to Git using a fast import/export stream


I have been working on converting an SVN repo of ~32,000 commits to a DVCS (Git, Bazaar, Mercurial, or Plastic SCM). After a week or two I realized the best option is to convert the SVN repo to Git, produce a fast-export stream, and import that .fe stream into whichever DVCS I choose, since they all support Git's fast-export/fast-import format.

I've tried everything I could find online, on both Windows 7 and Ubuntu Linux. Because of the size of the repo, I've had the most success with reposurgeon and git-svn, but, again because of the size, both tools fail to convert the full repo in one go. I also tried SubGit, and although it works, it is extremely slow (~24h to process 1060 commits).

So I figured I could convert each folder within the repo (trunk, branches, tags, custom folders) separately and combine the results later in Git. Then I realized this would not be possible, as Git's repository structure is significantly different from SVN's.

My question is: using the method above and some magic, is it possible to combine the separate conversions into one Git repo?

Essentially I need to get a fast export/import stream for my SVN repo to convert it to another DVCS, and figured a Git middle-step would be easiest. What, if any, other options are available for a successful conversion?

Thanks in advance.


Solution

  • Converting folders separately and combining the git repositories should work in principle, but would be very tricky to get right, so I'd advise against it.

    At any rate, 32,000 commits is not that much, and git-svn should be able to handle it, though it might take a day or so. If it is still too slow, you'll have to experiment a bit.

    Things that can slow down git-svn's clone operation

    SVN repository speed

    First, of course, is the SVN repository speed. Try creating a local mirror of the SVN repository (using svnadmin dump/load or svnsync), and clone that.
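    A minimal Python sketch of the svnsync route, assuming the svnadmin, svnsync and git commands are on PATH, a POSIX system (the hook is a shell script), and a standard trunk/branches/tags layout. The names mirror_and_clone and write_revprop_hook are illustrative, not part of any tool; run is injected so the steps can be inspected without executing them.

```python
import stat
import subprocess
from pathlib import Path
from typing import Callable


def write_revprop_hook(mirror_repo: Path) -> Path:
    """Install a pre-revprop-change hook that accepts every change.

    svnsync has to set revision properties (svn:log, svn:author, ...) on the
    mirror, which a new repository refuses unless this hook exists and exits 0.
    Assumes a POSIX shell; on Windows the hook would be a .bat file instead.
    """
    hook = mirror_repo / "hooks" / "pre-revprop-change"
    hook.parent.mkdir(parents=True, exist_ok=True)
    hook.write_text("#!/bin/sh\nexit 0\n")
    # Add the executable bits so Subversion can run the hook.
    hook.chmod(hook.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook


def mirror_and_clone(mirror_repo: Path, source_url: str, git_dir: str,
                     run: Callable = subprocess.run) -> list[list[str]]:
    """Mirror source_url into mirror_repo with svnsync, then git-svn clone it.

    Returns the commands that were run, in order.
    """
    mirror_url = mirror_repo.resolve().as_uri()  # file:///... URL of the mirror
    steps: list[list[str]] = []

    def step(argv: list[str]) -> None:
        steps.append(argv)
        run(argv, check=True)

    step(["svnadmin", "create", str(mirror_repo)])
    # The hook must exist before svnsync starts writing revision properties.
    write_revprop_hook(mirror_repo)
    step(["svnsync", "initialize", mirror_url, source_url])
    # Copies every revision from the source repository (can take a while).
    step(["svnsync", "synchronize", mirror_url])
    # Clone from the local mirror; --stdlayout assumes trunk/branches/tags.
    step(["git", "svn", "clone", "--stdlayout", mirror_url, git_dir])
    return steps
```

    For example, mirror_and_clone(Path("svn-mirror"), "https://svn.example.com/repo", "project") would mirror that (hypothetical) server into ./svn-mirror and then clone the mirror into ./project. The svnadmin dump/load route works too; svnsync has the advantage that it can be re-run later to pick up new revisions.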

    "Subdirectory" branches/tags

    Branches and tags (which git-svn treats identically) can become a problem. Whenever git svn clone encounters an SVN branch that is a copy not of trunk but of a subdirectory, it re-reads the whole SVN history of the branched subdirectory since its creation (you can see this in the output of git svn clone, and the author has posted an explanation). This means that the time a clone takes is proportional not only to the number of SVN revisions n but also to the number of "subdirectory branches" b; e.g. if b = 10, the clone may take up to 10 times longer.
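    To estimate b before starting, you could look for branch or tag creations whose source is a subdirectory rather than a whole branch. This is a hypothetical sketch: it parses the output of svn log --xml --verbose run on the repository root and assumes the standard /trunk, /branches, /tags layout (custom folders would have to be added to BRANCH_DIRS). It also flags copies from subdirectories of other branches, not only of trunk.

```python
import xml.etree.ElementTree as ET

# Assumes the standard layout; adjust for custom folders.
TRUNK = "/trunk"
BRANCH_DIRS = ("/branches", "/tags")


def is_branch_root(path: str) -> bool:
    """True for /trunk and for /branches/<name> or /tags/<name>."""
    if path == TRUNK:
        return True
    parent, _, name = path.rstrip("/").rpartition("/")
    return parent in BRANCH_DIRS and name != ""


def subdirectory_branches(log_xml: str) -> list[tuple[str, str, int]]:
    """List branches/tags that were created by copying a subdirectory.

    log_xml is the output of `svn log --xml --verbose <url>` on the
    repository root. Returns (new branch, copied from, from revision).
    """
    found = []
    for node in ET.fromstring(log_xml).iter("path"):
        source = node.get("copyfrom-path")
        target = (node.text or "").strip()
        if source is None or target == TRUNK or not is_branch_root(target):
            continue  # not the creation of a branch or tag
        if not is_branch_root(source):
            # Copied from e.g. /trunk/lib instead of /trunk itself.
            found.append((target, source, int(node.get("copyfrom-rev"))))
    return found
```

    The length of the returned list gives a rough value for b. svn log --verbose over 32,000 revisions produces a lot of output, so running it against a local mirror is preferable.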

    There is no easy solution to this problem. First, you could try cloning without tags: normally a tag just refers to an SVN revision ID, so having a list of tags should be enough (unless you have tags that contain changes... ugh). If that's not enough, you could also skip some branches, though you'll have to decide whether there are any you can do without.
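    If you clone without tags, the tags can be recreated afterwards as lightweight Git tags. A hedged sketch, assuming you already have a mapping from tag name to the SVN revision it was copied from (for example, the copyfrom-rev of each copy into /tags/ in svn log --xml --verbose), that the tags were copied from trunk, and that trunk was cloned as origin/trunk (the default prefix since Git 2.0). tag_script is a hypothetical helper that writes a shell script using git svn find-rev --before to locate the matching trunk commit. It does not handle tags that contain changes or tag names that are not valid Git ref names.

```python
import shlex


def tag_script(tags: dict[str, int], branch: str = "origin/trunk") -> str:
    """Build a shell script that recreates SVN tags as lightweight Git tags.

    tags maps a tag name to the SVN revision it was copied from; branch is
    the git-svn ref to search (assumed name, depends on the clone prefix).
    """
    lines = ["#!/bin/sh", "set -e"]
    for name, rev in sorted(tags.items()):
        # --before: take the last commit on `branch` at or before rN, since
        # the tagged revision itself may not have touched trunk.
        lookup = f"git svn find-rev --before r{rev} {shlex.quote(branch)}"
        lines.append(f'git tag {shlex.quote(name)} "$({lookup})"')
    return "\n".join(lines) + "\n"
```

    The script would be run inside the cloned repository; set -e stops it at the first tag whose commit cannot be found, so those can be checked by hand.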

    The extreme solution is the --no-follow-parent option. It prevents git svn from re-reading a branch from the beginning. The branches will still be read; however, they will not be connected to the rest of the history. That still shows you what was done on them, but makes them very difficult to merge back.
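    The options discussed above map onto the git svn clone command line roughly as follows. This is an illustrative Python helper (clone_args is not part of git) that assumes the standard trunk/branches/tags layout; --trunk, --branches, --tags and --no-follow-parent are real git svn options, and leaving out --tags is one way to clone without tags.

```python
from typing import List


def clone_args(url: str, target: str, *, with_tags: bool = True,
               follow_parent: bool = True) -> List[str]:
    """Build a `git svn clone` command line for a trunk/branches/tags layout."""
    args = ["git", "svn", "clone", "--trunk=trunk", "--branches=branches"]
    if with_tags:
        args.append("--tags=tags")
    if not follow_parent:
        # Branches are still fetched, but not connected to their parents.
        args.append("--no-follow-parent")
    args += [url, target]
    return args
```

    For example, subprocess.run(clone_args(url, "project", with_tags=False), check=True) would start a clone that skips tags but still connects branches to the history they were copied from.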


    Finally, note that you can interrupt and resume the clone process. To resume, run git svn fetch. You might need several restarts, but with a bit of patience the clone should go through.
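    If the clone keeps dying partway through, the restarts can be automated. A minimal sketch, assuming it is run against the directory created by the interrupted clone; fetch_until_done, the retry count and the pause are arbitrary illustrative choices, and run is injectable for testing.

```python
import subprocess
import time
from typing import Callable


def fetch_until_done(repo_dir: str, max_attempts: int = 20,
                     pause_s: float = 30.0,
                     run: Callable = subprocess.run) -> int:
    """Re-run `git svn fetch` until it succeeds; return the attempt count.

    git svn fetch resumes from the last revision it imported, so each
    retry continues roughly where the previous run stopped.
    """
    for attempt in range(1, max_attempts + 1):
        result = run(["git", "svn", "fetch"], cwd=repo_dir)
        if result.returncode == 0:
            return attempt
        if attempt < max_attempts:
            time.sleep(pause_s)  # give the server / network a moment
    raise RuntimeError(
        f"git svn fetch still failing after {max_attempts} attempts")
```

    Raising after a fixed number of attempts, rather than looping forever, makes it easier to notice a failure that a restart will not fix, such as a revision git-svn cannot import at all.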