git, svn, git-svn, sourceforge

Cloning a large svn repository from sourceforge with "git svn clone"


I'm trying to clone a large subversion repository from sourceforge using git svn clone. The cloning process periodically gets stuck: SF just stops sending data. Eventually it times out, but it takes forever. After this, it's a cycle of:

  • Run git svn fetch
  • Wait for it to time out
  • Repeat as necessary
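
The cycle above can be automated with a small retry loop. This is only a minimal sketch, not a fix for the stalls themselves: the helper name `retry_until_ok` and its attempt and pause arguments are made up for illustration, and it assumes that re-running `git svn fetch` picks up where the previous run stopped. Each stalled run still has to time out on its own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-run a command until it succeeds or the attempt budget is spent.
# Usage: retry_until_ok MAX_ATTEMPTS PAUSE_SECONDS COMMAND [ARGS...]
retry_until_ok() {
    local max_attempts="$1"
    local pause="$2"
    local attempt=1
    shift 2

    # "$@" is the command to retry; the loop ends as soon as it exits with status 0.
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$pause"
    done
}

# Example, run inside the partially cloned working directory:
#   retry_until_ok 50 30 git svn fetch
```

The attempt limit keeps the loop from spinning forever if the server never recovers; the values 50 and 30 are arbitrary.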

Looking at the git svn man page, there doesn't appear to be an obvious way to set a timeout for I/O operations.

Is there any way to make this process more efficient?


Solution

  • Let me give you some advice on speeding things up. I just finished running these steps myself, so I'm sure they work. The main trick is to copy the Subversion repository to the local filesystem and then run git svn against the local copy.

    First, SourceForge allows backing up repositories using rsync (see the last command at the very bottom). So start with

    rsync -ahPv svn.code.sf.net::p/vice-emu/code vice-emu.svn
    

    In the general case, where rsync access is not available, you can use svnrdump to dump the repository:

    svnrdump dump --non-interactive https://svn.code.sf.net/p/vice-emu/code > vice-emu.svndump
    

    and then load the dump into a local repository:

    svnadmin create vice-emu.svn
    svnadmin load vice-emu.svn < vice-emu.svndump
    

    Either way you can use git svn on the local repository:

    git svn clone "file://$(pwd)/vice-emu.svn" vice-emu
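
The steps above can be strung together in one script. The following is a sketch under stated assumptions rather than a tested recipe: the function names, the `DRY_RUN` switch, and the idea that other projects follow the same `p/PROJECT/code` path as vice-emu are all assumptions added here for illustration. The commands themselves are the ones shown above.

```shell
#!/usr/bin/env bash
# Sketch only: mirror a SourceForge SVN repository locally, then import it.

# Echo a command to stderr, then execute it unless DRY_RUN=1.
# DRY_RUN is a hypothetical switch for this sketch, not a git or svn option.
run() {
    printf '+ %s\n' "$*" >&2
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

# Thin wrappers so the redirections are skipped in a dry run as well.
dump_repo() { svnrdump dump --non-interactive "$1" > "$2"; }
load_repo() { svnadmin load "$1" < "$2"; }

# mirror_and_clone PROJECT [rsync|dump]
# Assumes the repository path is p/PROJECT/code, as in the vice-emu example.
mirror_and_clone() {
    local project="$1"
    local method="${2:-rsync}"
    local mirror="$project.svn"

    case "$method" in
        rsync)
            # Copy the repository files directly.
            run rsync -ahPv "svn.code.sf.net::p/$project/code" "$mirror" || return 1
            ;;
        dump)
            # Dump over HTTPS, then load into a fresh local repository.
            run dump_repo "https://svn.code.sf.net/p/$project/code" "$project.svndump" || return 1
            run svnadmin create "$mirror" || return 1
            run load_repo "$mirror" "$project.svndump" || return 1
            ;;
        *)
            echo "unknown method: $method" >&2
            return 2
            ;;
    esac

    # file:// URLs take an absolute path, so prefix the current directory.
    run git svn clone "file://$PWD/$mirror" "$project"
}

# Usage:
#   DRY_RUN=1 mirror_and_clone vice-emu rsync   # only print the commands
#   mirror_and_clone vice-emu dump              # really run the svnrdump route
```

Running with `DRY_RUN=1` first prints the exact commands, so they can be checked before any long transfer starts.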
    

    And finally, yet another trick: offload the import to GitHub.

    My results are: https://github.com/phdru/vice-emu-ghi (GitHub Importer) and https://github.com/phdru/vice-emu-gitsvn (rsync + git svn + postprocessing).

    PS. If you're going to fork one or both of my results, please ping me afterward and I'll remove my temporary repositories. Otherwise I'll remove them soon.