svn svndump svndumpfilter

SVN: Minimize the dump needed to move project into its own repo


My original question is below. I've tried a few things since then to see if I could get this to work.

I have a tiny shell script that looks like this:

# Full (non-incremental) dump of r108917, filtered down to /KeyManagement
svnadmin dump -r108917 ./repo \
    | svndumpfilter include /KeyManagement \
          --drop-empty-revs \
          --skip-missing-merge-sources \
          --renumber-revs > km.svndump

# Append each listed revision as an incremental, filtered dump
while read -r rev
do
    svnadmin dump -r"$rev" --incremental ./repo \
        | svndumpfilter include /KeyManagement \
             --drop-empty-revs \
             --skip-missing-merge-sources \
             --renumber-revs >> km.svndump
done < km.revs.txt

km.revs.txt is a text file that simply lists, one per line, the revisions that contained changes to the /KeyManagement project.
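If km.revs.txt has to be built first, one way is to derive it from `svn log -q`, which prints one `rNNN | author | date` line per revision between dashed separators. The sketch below is a minimal POSIX-shell filter under that assumption; `revs_from_log` and `$REPO_URL` are hypothetical names, not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper: read `svn log -q` output on stdin and print
# one bare revision number per line. The dashed separator lines
# between log entries do not match the pattern and are skipped.
revs_from_log() {
  sed -n 's/^r\([0-9][0-9]*\) |.*/\1/p'
}
```

For example: `svn log -q "$REPO_URL/KeyManagement" | revs_from_log | sort -n > km.revs.txt`. `svn log` lists the newest revision first, so `sort -n` puts the list in ascending order, which is the order you would want to dump and load them in.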

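When I first did this, I thought I'd do the filtering afterwards. However, while the very first revision was being dumped, km.svndump grew to over 68 gigabytes. Whoops. (Without --incremental, the first revision in a dump contains the entire tree at that revision, not just that revision's changes.) In the second attempt, I am filtering the project through svndumpfilter.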

This ran for quite a while (I nohup'd it and simply checked on it from time to time). When it finished, I got a km.svndump that showed the UUID, the first revision, and an out-of-memory error. Apparently, my script didn't get past the first revision.
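To see how far a dump file actually got, one rough check is to list its `Revision-number:` headers, since each revision record in a dump stream starts with one. A minimal sketch, assuming a POSIX shell; `list_dump_revs` is a hypothetical name, and a versioned file whose contents happen to contain a line starting with `Revision-number: ` would produce false hits.

```shell
#!/bin/sh
# Hypothetical helper: print the revision number of every revision
# record in a dump file. -a makes grep treat binary file contents
# as text, so it prints matching lines instead of "Binary file matches".
list_dump_revs() {
  LC_ALL=C grep -a '^Revision-number: ' "$1" | sed 's/^Revision-number: //'
}
```

For example, `list_dump_revs km.svndump | wc -l` counts the revision records written so far.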

Any ideas how to continue?


We have a repository with a special project that's really incompatible with the rest of the repository. Any user in the LDAP group Development can see the entire repository. However, one project (our KeyManagement project) contains information that we want only the people working on that project to see. The repository layout looks like this:

  • /trunk - Trunk for the rest of the repo
  • /branches - Branches for the rest of the repo
  • /tags - Tags for the rest of the repo
  • /KeyManagement - Special KeyManagement project.

To keep out prying eyes, we use an svn_access file to specify which users can see this project. This has caused a lot of maintenance issues, and I would simply like to make KeyManagement a separate repository with its own LDAP access group. (We already have multiple repos, each with its own LDAP group.)

The problem is that our repo has over 175,000 revisions, and only 124 of them involve the KeyManagement project. Dumping all 175,000 revisions takes 30+ hours. If I could dump only the revisions I need, the entire dump would take a couple of hours.

The other issue is this:

$ svn log -r108917:108918 -v $REPO
------------------------------------------------------------------------
r108917 | svnadmin | 2011-03-23 00:46:04 -0500 (Wed, 23 Mar 2011) | 1 line
Changed paths:
A /KeyManagement

New folder KeyManagement
------------------------------------------------------------------------
r108918 | svnadmin | 2011-03-23 00:47:18 -0500 (Wed, 23 Mar 2011) | 1 line
Changed paths:
A /KeyManagement/trunk (from /trunk/KeyManagement:108917)
D /trunk/KeyManagement

Move the KeyManagement
------------------------------------------------------------------------

Apparently, KeyManagement was once also under /trunk. My previous experience with dumps and loads using svndumpfilter is that I have to include both /KeyManagement and /trunk/KeyManagement in the same dump and load. Truthfully, I don't care about /trunk/KeyManagement because the application was completely redone and no one cares about the old code.

I understand that the first revision of a dump is a complete revision (a full snapshot of the tree). Is it possible for me to do something like this:

$ svnadmin dump -r108917:108918 old_repo > dump_file
$ svnadmin dump -r108103 --incremental old_repo >> dump_file #Revision with KeyManagement
$ svnadmin dump -r107429 --incremental old_repo >> dump_file #Revision with KeyManagement
...

$ svnadmin load new_repo < dump_file

That is, dump only the revisions that involve KeyManagement. I don't care about the versions under /trunk; the project has been completely revised since then. I know which revisions I need, so I can easily write a shell script to do this. None of the KeyManagement revisions have any other projects entangled with them.
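A sketch of such a script, assuming km.revs.txt holds one revision number per line. It sorts the list numerically first, since the example above lists revisions in descending order and a dump stream should present revisions oldest first. `dump_listed_revs` is a hypothetical name, and the `SVNADMIN` override exists only so the loop can be dry-run with `SVNADMIN=echo`. This handles ordering only: a revision that copies from a path not contained in the dump (for instance /trunk/KeyManagement in r108918, if r108917 is dumped incrementally) may still fail to load.

```shell
#!/bin/sh
# Hypothetical helper: write an incremental dump of each listed
# revision to stdout, oldest revision first.
#   $1 = repository path, $2 = file with one revision number per line
# Set SVNADMIN=echo to print the commands instead of running them.
dump_listed_revs() {
  repo=$1
  revs_file=$2
  # sort -n -u: numeric order, duplicate revision numbers removed
  sort -n -u "$revs_file" | while read -r rev; do
    "${SVNADMIN:-svnadmin}" dump --incremental -r "$rev" "$repo"
  done
}
```

For example: `dump_listed_revs old_repo km.revs.txt > dump_file`.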

I just don't want to take 40+ hours to do this.


Solution

  • I finally used svnrdump. I still had to dump all the revisions, but it let me specify that I only wanted /KeyManagement. With svnadmin dump and svndumpfilter, I would have had to include both /KeyManagement and /trunk/KeyManagement, since that's where the project was originally located.

    Unfortunately, you can't use svndumpfilter on svnrdump output, and since every revision was included, I couldn't renumber them and leave out the empty ones.

    Still, svnrdump did let me capture just the one directory, even though I couldn't change its location, skip empty revisions, or renumber them.
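The svnrdump route described above might be sketched as follows. The function name `move_km`, the example URL, and `km_repo` are hypothetical; the only part taken from the answer is that svnrdump was pointed at the /KeyManagement subdirectory rather than the repository root. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: create a new repository and load into it a
# dump of only the /KeyManagement subtree, fetched with svnrdump.
#   $1 = URL of the old repository root, $2 = path for the new repository
move_km() {
  repo_url=${1%/}   # drop one trailing slash, if present
  new_repo=$2
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf 'svnadmin create %s\n' "$new_repo"
    printf 'svnrdump dump %s/KeyManagement | svnadmin load %s\n' "$repo_url" "$new_repo"
    return 0
  fi
  # The pipeline's exit status is svnadmin load's; a failure in
  # svnrdump is not reported separately here.
  svnadmin create "$new_repo" &&
    svnrdump dump "$repo_url/KeyManagement" | svnadmin load "$new_repo"
}
```

For example: `move_km https://svn.example.com/repo km_repo`.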