I've been struggling to work out the best way to copy a large SVN repository to either a new SVN or Git repository. I want to keep all original revisions and metadata (timestamps, commit comments, etc.), except that I want to take the committer field in the log and hash it. I will be analysing the log, so the committer names need to be obscured but still distinguishable from each other (i.e. I can't just remove them or change them all to "x").
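For the "obscured but distinguishable" requirement, any deterministic hash works, because the same name always maps to the same value. There is one caveat worth showing. A bare SHA-1 of a username can be reversed by anyone who can guess the usernames and hash them. A keyed hash (HMAC) avoids that. Here is a minimal Python sketch using the standard hmac and hashlib modules. The key value, the 12-character truncation and the pseudonym() name are illustrative assumptions.

```python
import hashlib
import hmac

# Hypothetical secret key: keep it out of anything you publish. Without a key,
# anyone who can guess a committer's username can recompute sha1(username)
# and re-identify them.
SECRET_KEY = b"replace-with-a-private-random-key"


def pseudonym(committer: str, length: int = 12) -> str:
    """Map a committer name to a stable, opaque identifier.

    The same name always yields the same pseudonym, so commits by one person
    can still be grouped together; different names yield different pseudonyms
    (barring an extremely unlikely collision in the truncated digest).
    """
    digest = hmac.new(SECRET_KEY, committer.encode("utf-8"), hashlib.sha1).hexdigest()
    return digest[:length]


if __name__ == "__main__":
    for name in ["alice", "bob", "alice"]:
        print(name, "->", pseudonym(name))
```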
I have tried to do this several ways so far, but I'm struggling to get any of them to work.
One way I considered was to loop through something like this:

    revision = current svn revision
    loop:
        print the log for revision
        take all fields and use them as input to git commit,
            passing the committer id through sha1sum first
        git commit
        revision = revision - 1
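The loop above can be sketched in Python. Instead of calling svn once per revision, it lets svn log --xml print every revision in one go and parses that output. This is a minimal sketch. It assumes the XML layout that svn log --xml emits: logentry elements with a revision attribute and author, date and msg children. The sample data, the Revision class and the function names are hypothetical.

Two details differ from the pseudocode:

- The revisions are sorted oldest-first. Git history has to be built in chronological order, and counting down from the current revision would create the newest commit first.
- hashlib is used instead of sha1sum. Note that echo name | sha1sum also hashes the trailing newline, so its output will not match these hashes unless you use echo -n.

```python
import hashlib
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical sample of `svn log --xml` output (newest revision first).
# In practice you might capture it with:
#   subprocess.run(["svn", "log", "--xml", repo_url],
#                  capture_output=True, text=True, check=True).stdout
SAMPLE_LOG = """<?xml version="1.0" encoding="UTF-8"?>
<log>
<logentry revision="2">
<author>bob</author>
<date>2010-01-02T10:00:00.000000Z</date>
<msg>Fix typo</msg>
</logentry>
<logentry revision="1">
<author>alice</author>
<date>2010-01-01T09:00:00.000000Z</date>
<msg>Initial import</msg>
</logentry>
</log>"""


@dataclass
class Revision:
    number: int
    author_hash: str
    date: str
    message: str


def hash_author(author: str) -> str:
    # SHA-1 of the UTF-8 encoded name, with no trailing newline.
    return hashlib.sha1(author.encode("utf-8")).hexdigest()


def parse_log(xml_text: str) -> list[Revision]:
    """Return every revision with its author hashed, oldest first."""
    root = ET.fromstring(xml_text)
    revisions = []
    for entry in root.iter("logentry"):
        # <author> is absent for anonymous commits; findtext then returns the default.
        author = entry.findtext("author", default="(no author)")
        revisions.append(Revision(
            number=int(entry.get("revision")),
            author_hash=hash_author(author),
            date=entry.findtext("date", default=""),
            message=entry.findtext("msg", default=""),
        ))
    # Git history must be replayed oldest-first, so sort ascending by revision.
    revisions.sort(key=lambda r: r.number)
    return revisions


if __name__ == "__main__":
    for rev in parse_log(SAMPLE_LOG):
        print(rev.number, rev.author_hash[:12], rev.date, rev.message)
```

Each Revision could then be passed to git commit, for example through its --author and --date options. You would also set GIT_COMMITTER_NAME, GIT_COMMITTER_EMAIL and GIT_COMMITTER_DATE in the environment so the committer matches. That step, and exporting each revision's file contents, are not shown.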
I have also looked at git-svn and realise I can create an authors file to rename all authors. However, I don't know how to automatically extract all the authors from the svn log and write their hashes into the authors file. Entering each author manually isn't feasible in this case.
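If you go the authors-file route, you can pull the author list out of svn log automatically instead of typing it in. Here is a minimal Python sketch. It assumes the usual svn log -q layout, where revision lines of the form "r123 | author | date" sit between dashed separators. It also assumes the git svn authors-file format "svnuser = Name <email>". The sample log, the function names, the 12-character truncation and the example.invalid email domain are hypothetical choices.

```python
import hashlib

# Hypothetical sample of `svn log -q` output. In practice you might capture it with:
#   subprocess.run(["svn", "log", "-q", repo_url],
#                  capture_output=True, text=True, check=True).stdout
SAMPLE_QUIET_LOG = """\
------------------------------------------------------------------------
r3 | alice | 2010-01-03 11:00:00 +0000 (Sun, 03 Jan 2010)
------------------------------------------------------------------------
r2 | bob | 2010-01-02 10:00:00 +0000 (Sat, 02 Jan 2010)
------------------------------------------------------------------------
r1 | alice | 2010-01-01 09:00:00 +0000 (Fri, 01 Jan 2010)
------------------------------------------------------------------------
"""


def svn_authors(quiet_log: str) -> list[str]:
    """Return the distinct author names found in `svn log -q` output, sorted."""
    authors = set()
    for line in quiet_log.splitlines():
        # Revision lines look like "r123 | author | date"; separator lines are dashes.
        if line.startswith("r") and " | " in line:
            authors.add(line.split(" | ")[1].strip())
    return sorted(authors)


def authors_file(authors: list[str]) -> str:
    """Build git svn authors-file content mapping each SVN user to a hashed identity."""
    lines = []
    for name in authors:
        digest = hashlib.sha1(name.encode("utf-8")).hexdigest()[:12]
        # authors-file format: "svnuser = Git Name <email>"
        lines.append(f"{name} = {digest} <{digest}@example.invalid>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(authors_file(svn_authors(SAMPLE_QUIET_LOG)), end="")
```

You could write the result to, say, authors.txt and then run git svn clone --authors-file=authors.txt <svn-url> to rewrite every author. svn log -q omits the commit messages, so there is much less output to parse than with a full log.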
Can anyone advise me on how best to do this?
git svn has two ways of mangling SVN author names: --authors-file and --authors-prog. The latter lets you supply a script that is called for each unknown author, and that script can return your hash. It can calculate the hash, output it (see the linked docs for details of the expected response format) and store the mapping in an additional file.
That way you don't need to parse the authors from svn for yourself: git svn will do that for you.
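Here is a minimal sketch of such an --authors-prog script in Python. git svn runs the program with the SVN committer name as its first argument. It expects a single line of the form "Name <email>" on standard output. The hash truncation, the example.invalid email domain and the author-hashes.txt mapping file are hypothetical choices, not anything git svn requires.

```python
#!/usr/bin/env python3
import hashlib
import sys

# Hypothetical file in which to record the real-name -> hashed-identity mapping.
MAPPING_FILE = "author-hashes.txt"


def hashed_identity(svn_author: str) -> str:
    """Return a git identity line built from the SHA-1 of the SVN author name."""
    digest = hashlib.sha1(svn_author.encode("utf-8")).hexdigest()[:12]
    # git svn expects exactly one line of the form "Name <email>".
    return f"{digest} <{digest}@example.invalid>"


def record_and_print(svn_author: str) -> None:
    identity = hashed_identity(svn_author)
    # Append the mapping so the hashes can be traced back later if needed.
    with open(MAPPING_FILE, "a", encoding="utf-8") as f:
        f.write(f"{svn_author}\t{identity}\n")
    print(identity)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: hash_author.py SVN_AUTHOR", file=sys.stderr)
    else:
        record_and_print(sys.argv[1])
```

Save it as, say, hash_author.py and make it executable. You could then use it with git svn clone --authors-prog=./hash_author.py <svn-url>. The mapping file is opened relative to whatever working directory the script runs in, so an absolute path is safer. Keep that file private, since it is exactly what de-anonymises the log.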