
Convert Subversion commit messages to Unicode


Currently I have a local Subversion repository with a lot of commit messages in cp1251 encoding.

Is there any way I can convert all commit messages into utf-8 encoding?


Solution

  • Your commit messages are already stored as UTF-8:

    Subversion internally handles certain bits of data—for example, property names, pathnames, and log messages—as UTF-8-encoded Unicode. This is not to say that all your interactions with Subversion must involve UTF-8, though. As a general rule, Subversion clients will gracefully and transparently handle conversions between UTF-8 and the encoding system in use on your computer, if such a conversion can meaningfully be done (which is the case for most common encodings in use today).

    If you have somehow double-encoded them, though, and you are using an FSFS-style repository, the easiest fix is probably to go through all the revprop files under db/revprops/*/* in your repository and rewrite them with the correct encoding, e.g. using the iconv command-line tool from GnuWin32. Each property value in those files is preceded by a "V <length>" line that gives its size in bytes, so whenever the conversion changes the length of a message, that line has to be updated as well. (Note that these files should have Unix line endings, i.e. LF, not CRLF.)
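A minimal Python sketch of that rewrite, offered under several assumptions: the repository is FSFS with unpacked revprops (packed "N.pack" shards use a different layout and are skipped), nothing is accessing the repository while it runs, and you have a backup. It also assumes one specific kind of double encoding: the messages were typed in cp1251, misread by the client as Latin-1, and then stored as UTF-8 (so "Привет" shows up as "Ïðèâåò"). If your damage happened differently, swap out fix_double_encoded. Note that a genuinely correct UTF-8 message made only of Latin-1 characters would also be "fixed" by it. All function names are hypothetical. Unlike running iconv over a whole file, the sketch re-serializes each file so every byte-length header matches its converted value.

```python
import sys
from pathlib import Path


def parse_hash(data: bytes) -> dict:
    """Parse Subversion's hash-dump format used by unpacked FSFS revprop files:
    'K <len>\\n<key>\\nV <len>\\n<value>\\n' records terminated by 'END\\n'."""
    props = {}
    pos = 0
    while True:
        eol = data.index(b"\n", pos)
        line, pos = data[pos:eol], eol + 1
        if line == b"END":
            return props
        if not line.startswith(b"K "):
            raise ValueError(f"expected 'K <len>', got {line!r}")
        klen = int(line[2:])
        key = data[pos:pos + klen]
        pos += klen + 1  # skip the key and the newline after it
        eol = data.index(b"\n", pos)
        line, pos = data[pos:eol], eol + 1
        if not line.startswith(b"V "):
            raise ValueError(f"expected 'V <len>', got {line!r}")
        vlen = int(line[2:])
        props[key] = data[pos:pos + vlen]
        pos += vlen + 1  # skip the value and the newline after it


def dump_hash(props: dict) -> bytes:
    """Serialize back to hash-dump format, recomputing every byte length."""
    out = bytearray()
    for key, value in props.items():
        out += b"K %d\n%s\nV %d\n%s\n" % (len(key), key, len(value), value)
    out += b"END\n"
    return bytes(out)


def fix_double_encoded(value: bytes) -> bytes:
    """ASSUMPTION: cp1251 text was misread as Latin-1 and then stored as UTF-8.
    Values that cannot be reversed this way (e.g. already-correct Cyrillic
    UTF-8, which has characters outside Latin-1) are returned unchanged."""
    try:
        return value.decode("utf-8").encode("latin-1").decode("cp1251").encode("utf-8")
    except UnicodeError:
        return value


def fix_repository(repo: Path) -> int:
    """Rewrite svn:log in every unpacked revprop file; return how many changed."""
    changed = 0
    for path in sorted((repo / "db" / "revprops").glob("*/*")):
        # Only numeric shard directories hold plain per-revision files;
        # '<N>.pack' directories use a packed format and are skipped here.
        if not (path.parent.name.isdigit() and path.name.isdigit()):
            continue
        props = parse_hash(path.read_bytes())
        log = props.get(b"svn:log")
        if log is None:
            continue
        fixed = fix_double_encoded(log)
        if fixed != log:
            props[b"svn:log"] = fixed
            # Writing bytes keeps the LF line endings the files require.
            path.write_bytes(dump_hash(props))
            changed += 1
    return changed


if __name__ == "__main__":
    if len(sys.argv) == 2:
        print(f"rewrote {fix_repository(Path(sys.argv[1]))} revprop file(s)")
    else:
        print("usage: fix_revprops.py PATH_TO_REPOSITORY", file=sys.stderr)
```

Run it as, for example, "python fix_revprops.py C:\svn\myrepo" (script name and path are placeholders). Because already-correct messages are left alone, running it a second time should report zero changes.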