git, utf-16

Is it possible to use something like reposurgeon to turn all commits of a UTF-16 file to UTF-8 in a git repo?


I have a git repository that has a UTF-16 file in it. It's only UTF-16 by accident; the file could be encoded in 7-bit ASCII without any loss of data. I'd like to use something like reposurgeon to convert the file to UTF-8 so that git diff will work with older revisions of the file and I don't have to resort to git difftool. Is this possible?


Solution

  • Why don't you just convert the file to UTF-8 and commit it, e.g. with:

    iconv -f UTF-16 -t UTF-8 input-file.txt > input-file.txt.fixed
    # Check here that the conversion worked OK
    mv -i input-file.txt.fixed input-file.txt
    git commit -m 'Convert input-file.txt from UTF-16 to UTF-8' input-file.txt
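
    The "check here" step in the commands above could be sketched as a small shell helper. This is only an illustration: is_plain_ascii is a hypothetical name, and the check assumes (as the question states) that the content fits in 7-bit ASCII, so any leftover NUL or high byte means the conversion did not work.

    ```shell
    # Hypothetical helper (not a standard tool): succeeds only if the file
    # exists and contains nothing but bytes in the range 0x01-0x7F.
    is_plain_ascii() {
        [ -f "$1" ] || return 1
        # Delete every byte in 0x01-0x7F and count what is left over
        # (NUL bytes and bytes >= 0x80); tr -d ' ' strips wc's padding.
        bad=$(LC_ALL=C tr -d '\001-\177' < "$1" | wc -c | tr -d ' ')
        [ "$bad" -eq 0 ]
    }
    ```

    For example, running is_plain_ascii input-file.txt.fixed && echo OK before the mv step would only print OK if no UTF-16 remnants are left in the converted file.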
    

    Update after a clarifying comment:

    If you want to rewrite that file at every commit in the history of HEAD, you can use git filter-branch, something like:

    # Only convert in commits where the file actually exists
    git filter-branch --tree-filter '
        if [ -f input-file.txt ]; then
            iconv -f UTF-16 -t UTF-8 input-file.txt > input-file.txt.fixed &&
            mv input-file.txt.fixed input-file.txt
        fi' HEAD
    

    Of course, rewriting history in this way may cause problems if you have shared this repository with anyone, since every rewritten commit gets a new hash. (I haven't tested that command, so use it with care, and probably only on a fresh clone of your repository.)
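
    After a rewrite like this, it may be worth confirming that no revision of the file still holds UTF-16 data. The following is a rough sketch, not part of the original answer: list_unconverted_commits is a hypothetical helper name, and it again assumes the file should be pure 7-bit ASCII in every commit.

    ```shell
    # Hypothetical helper: print every commit reachable from HEAD in which
    # the given path still contains NUL or non-ASCII bytes.
    list_unconverted_commits() {
        path=$1
        # rev-list with a path only lists commits that touch that path
        git rev-list HEAD -- "$path" | while read -r rev; do
            # Skip commits where the path does not exist (e.g. it was deleted)
            git cat-file -e "$rev:$path" 2>/dev/null || continue
            # Count bytes outside 0x01-0x7F in the raw blob contents
            bad=$(git cat-file blob "$rev:$path" |
                  LC_ALL=C tr -d '\001-\177' | wc -c | tr -d ' ')
            [ "$bad" -eq 0 ] || echo "$rev"
        done
    }
    ```

    Running list_unconverted_commits input-file.txt in the rewritten clone should print nothing if every revision was converted; any commit hash it prints points at a revision that still needs attention.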