
git - how to apply a transform to all past commit messages?


I'm trying to import a project to launchpad, but this fails with the error:

  File "/srv/importd.launchpad.net/production/launchpad-rev-17114/bzrplugins/git/fetch.py", line 119, in import_git_blob
    ie = cls(file_id, name.decode("utf-8"), parent_id)
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xca in position 0: invalid continuation byte

So I'm wondering how I can search all commit messages for this 0xca byte and remove it. Looking manually at the history with QGit doesn't show any out-of-place characters.


Solution

  • You may be able to use git log --grep... to find the bad commits and fix them manually (I'm not sure whether unicode will work with --grep). If there are too many to fix by hand, or you want to automate the process, consider using git filter-branch --msg-filter <command>, which rewrites every commit message by passing it through the shell command <command>: the old message arrives on standard input, and whatever the command prints becomes the new message. Here is a simple example of using --msg-filter:

       # create a throwaway repository with two commits
       mkdir tmp
       cd tmp
       git init
       touch a
       git add .
       git commit -m 'first commit'
       touch b
       git add .
       git commit -m 'second commit'
       git log --oneline
       # prefix every commit message with "hello "
       git filter-branch --msg-filter 'sed "s/^/hello /"'
       git log --oneline
    

    See this question for a possible way to fill in <command> with a command that suitably alters the unicode in your commit messages.
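As a rough sketch of what <command> could look like, one option is iconv, which with -c discards bytes that are not valid in the source encoding. The function name strip_invalid_utf8 is made up for illustration, and this assumes your messages are meant to be UTF-8 and that simply dropping the stray byte is acceptable. I have not tried this against the Launchpad importer.

```shell
# Hypothetical helper (the name is not part of git): read text on stdin and
# write it back out with any bytes that are not valid UTF-8 removed.
strip_invalid_utf8() {
    # -c tells iconv to discard input it cannot convert.
    # Some iconv builds still exit non-zero after discarding bytes, and
    # filter-branch aborts when the filter fails, hence the "|| true".
    iconv -c -f UTF-8 -t UTF-8 || true
}

# filter-branch runs the filter in its own shell, so the function above is
# not visible there; pass the command inline instead, for example:
#   git filter-branch --msg-filter 'iconv -c -f UTF-8 -t UTF-8 || true'
```

Try it on a copy of the repository first, since every rewritten commit gets a new hash.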

    You can then inspect the changes by running

    diff <(git log original/refs/heads/<your-branch> --oneline | cut -d' ' -f 2-) <(git log <your-branch> --oneline | cut -d' ' -f 2-)
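If you would rather find the offending commits before rewriting anything, a sketch along these lines may help. It feeds each raw commit object through iconv and reports the ones that fail to decode as UTF-8. The function names is_valid_utf8 and list_non_utf8_commits are invented for this example, and commits that legitimately declare another encoding would be reported too.

```shell
# Hypothetical helper: succeed only if stdin is entirely valid UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Hypothetical helper: print the hash of every commit reachable from any ref
# whose raw object (headers plus message) is not valid UTF-8.
# Must be run from inside the repository.
list_non_utf8_commits() {
    git rev-list --all | while read -r rev; do
        # cat-file prints the stored bytes without re-encoding them
        git cat-file commit "$rev" | is_valid_utf8 || echo "$rev"
    done
}
```

Running list_non_utf8_commits prints one hash per suspicious commit. If it prints nothing, note that the traceback fails in import_git_blob while decoding name, so the 0xca byte may be in a file name rather than in a commit message.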