python, unicode, encoding, twisted

How to make Twisted's privmsg accept non-ASCII strings


I have an IRC bot written in Python that uses Twisted.

It can send non-ASCII strings without a problem using self.msg(channel, str.encode('utf-8')).

However, I get an exception when privmsg receives a non-ASCII string:

def privmsg(self, user, channel, msg):
    msg = msg.encode('utf-8')
    user = user.split('!', 1)[0]
    [... code goes here...]

I get the following exception:

  File "/usr/lib64/python2.4/site-packages/twisted/words/protocols/irc.py", line 1498, in handleCommand
    method(prefix, params)
  File "/usr/lib64/python2.4/site-packages/twisted/words/protocols/irc.py", line 1043, in irc_PRIVMSG
    self.privmsg(user, channel, message)
  File "./IlyBot.py", line 58, in privmsg
    msg = msg.encode('utf-8')
exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 4: ordinal not in range(128)

Does anyone know how to force the encoding to be UTF-8 on the msg received by privmsg?


Solution

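  • I think you want "decode", not "encode". The argument to privmsg is a byte string (str, in Python 2.x), so if you want it to be text you have to decode those bytes. Calling encode on a byte string makes Python 2 first decode it implicitly with the ascii codec, which is why the traceback shows a UnicodeDecodeError.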

    You can't force the encoding to be UTF-8, because the bytes arrive in whatever encoding the sender's client happened to use. IRC has no character set support at all, so the best you can do is decode with the encoding you expect and be prepared for that to fail.
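
The advice above could be sketched roughly as follows. This is only an illustration, not part of Twisted: the helper name decode_irc_message and the choice of cp1251 as a second guess are assumptions (0xd1 is a Cyrillic letter in cp1251, but the sender's real encoding is not known from the traceback). The final latin-1 step is there only because it maps every byte to a character, so it never raises.

```python
def decode_irc_message(raw, encodings=("utf-8", "cp1251")):
    """Decode raw IRC bytes to text (hypothetical helper, not a Twisted API).

    Tries each candidate encoding in order. The candidate list is an
    assumption; pick the encodings your channel's users are likely to use.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; try the next guess.
            continue
    # latin-1 assigns a character to every byte value, so this cannot fail,
    # although the resulting text may be wrong if the guess is wrong.
    return raw.decode("latin-1")
```

Inside privmsg, the line msg = msg.encode('utf-8') would then become msg = decode_irc_message(msg), and the rest of the handler would work with text instead of bytes. Encode back to bytes only when sending, as the question already does with self.msg.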