
Python unicode issues (2.6)


I'm currently working on an IRC bot for a multilingual channel, and I'm running into Unicode issues that are proving nearly impossible to solve.

No matter which combination of Unicode encoding calls I try, the list function containing the code below either does nothing at all, or sends something that obviously isn't encoded correctly. (c.notice is a class method which sends a NOTICE command to the IRC server.)

The command should be sending 天子, but instead it seems hellbent on sending å¤©å­ with a previous configuration of the same commands. The one I have specified below is of the 'send nothing' variety. I haven't worked with unicode before this, and thus I am quite stuck. I'm also positive that I'm doing this completely wrong as a consequence.

(compileCMD just takes a list and spits out a single string of all the elements within the list)

uk = self.compileCMD(self.faq.keys(),0)
ukeys = unicode(uk,"utf-8").encode("utf-8")
c.notice(nick, u"Current list of faq entries: %s" % (uk))

Solution

  • A few points:

    • The characters "å¤©å­" are what the UTF-8 bytes of "天子" look like when they are read as Latin-1, so are you sure the wrong data is being sent? Does the program that processes the data (for example, the IRC client displaying it) use UTF-8, or does it interpret the input as a different encoding such as Latin-1?
    • unicode(uk,"utf-8").encode("utf-8"): Decoding UTF-8 and then re-encoding as UTF-8 doesn't change anything; you get back the same bytes you started with.
    • ukeys = unicode(uk,"utf-8").encode("utf-8"): The ukeys variable that contains the re-encoded data is never used afterwards; the call to c.notice still formats uk.
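
The three points above can be checked with a short sketch. It is only an illustration: the sample text stands in for whatever compileCMD returns, and build_notice is a hypothetical helper, not part of the bot's code. It also assumes that c.notice wants UTF-8 bytes, which the question does not confirm; if it expects a unicode object instead, skip the final encode.

```python
# -*- coding: utf-8 -*-
# Hypothetical stand-in for the joined FAQ keys; the real value comes from compileCMD.
text = u"\u5929\u5b50"  # the two characters 天子

# Encoding to UTF-8 yields six bytes (three per character).
raw = text.encode("utf-8")

# Reading those same bytes as Latin-1 produces the mojibake from the question.
misread = raw.decode("latin-1")

# Decoding UTF-8 and re-encoding as UTF-8 returns the original bytes unchanged.
roundtrip = raw.decode("utf-8").encode("utf-8")


def build_notice(keys_bytes):
    """Decode once, format as text, encode once just before sending.

    Hypothetical helper; assumes keys_bytes is a UTF-8 byte string.
    """
    keys_text = keys_bytes.decode("utf-8")
    message = u"Current list of faq entries: %s" % keys_text
    return message.encode("utf-8")


payload = build_notice(raw)
```

The idea is to keep text as unicode inside the program and convert at the edges only: decode bytes as they come in, encode as they go out. If the bytes on the wire are already correct UTF-8, the remaining suspect is the receiving client's encoding setting.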