Tags: python, python-2.7, unicode, encoding, unicode-literals

How to create a unicode instance from a unicode literal


Because of a bug in a C extension, I'm receiving unicode data as str instances. In other words, I get a str with no encoding applied at all, whose bytes are simply the code points of the unicode literal.

For instance, this is a valid unicode literal:

>>> u'\xa1Se educado!'

And the UTF-8 encoded str would be:

>>> '\xc2\xa1Se educado!'

However, what I actually get is a str holding the unicode literal's code points as raw bytes:

>>> '\xa1Se educado!'

I need to create a unicode instance from that. Calling unicode() doesn't work, since it expects an encoding and the default ASCII codec rejects the \xa1 byte. I found that ''.join(unichr(ord(x)) for x in s) does what I need, but it's really ugly. There has to be a better solution. Any ideas?
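The join workaround maps each byte to the code point with the same numeric value, which is exactly what the Latin-1 (ISO-8859-1) codec does. The sketch below checks that on the sample string; the variable names are illustrative, and the version branch exists only because Python 3 has no unichr. It also confirms that the broken value is the Latin-1 encoding of the intended text rather than its UTF-8 encoding.

```python
import sys

# The value the C extension hands back: one byte per code point (all below 256).
# A b'' literal is a str on Python 2.7 and bytes on Python 3.
s = b'\xa1Se educado!'

if sys.version_info[0] >= 3:
    # Python 3: iterating over bytes yields ints, so chr() maps each one directly.
    manual = u''.join(chr(n) for n in s)
else:
    # Python 2.7: the workaround from the question.
    manual = u''.join(unichr(ord(c)) for c in s)

# Latin-1 maps byte value n to code point n, i.e. the same mapping as above.
via_latin1 = s.decode('latin-1')

intended = u'\xa1Se educado!'
assert manual == via_latin1 == intended

# The broken value is the Latin-1 encoding of the intended text;
# UTF-8 would instead encode U+00A1 as the two bytes \xc2\xa1.
assert intended.encode('latin-1') == s
assert intended.encode('utf-8') == b'\xc2\xa1Se educado!'
```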


Solution

  • As I suspected, there is a way to decode it with the "encoding" Python uses for raw unicode literals: raw_unicode_escape.

    >>> unicode('\xa1Se educado!', 'raw_unicode_escape')
    u'\xa1Se educado!'
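One caveat worth checking: raw_unicode_escape agrees with Latin-1 for bytes that contain no backslashes, but it also decodes \uXXXX and \UXXXXXXXX escape sequences. So if the C extension's output can contain a literal backslash followed by u and hex digits, the two codecs give different results, and 'latin-1' is the exact equivalent of the join workaround. A small check, assuming the data is otherwise like the sample (the variable names are illustrative); it runs on both Python 2.7 and Python 3:

```python
# A b'' literal is a str on Python 2.7 and bytes on Python 3.
s = b'\xa1Se educado!'

# No backslashes in the data: both codecs give the same unicode result.
assert s.decode('raw_unicode_escape') == s.decode('latin-1') == u'\xa1Se educado!'

# Nine bytes: c, a, f, a literal backslash, then the characters u00e9.
escaped = b'caf\\u00e9'
raw = escaped.decode('raw_unicode_escape')  # the \u00e9 escape is interpreted
latin = escaped.decode('latin-1')           # every byte kept as its own code point

assert raw == u'caf\xe9'        # 4 characters
assert latin == u'caf\\u00e9'   # 9 characters, backslash preserved
```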