Due to a bug in a C extension, I'm getting unicode data in str instances. In other words, I get a str that has not been encoded at all and whose bytes are simply the code points of a unicode literal.
So, for instance, this is a valid unicode literal:
>>> u'\xa1Se educado!'
And the UTF-8 encoded str would be:
>>> '\xc2\xa1Se educado!'
However, I get a str holding the contents of the unicode literal:
>>> '\xa1Se educado!'
And I need to create a unicode instance from that. Using unicode() doesn't work, since it expects an encoding. I figured out that ''.join(unichr(ord(x)) for x in s) does what I need, but it's really ugly. There has to be a better solution. Any ideas?
As I suspected, there is a way to decode it with whatever "encoding" Python uses for unicode literals, and that's raw_unicode_escape.
>>> unicode('\xa1Se educado!', 'raw_unicode_escape')
u'\xa1Se educado!'
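A possible alternative, sketched here under the assumption that every byte in the broken str is meant to be the code point with the same number: the latin-1 codec performs exactly that byte-to-code-point mapping, which is what the unichr(ord(x)) loop does by hand. raw_unicode_escape gives the same result for the sample string, but it also interprets any literal \uXXXX sequence that happens to appear in the data. The variable names below are illustrative only, and the b'' and u'' prefixes are used so the snippet behaves the same on Python 2 and Python 3.

```python
# Byte string whose bytes are really code points, as returned by the buggy extension.
raw = b'\xa1Se educado!'

# latin-1 maps each byte 0x00-0xFF to the code point with the same value.
text = raw.decode('latin-1')

# raw_unicode_escape produces the same result for this particular input...
same = raw.decode('raw_unicode_escape')

# ...but it also expands literal backslash-u escapes found inside the data.
tricky = b'C:\\u00e1'                            # bytes: C : \ u 0 0 e 1
as_latin1 = tricky.decode('latin-1')             # keeps all eight characters
as_escape = tricky.decode('raw_unicode_escape')  # collapses \u00e1 into one character
```

If the data can legitimately contain a backslash followed by u, latin-1 is probably the safer choice; if it never does, both codecs should behave identically.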