Python - convert unicode and hex to unicode

I have a supposedly unicode string like this:


How do I get the correct Unicode string out of this? I think the actual Unicode value is ラブライブ!スクールアイドルフェスティバル(スクフェス)


  • You have a Mojibake, an incorrectly decoded piece of text.

    You can use the ftfy library to un-do the damage:

    >>> from ftfy import fix_text
    >>> fix_text(s)
    >>> print fix_text(s)

    According to ftfy, your data was encoded as UTF-8, then decoded as Windows codepage 1252; the ftfy.fixes.fix_one_step_and_explain() function shows the repair steps needed:

    >>> import ftfy.fixes
    >>> ftfy.fixes.fix_one_step_and_explain(s)[-1]
    [(u'encode', u'sloppy-windows-1252', 0), (u'decode', u'utf-8', 0)]

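    (The 'sloppy' encoding is needed because cp1252 leaves five byte values undefined (0x81, 0x8D, 0x8F, 0x90 and 0x9D), so not every UTF-8 byte can be decoded as cp1252. Some lenient decoders simply pass such bytes through as the code point with the same value; the special codec reverses that process.)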
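As a quick check of why a strict codec is not enough, the sketch below uses only the Python 3 standard library (not ftfy) to list the byte values that the built-in cp1252 codec refuses to decode. The variable name `undefined` is purely illustrative.

```python
# Collect every byte value that Python's strict cp1252 codec cannot decode.
undefined = []
for value in range(256):
    try:
        bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        undefined.append(value)

print([hex(value) for value in undefined])
# → ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
```

Any UTF-8 sequence containing one of these bytes cannot pass through a strict cp1252 decode, so a lenient decoder (and a matching lenient encoder for the repair) is required.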

    In fact, in your case the encode-then-misdecode round trip happened twice, not a feat I had seen before:

    >>> print s.encode('sloppy-cp1252').decode('utf8').encode('sloppy-cp1252').decode('utf8')
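To see the double round trip without installing ftfy, here is a minimal Python 3 sketch that imitates the lenient codec with the standard library. The helper names `sloppy_decode`, `sloppy_encode`, `garble` and `repair` are hypothetical and written for this illustration; they are not part of ftfy. The sample text is one word from the title in the question, chosen because its UTF-8 form contains the byte 0x90, which cp1252 leaves undefined.

```python
def sloppy_decode(data: bytes) -> str:
    """Decode as cp1252, passing undefined bytes through as same-valued code points."""
    chars = []
    for value in data:
        try:
            chars.append(bytes([value]).decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(chr(value))  # assumption: the bad decoder copies the byte
    return "".join(chars)


def sloppy_encode(text: str) -> bytes:
    """Reverse sloppy_decode: encode as cp1252, mapping passed-through code points back to bytes."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            if ord(ch) > 0xFF:
                raise  # not something sloppy_decode could have produced
            out.append(ord(ch))
    return bytes(out)


def garble(text: str) -> str:
    """Simulate the damage: encode as UTF-8, then misdecode as cp1252."""
    return sloppy_decode(text.encode("utf-8"))


def repair(text: str) -> str:
    """Undo one layer of damage."""
    return sloppy_encode(text).decode("utf-8")


original = "フェスティバル"
twice_garbled = garble(garble(original))

# One repair step only peels off the outer layer; two steps recover the text.
print(repair(repair(twice_garbled)) == original)
```

A single call to `repair` returns the once-garbled string, which is why the chained encode/decode has to be applied twice for doubly damaged input.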