python string python-2.7 unicode utf8-decode

Convert Python's internal str to print equivalent


Currently I have:

 >>> class_name = 'AEROSPC\xc2\xa01A'
 >>> print(class_name)
 AEROSPC 1A
 >>> 'AEROSPC 1A' == class_name
 False

How can I convert class_name into 'AEROSPC 1A'? Thanks!


Solution

  • Convert to Unicode

    Converting that string directly raises encoding errors, so I first decoded it from UTF-8 into a unicode object:

    my_utf8 = 'AEROSPC\xc2\xa01A'.decode('utf8', 'ignore')
    my_utf8
    

    returns:

    u'AEROSPC\xa01A'
    

    Then I normalize the string. The \xa0 is a non-breaking space (U+00A0), which NFKC normalization replaces with a regular space:

    import unicodedata
    
    my_normed_utf8 = unicodedata.normalize('NFKC', my_utf8)
    print my_normed_utf8
    

    prints:

    AEROSPC 1A
    

    Convert back to String

    The normalized text now contains only ASCII characters, so I can convert it back to a plain str:

    my_str = str(my_normed_utf8)
    print my_str
    

    prints:

    AEROSPC 1A
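
    The steps above can be collected into one small function. This is only a sketch: the helper name `to_plain_text` is made up for illustration, and it assumes the input is always valid UTF-8. It uses a `b'...'` literal and `.encode('ascii')` instead of `str()` so that the same code runs on Python 2.7 and Python 3.

```python
import unicodedata


def to_plain_text(raw_bytes):
    """Hypothetical helper: decode UTF-8 bytes, then fold compatibility
    characters (such as the non-breaking space U+00A0) with NFKC."""
    text = raw_bytes.decode('utf-8')
    return unicodedata.normalize('NFKC', text)


class_name = b'AEROSPC\xc2\xa01A'
cleaned = to_plain_text(class_name)

# The non-breaking space has become a regular space, so the comparison holds.
print(cleaned == u'AEROSPC 1A')

# Equivalent of the str() step: this raises UnicodeEncodeError if any
# non-ASCII character survived normalization.
ascii_bytes = cleaned.encode('ascii')
```

    Note that NFKC also rewrites other compatibility characters (ligatures, full-width forms, and so on). If only the non-breaking space should change, `text.replace(u'\xa0', u' ')` is a narrower alternative.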