I retrieved some EXIF data from an image and got the following:
{ ...
37510: u'D2\nArbeitsamt\n\xc3\x84nderungsbescheid'
...}
I expected it to be
{ ...
37510: u'D2\nArbeitsamt\nÄnderungsbescheid'
... }
I need to convert the value to a str, but I couldn't get it to work. I always get something like this (using Python 2.7):
UnicodeEncodeError: 'ascii' codec can't encode characters in position 14-15: ordinal not in range(128)
Any ideas how I can handle this?
UPDATE:
I tried it with Python 3 and no error is thrown, but the result is now
{ ...
37510: 'D2\nArbeitsamt\nÃ\x84nderungsbescheid',
... }
which is still not what I expected.
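It seems to be UTF-8 that was incorrectly decoded as Latin-1 and then placed in a unicode string. Encoding as Latin-1 maps every code point below 256 back to the byte with the same value, so you can use .encode('iso8859-1') to reverse the incorrect decoding and recover the original bytes.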
>>> my_dictionary = {37510: u'D2\nArbeitsamt\n\xc3\x84nderungsbescheid'}
>>> print(my_dictionary[37510].encode('iso8859-1'))
D2
Arbeitsamt
Änderungsbescheid
You can print it out just fine now (on a UTF-8 terminal), but the result is a byte str. You might then also decode it as UTF-8, so it ends up as a proper unicode object for further processing:
>>> type(my_dictionary[37510].encode('iso8859-1'))
<type 'str'>
>>> print(my_dictionary[37510].encode('iso8859-1').decode('utf8'))
D2
Arbeitsamt
Änderungsbescheid
>>> type(my_dictionary[37510].encode('iso8859-1').decode('utf8'))
<type 'unicode'>
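On Python 3 (as in the question's update) the same round trip applies: encode returns bytes and decode returns a str. Below is a minimal sketch, assuming every affected value was UTF-8 mis-decoded as Latin-1. The name fix_mojibake is a hypothetical helper for illustration, not part of any library.

```python
def fix_mojibake(text):
    """Undo a UTF-8-read-as-Latin-1 mis-decoding.

    Returns the text unchanged if the round trip does not apply.
    """
    try:
        # Latin-1 maps code points 0-255 straight back to bytes 0-255,
        # which recovers the original UTF-8 byte sequence.
        return text.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text contains characters outside Latin-1, or the
        # recovered bytes are not valid UTF-8: assume it was fine already.
        return text


# Sample dictionary taken from the question (Python 3 str literal).
exif = {37510: 'D2\nArbeitsamt\n\xc3\x84nderungsbescheid'}

# Repair only string values; leave other value types untouched.
fixed = {tag: fix_mojibake(value) if isinstance(value, str) else value
         for tag, value in exif.items()}

print(fixed[37510])
```

The fallback in the except clause is a design choice: a value that was decoded correctly in the first place usually fails the round trip and is returned as-is. It is not a guarantee, though. A correct string that happens to survive the round trip would be altered, so apply this only to fields you know are affected.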