To keep it simple: I have the following raw string, which is a filename containing Chinese characters:
=?utf-8?B?5L+d56iO5LuT5Y+R6LSn5pel5oqlMS4xOS0xLjIxLnhsc3g=?=
According to http://dogmamix.com/MimeHeadersDecoder/, the decoded version looks like this:
保税仓发货日报1.19-1.21.xlsx (which is right)
I am trying to decode this to get the following unicode string:
u'保税仓发货日报1.19-1.21.xlsx'
What I am doing is:
Step 1:
in_str = '=?utf-8?B?5L+d56iO5LuT5Y+R6LSn5pel5oqlMS4xOS0xLjIxLnhsc3g=?='
from email.header import decode_header
res = decode_header(in_str)
Then res is a list of tuples of the following form:
[('\xe4\xbf\x9d\xe7\xa8\x8e\xe4\xbb\x93\xe5\x8f\x91\xe8\xb4\xa7\xe6\x97\xa5\xe6\x8a\xa51.19-1.21.xlsx', 'utf-8')]
This raises a question: why is res[0][0] partly an escaped bytestring and partly a normal raw string ('1.19-1.21.xlsx' is the raw part)? But let's carry on.
Step 2:
Let's decode this bytestring from UTF-8, since I believe it is a UTF-8 encoded string (logical, right?):
filename = res[0][0].decode('utf-8')
I believe this should return the following unicode string:
u'保税仓发货日报1.19-1.21.xlsx'
But I get what looks like another bytestring instead (this time unicode):
u'\u4fdd\u7a0e\u4ed3\u53d1\u8d27\u65e5\u62a51.19-1.21.xlsx'
This drives me nuts, as I believe I am doing everything right.
BTW, yes, I have read the "Unicode HOWTO", but I still have no idea how to fix it.
Continuing your example, in an IDE whose font supports these characters:
#!python2
in_str = '=?utf-8?B?5L+d56iO5LuT5Y+R6LSn5pel5oqlMS4xOS0xLjIxLnhsc3g=?='
from email.header import decode_header
res = decode_header(in_str)
for data, enc in res:
    print data.decode(enc)
Output:
保税仓发货日报1.19-1.21.xlsx
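In Python 2, you have to decode the byte string and then print the result for it to display properly. The u'\u4fdd...' value from your Step 2 is already the correct unicode string: the interactive prompt shows its repr(), which escapes non-ASCII characters, while print shows the characters themselves. Likewise, res[0][0] is a byte string throughout; its repr() shows printable ASCII bytes as-is and escapes the others as \xNN.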
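If it helps, here is a minimal sketch that wraps the loop in a function and checks that the escaped form and the literal characters are the same string. The name decode_mime_header is hypothetical (it is not part of the email package), and falling back to ASCII when no charset is reported is my assumption, not something the library prescribes. It is written to run on Python 3 as well, where decode_header returns bytes for encoded words and a plain str for headers without any.

```python
# -*- coding: utf-8 -*-
from email.header import decode_header

def decode_mime_header(raw):
    """Hypothetical helper: join all decoded parts of a MIME header into text."""
    parts = []
    for data, charset in decode_header(raw):
        if isinstance(data, bytes):
            # charset is None for unencoded parts; ASCII is assumed here
            parts.append(data.decode(charset or 'ascii'))
        else:
            # Python 3 returns str when the header has no encoded words
            parts.append(data)
    return u''.join(parts)

in_str = '=?utf-8?B?5L+d56iO5LuT5Y+R6LSn5pel5oqlMS4xOS0xLjIxLnhsc3g=?='
filename = decode_mime_header(in_str)

# The \u escapes are only how repr() spells the characters
escaped = u'\u4fdd\u7a0e\u4ed3\u53d1\u8d27\u65e5\u62a51.19-1.21.xlsx'
literal = u'保税仓发货日报1.19-1.21.xlsx'
same = (filename == escaped == literal)
```

Here same should come out True, which is the point: Step 2 in the question already produced the right value, and only the way the prompt displayed it was misleading.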