I have an encoded-word string from a received mail. When parsing the encoded word in Python 3, I get the exception
'gb2312' codec can't decode bytes in position 18-19: illegal multibyte sequence
raised from the make_header method.
from email.header import decode_header, make_header
hdr = decode_header("""=?gb2312?B?QSBWIM34IMXMILP2IMrbICAgqEMgs8kgyMsg?=""")
make_header(hdr)
Parsing the encoded string with online tools works without problems (http://dogmamix.com/MimeHeadersDecoder/). Any suggestions as to what I am doing wrong? Thanks.
The error message tells you that the bytes in position 18-19 are not valid for this encoding.
decode_header simply extracts a bunch of bytes and an encoding. make_header actually attempts to interpret those bytes in that encoding, and fails, because these bytes are not valid in that encoding.
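To see the two steps separately, here is a minimal sketch that does by hand the decoding make_header attempts. The variable names (raw, payload, charset, error) are only illustrative, and the exact wording of the exception may differ between Python versions.

```python
from email.header import decode_header

raw = "=?gb2312?B?QSBWIM34IMXMILP2IMrbICAgqEMgs8kgyMsg?="

# decode_header only unwraps the base64 and reports the declared charset.
payload, charset = decode_header(raw)[0]
print(charset, len(payload))

# Decoding is the step make_header performs internally; doing it by hand
# shows exactly where the declared charset stops matching the bytes.
try:
    payload.decode(charset)
    error = None
except UnicodeDecodeError as exc:
    error = exc
    print("invalid at offset", exc.start, payload[exc.start:exc.start + 2])
```

The offset reported here is the same position 18 that appears in the original error message.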
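Similarly, on the command line (-D is the BSD/macOS spelling of the base64 decode flag; GNU coreutils uses -d):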
bash$ base64 -D <<<'QSBWIM34IMXMILP2IMrbICAgqEMgs8kgyMsg' |
> iconv -f gb2312 -t utf-8
A V 网 盘 出 售
iconv: (stdin):1:18: cannot convert
So the error message simply tells you that this data is not valid. Without more information we cannot tell what the data should be, and neither can Python or your program.
For a rough parable, you can g??ss which b?t?s are m?ss?ng here, but not in ?h?? l?ng?? s???e???.
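If losing the unreadable part is acceptable, one option is to decode the bytes yourself and let Python substitute a replacement character for what it cannot interpret. This is only a sketch under that assumption: it does not recover the original text, and the names raw, payload, charset and text are illustrative.

```python
from email.header import decode_header

raw = "=?gb2312?B?QSBWIM34IMXMILP2IMrbICAgqEMgs8kgyMsg?="
payload, charset = decode_header(raw)[0]

# errors="replace" substitutes U+FFFD for whatever cannot be decoded,
# so the valid text before and after the bad bytes is kept.
text = payload.decode(charset, errors="replace")
print(text)
```

How many characters end up replaced around the invalid position can vary with the Python version, so treat the result as display text rather than as a faithful copy of the header.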