How do I turn this:
With Best Regards, JS Chen*\r\n\r\n=E9=A0=8E=E9=82=A6=E7=A7=91=E6=8A=80=E8=82=A1=E4=BB=BD=E6=9C=89=E9=99=90=E5=\r\n=85=AC=E5=8F=B8/Chipbond Technology
to this:
With Best Regards, JS Chen*\r\n\r\n頎邦科技股份有限公司/Chipbond Technology
using Python?
I'm pulling mixed-language email data using imaplib, and it gives me this hex code with equals signs in between whenever the text contains characters from other languages.
Here is my code:
mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login('****@gmail.com', '*********')
mail.select('Inbox')
type, data = mail.search(None,'(SUBJECT "ULVAC RSH-820")')
mail_ids = data[0]
id_list = mail_ids.split()
print('searching...')
for num in data[0].split():
    typ, data = mail.fetch(num, '(RFC822)')
    raw_email = data[0][1]
    raw_email_string = raw_email.decode('utf-8')
    email_message = email.message_from_string(raw_email_string)
    print('decoding..')
    for response_part in data:
        if isinstance(response_part, tuple):
            msg = email.message_from_string(response_part[1].decode('utf-8'))
            if msg.is_multipart():
                print('de-partitioning...')
                for part in msg.walk():
                    ctype = part.get_content_type()
                    cdispo = str(part.get('Content-Disposition'))
                    if ctype == 'text/plain' and 'attachment' not in cdispo:
                        body = part.get_payload()
            else:
                body = msg.get_payload()
Your emails have a content transfer encoding, specifically, the Quoted-Printable encoding, which is used to make sure the email data stream is ASCII safe.
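To see what that encoding does in isolation, here is a minimal sketch that decodes the fragment from the question with the standard-library quopri module. It assumes the underlying charset is UTF-8, which the byte values suggest but the question does not state; the variable names are only illustrative.

```python
import quopri

# Quoted-Printable fragment copied from the question.
# The "=" directly before "\r\n" is a soft line break and is dropped on decoding.
encoded = (
    b"=E9=A0=8E=E9=82=A6=E7=A7=91=E6=8A=80=E8=82=A1=E4=BB=BD=E6=9C=89=E9=99=90=E5=\r\n"
    b"=85=AC=E5=8F=B8/Chipbond Technology"
)

# Step 1: undo the transfer encoding, which yields raw bytes.
raw_bytes = quopri.decodestring(encoded)

# Step 2: decode those bytes with the message charset (assumed to be UTF-8 here).
text = raw_bytes.decode("utf-8")
print(text)
```

In practice you would not call quopri yourself; the email package does both steps for you, as described next.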
Simply tell Python to decode the payload by passing in decode=True to the Message.get_payload() method:
body_data = part.get_payload(decode=True)
charset = part.get_param("charset", "ASCII")
body = body_data.decode(charset, errors="replace")
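As a self-contained illustration of those three lines, the sketch below parses a small hand-written message with the default (compat32) policy. The sample headers and body are made up for the demo and only mimic the email in the question.

```python
import email

# Hypothetical single-part message using Quoted-Printable, similar to the question's data.
raw_email_string = (
    "Subject: demo\r\n"
    "MIME-Version: 1.0\r\n"
    'Content-Type: text/plain; charset="utf-8"\r\n'
    "Content-Transfer-Encoding: quoted-printable\r\n"
    "\r\n"
    "With Best Regards, JS Chen\r\n"
    "=E9=A0=8E=E9=82=A6=E7=A7=91=E6=8A=80/Chipbond Technology\r\n"
)

msg = email.message_from_string(raw_email_string)

# decode=True undoes the Content-Transfer-Encoding and returns bytes.
body_data = msg.get_payload(decode=True)

# Fall back to ASCII when the Content-Type header carries no charset parameter.
charset = msg.get_param("charset", "ASCII")
body = body_data.decode(charset, errors="replace")
print(body)
```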
However, this does mean you'll be given binary data, even for text content types, and so you must explicitly decode the data; get_payload() is not that helpful here. It is also part of the legacy API; you want to switch to the newer Unicode-friendly API. Do so by using a policy other than the compat32 policy (the default) when loading a message:
from email import policy
# ...
raw_email = data[0][1]
# you may have to use policy.default instead, depending on the line endings
# convention used.
email_message = email.message_from_bytes(raw_email, policy=policy.SMTP)
and further down:
msg = email.message_from_bytes(response_part[1], policy=policy.SMTP)
Note that I don't decode the bytes value first; by using email.message_from_bytes() instead of email.message_from_string(), you delegate decoding the data to the email parser.
Now email_message is an email.message.EmailMessage instance instead of the older email.message.Message type, and you can use the EmailMessage.get_content() method, which for text MIME types will return a Unicode text string:
body = part.get_content()
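Putting it together, here is a rough sketch of the newer API applied to a multipart message. The raw message and the first_plain_text() helper are hypothetical and written just for this demo, and it uses policy.default; policy.SMTP should parse the same way.

```python
import email
from email import policy

# Hypothetical multipart message; the text/plain part is Quoted-Printable encoded.
raw_email = (
    b"Subject: demo\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="BOUNDARY"\r\n'
    b"\r\n"
    b"--BOUNDARY\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"With Best Regards, JS Chen\r\n"
    b"=E9=A0=8E=E9=82=A6=E7=A7=91=E6=8A=80/Chipbond Technology\r\n"
    b"--BOUNDARY\r\n"
    b'Content-Type: text/html; charset="utf-8"\r\n'
    b"\r\n"
    b"<p>ignored</p>\r\n"
    b"--BOUNDARY--\r\n"
)


def first_plain_text(message):
    """Return the first non-attachment text/plain part as str, or None."""
    for part in message.walk():
        if (part.get_content_type() == "text/plain"
                and part.get_content_disposition() != "attachment"):
            # get_content() undoes the transfer encoding and the charset.
            return part.get_content()
    return None


# Pass the raw bytes straight to the parser; no manual .decode('utf-8') needed.
msg = email.message_from_bytes(raw_email, policy=policy.default)
body = first_plain_text(msg)
print(body)
```

Because walk() also yields a non-multipart message itself, the same helper should work without a separate is_multipart() branch.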