email smtp imap rfc imaplib

Why are the same contents decoded differently according to mail clients?


My code checks a mailbox and forwards every mail to another user.
But I found that the same content arrives encoded differently depending on the mail client (that is, depending on whether it was sent from [email protected], from [email protected], and so on).

For example, this is what I typed:
subject: subject
content: this is content

for mail client 1:
2020-04-22 18:12:23,249: run: DEBUG: subject has come as: =?utf-8?B?c3ViamVjdA==?=
2020-04-22 18:12:23,249: run: DEBUG: content has come as: dGhpcyBpcyBjb250ZW50Cg==

for mail client 2:
2020-04-22 18:12:09,636: run: DEBUG: subject has come as: =?utf-8?B?c3ViamVjdA==?=
2020-04-22 18:12:09,636: run: DEBUG: content has come as: dGhpcyBpcyBjb250ZW50Cg==

for mail client 3:
2020-04-22 18:12:16,494: run: DEBUG: subject has come as: subject
2020-04-22 18:12:16,494: run: DEBUG: content has come as: this is content

Clients 1 and 2 give the same result.
Client 3 gives a different one.

A sample of my code, which uses imaplib:

import email

typ, rfc = self.mail.fetch(num, '(RFC822)')
raw_email = rfc[0][1]  # raw message bytes from the IMAP server
raw_email_to_utf8 = raw_email.decode('utf-8')
msg = email.message_from_string(raw_email_to_utf8)
content = msg.get_payload()  # This is what the debug log above prints.

Because of this, some mails are forwarded with weird contents. (The subjects are re-encoded correctly.)

Why is there this difference, and how can I get the content of the messages that arrive encoded?


Solution

  • Something is encoding text that did not need to be encoded. That's unnecessary, but not prohibited.

    RFC 2047 encoding is sometimes necessary but always legal (because permitting it everywhere was simpler than making precise rules). You have to detect RFC 2047 encoding and decode it when present. If a word starts with =?, ends with ?= and contains precisely two question marks in between, then it is RFC 2047-encoded. Most or all languages have libraries or functions for decoding it; search for "rfc2047".
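
    Since the question uses Python, here is a minimal sketch of that detect-and-decode step using the standard library's email package. The helper names (looks_rfc2047_encoded, decode_subject, decode_text_body) and the sample message are illustrative assumptions, not part of the original code. The sketch also assumes that the base64 body seen in the logs is a Content-Transfer-Encoding (a separate mechanism from RFC 2047, which covers headers only), which get_payload(decode=True) undoes.

```python
import email
import re
from email.header import decode_header, make_header

# The rule from the answer: "=?", charset, "?", B or Q, "?", text, "?=".
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=")


def looks_rfc2047_encoded(word):
    """Return True if a single word has the shape of an RFC 2047 encoded-word."""
    return ENCODED_WORD.fullmatch(word) is not None


def decode_subject(raw_subject):
    """Decode encoded-words if present; plain text passes through unchanged."""
    return str(make_header(decode_header(raw_subject)))


def decode_text_body(msg):
    """Return the text/plain body as str, with base64/quoted-printable undone."""
    part = msg
    if msg.is_multipart():
        # Assumption: the first text/plain part is the one to forward.
        part = next(
            (p for p in msg.walk() if p.get_content_type() == "text/plain"),
            None,
        )
        if part is None:
            return ""
    payload = part.get_payload(decode=True)  # bytes, transfer encoding removed
    charset = part.get_content_charset() or "utf-8"
    return payload.decode(charset, errors="replace")


# Hypothetical message shaped like what mail clients 1 and 2 produced.
raw_email = (
    b"Subject: =?utf-8?B?c3ViamVjdA==?=\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"dGhpcyBpcyBjb250ZW50Cg==\r\n"
)

# Parsing the bytes directly avoids guessing a charset for the whole message.
msg = email.message_from_bytes(raw_email)
subject = decode_subject(msg["Subject"])
content = decode_text_body(msg)
print(subject)
print(content)
```

    With imaplib, the bytes in rfc[0][1] could be passed to email.message_from_bytes in the same way. A message like the one from mail client 3, which has no encoding at all, goes through the same two helpers unchanged, so the caller does not need to branch on the sender.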