Tags: python, email, character-encoding, base64, mime

Python's `email.message.as_string` encodes some parts as base64; unclear why


I want to use Python's email module to change the transfer encoding of MIME message parts from quoted-printable or base64 to 7bit or 8bit. Everything seems to work, except that at the end, for some messages, email.message.as_string encodes some parts (I have seen both text/plain and text/html) as base64. I do not understand why this happens, or what I need to change to avoid it.

The script code:

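import email
import email.encoders
import sys

# Read and parse the message from stdin
msg = email.message_from_string(sys.stdin.read())

for part in msg.walk():
    if part.get_content_maintype() == 'text':
        # Transfer-encoding values are case-insensitive, so normalize first
        cte = str(part.get('Content-Transfer-Encoding', '')).lower()
        if cte in {'quoted-printable', 'base64'}:
            # Decode the body to raw bytes and drop the old header,
            # then let encode_7or8bit choose 7bit or 8bit
            payload = part.get_payload(decode=True)
            del part['Content-Transfer-Encoding']
            part.set_payload(payload)
            email.encoders.encode_7or8bit(part)

# Send the modified message to stdout
print(msg.as_string())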
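A minimal reproduction, assuming a recent Python 3 and a hypothetical helper `to_7or8bit` that simply wraps the loop above. It builds a UTF-8 text part with `MIMEText` (which base64-encodes UTF-8 bodies), converts it, and compares the header stored on the message with what `as_string()` emits. The last line prints the body encoding that the standard library associates with the utf-8 charset:

```python
import email
import email.encoders
from email.charset import Charset
from email.mime.text import MIMEText


def to_7or8bit(msg):
    """Hypothetical helper: the question's loop wrapped in a function."""
    for part in msg.walk():
        if part.get_content_maintype() != 'text':
            continue
        cte = str(part.get('Content-Transfer-Encoding', '')).lower()
        if cte in {'quoted-printable', 'base64'}:
            payload = part.get_payload(decode=True)
            del part['Content-Transfer-Encoding']
            part.set_payload(payload)
            email.encoders.encode_7or8bit(part)
    return msg


# MIMEText base64-encodes a utf-8 body; it stands in for an incoming mail.
source = MIMEText('Grüße aus Köln\n', 'plain', 'utf-8').as_string()
msg = to_7or8bit(email.message_from_string(source))

text_out = msg.as_string()

print(msg['Content-Transfer-Encoding'])      # header on the message object
print(text_out)                              # what Generator actually emits
print(Charset('utf-8').get_body_encoding())  # utf-8's default body encoding
```

The message object still says 8bit, yet the serialized string carries `Content-Transfer-Encoding: base64` and a base64 body, because the non-ASCII bytes cannot be written out by the string-based generator.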

(In case it matters: I am using Python 3.3.)


Solution

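  • Use as_bytes instead of as_string, i.e. change your print call to:

    print(msg.as_bytes().decode(encoding='UTF-8'))

    This assumes every 8bit part is UTF-8; if a part uses another charset (e.g. iso-8859-1), the decode raises UnicodeDecodeError, and writing the bytes directly with sys.stdout.buffer.write(msg.as_bytes()) avoids that.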

    The reason is given in the email.policy documentation (https://docs.python.org/3.4/library/email.policy.html#module-email.policy):

    A cte_type value of 8bit only works with BytesGenerator, not Generator, because strings cannot contain binary data. If a Generator is operating under a policy that specifies cte_type=8bit, it will act as if cte_type is 7bit.

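    as_string() uses Generator, while as_bytes() uses BytesGenerator, which is what you need here. Under that 7bit fallback, Generator re-encodes a text part containing non-ASCII bytes with its charset's default body encoding, which for utf-8 is base64; that is why those parts come out as base64. Note that Message.as_bytes() was added in Python 3.4; on 3.3 you can flatten the message with email.generator.BytesGenerator instead.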
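A bytes-in, bytes-out sketch of the whole filter, assuming a recent Python 3; `convert` is a hypothetical name. It parses with `message_from_bytes` and flattens with `BytesGenerator` directly, which is what `as_bytes()` does internally, so it does not depend on `as_bytes()` existing (the sketch has not been checked against 3.3 specifically). The sample body is Latin-1, a case where the answer's `.decode(encoding='UTF-8')` step would fail:

```python
import email
import email.encoders
import io
from email.generator import BytesGenerator
from email.mime.text import MIMEText


def convert(raw):
    """Hypothetical helper: parse bytes, re-tag text parts 7bit/8bit, return bytes."""
    msg = email.message_from_bytes(raw)
    for part in msg.walk():
        if part.get_content_maintype() != 'text':
            continue
        cte = str(part.get('Content-Transfer-Encoding', '')).lower()
        if cte in {'quoted-printable', 'base64'}:
            payload = part.get_payload(decode=True)
            del part['Content-Transfer-Encoding']
            part.set_payload(payload)
            email.encoders.encode_7or8bit(part)
    # BytesGenerator may write 8bit bodies unchanged (the policy's cte_type
    # defaults to 8bit), unlike the str-based Generator behind as_string().
    buf = io.BytesIO()
    BytesGenerator(buf, mangle_from_=False).flatten(msg)
    return buf.getvalue()


# A Latin-1 body arrives quoted-printable; the 8bit result is not valid UTF-8.
sample = MIMEText('café\n', 'plain', 'iso-8859-1').as_string().encode('ascii')
out = convert(sample)
print(out)
```

As a stdin/stdout filter this would be called as `sys.stdout.buffer.write(convert(sys.stdin.buffer.read()))`, which never decodes the message as a whole and so makes no assumption about its charsets.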