
Problem with a mail message created by a parser


If I create a message this way (using real addresses, of course):

import email.message

msg = email.message.EmailMessage()
msg['From'] = "[email protected]"
msg['To'] = "[email protected]"
msg['Subject'] = "Ayons asperges pour le déjeuner"
msg.set_content("Cela ressemble à un excellent recipie déjeuner.")

I can send it successfully with smtplib, and the Unicode characters in the body cause no problem. The received message has these headers:

Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
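A minimal sketch of what set_content() does here. It uses placeholder addresses in place of the hidden real ones, and it prints only what the message object reports. The exact Content-Transfer-Encoding that set_content() chooses depends on the policy's heuristics, and a relaying server may re-encode the body. So the value you see locally may not match the quoted-printable one in the received message.

```python
import email.message

# Placeholder addresses (assumption): the real ones are hidden in the question.
msg = email.message.EmailMessage()
msg['From'] = "sender@example.com"
msg['To'] = "recipient@example.com"
msg['Subject'] = "Ayons asperges pour le déjeuner"
msg.set_content("Cela ressemble à un excellent recipie déjeuner.")

# set_content() declared a charset, chose a transfer encoding
# and added MIME-Version without being asked.
print(msg['Content-Type'])
print(msg.get_content_charset())
print(msg['Content-Transfer-Encoding'])
print(msg['MIME-Version'])

# Flattening with a BytesGenerator (as_bytes(), which is also what
# smtplib.send_message() uses internally) succeeds.
raw = msg.as_bytes()
print(len(raw), "bytes")
```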

If I try to create the same message in this alternative way:

msgsource = """\
From: [email protected]
To: [email protected]
Subject: Ayons asperges pour le déjeuner

Cela ressemble à un excellent recipie déjeuner.
"""

import email.parser
import email.policy

msg = email.parser.Parser(policy=email.policy.default).parsestr(msgsource)

I can't send it: smtplib's send_message() fails with

UnicodeEncodeError: 'ascii' codec can't encode character '\xe0' in position 15: ordinal not in range(128)

so it evidently expects ASCII, not Unicode. What causes the difference, and how do I fix it properly?
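One way to see the difference without an SMTP server is to inspect the parsed message and flatten it to bytes yourself. This is a diagnostic sketch with placeholder addresses. It calls as_bytes() on the assumption that this exercises the same BytesGenerator path that send_message() uses. The parsed message carries no Content-Type header, so no charset is declared and the generator falls back to ASCII for the non-ASCII body text.

```python
import email.parser
import email.policy

# Placeholder addresses (assumption).
msgsource = """\
From: sender@example.com
To: recipient@example.com
Subject: Ayons asperges pour le déjeuner

Cela ressemble à un excellent recipie déjeuner.
"""

msg = email.parser.Parser(policy=email.policy.default).parsestr(msgsource)

# The source had no MIME headers, and the parser does not invent any.
print('Content-Type' in msg)       # no Content-Type header at all
print(msg.get_content_type())      # the RFC 2045 default, text/plain
print(msg.get_content_charset())   # no charset declared

# Flattening to bytes tries to write the str body as ASCII and fails.
error = None
try:
    msg.as_bytes()
except UnicodeEncodeError as exc:
    error = exc
    print("as_bytes() failed:", exc)
```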

(The code is based on these examples.)


Solution

  • The error can be avoided by encoding msgsource and then parsing the resulting bytes:

    import email
    from email import policy

    msgsource = msgsource.encode('utf-8')
    msg = email.message_from_bytes(msgsource, policy=policy.default)
    print(msg)
    

    outputs

    From: [email protected]
    To: [email protected]
    Subject: Ayons asperges pour le =?unknown-8bit?q?d=C3=A9jeuner?=
    
    Cela ressemble �� un excellent recipie d��jeuner.
    

    sending it to Python's SMTP DebuggingServer produces

    b'From: [email protected]'
    b'To: [email protected]'
    b'Subject: Ayons asperges pour le d\xc3\xa9jeuner'
    b'X-Peer: ::1'
    b''
    b'Cela ressemble \xc3\xa0 un excellent recipie d\xc3\xa9jeuner.'
    

    Note that no encoding headers (Content-Type, Content-Transfer-Encoding) are written. My guess is that the parsers try to reproduce the message from the source string or bytes as faithfully as possible, adding as few assumptions as they can. The Parser docs

    [Parser is] an API that can be used to parse a message when the complete contents of the message are available in a [string/bytes/file]

    seem to me to support this interpretation.
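If you want to keep writing the message as a str template but still get proper encoding headers, one possible approach is to rebuild it as an EmailMessage. This is not the method from the answer above. The sketch parses the str, copies its headers into a fresh EmailMessage, and passes the body to set_content() so that charset and transfer encoding are declared. It assumes a single-part text message whose body in the source is plain, unencoded text. The helper name rebuild_as_email_message is hypothetical, and the addresses are placeholders.

```python
import email
import email.message
import email.parser
import email.policy

def rebuild_as_email_message(parsed):
    """Hypothetical helper: copy the headers of a parsed single-part text
    message into a fresh EmailMessage and let set_content() declare the
    charset and choose a Content-Transfer-Encoding."""
    fresh = email.message.EmailMessage(policy=email.policy.default)
    for name, value in parsed.items():
        if name.lower().startswith('content-'):
            continue  # set_content() writes its own Content-* headers
        fresh[name] = str(value)  # str() yields the decoded header text
    fresh.set_content(parsed.get_payload())  # assumes a plain str body
    return fresh

# Placeholder addresses (assumption).
msgsource = """\
From: sender@example.com
To: recipient@example.com
Subject: Ayons asperges pour le déjeuner

Cela ressemble à un excellent recipie déjeuner.
"""

parsed = email.parser.Parser(policy=email.policy.default).parsestr(msgsource)
msg = rebuild_as_email_message(parsed)
raw = msg.as_bytes()  # flattens without UnicodeEncodeError

# Parse the bytes back to check that the text survived the round trip.
roundtrip = email.message_from_bytes(raw, policy=email.policy.default)
print(roundtrip['Subject'])
print(roundtrip.get_content_charset())
print(roundtrip.get_content(), end='')
```

Compared with parsing bytes, this route writes the Content-Type and Content-Transfer-Encoding headers that the parsers leave out. It also RFC 2047-encodes the non-ASCII Subject instead of sending raw 8-bit header bytes.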