Search code examples
pythonpython-3.xemailimaplib

imaplib wrongly adding `=` and `3D` characters to url text from email


Here I'm using imapblib and email to retrieve a certain email message based on certain criteria (i.e. who it's from and the subject).

import imaplib
import email

FROM_EMAIL  = "##########@gmail.com"
FROM_PWD    = "##########"
SMTP_SERVER = "imap.gmail.com"

mail = imaplib.IMAP4_SSL(SMTP_SERVER)
mail.login(FROM_EMAIL,FROM_PWD)
mail.select("INBOX")
result, data2 = mail.search(None,'(FROM "####" SUBJECT "####")')
ids = data2[0]
id_list = ids.split()
latest_email_id = id_list[-1]
result, email_data = mail.fetch(latest_email_id, "(RFC822)")

raw_email = email_data[0][1]
raw_email_string = raw_email.decode('utf-8')
email_message = email.message_from_string(raw_email_string)

In the email, using gmail on my desktop, there is a link that appears like this (please note, the # character represents sensitive information):

# This is how the link is supposed to appear
https://inreach.garmin.com/textmessage/txtmsg?extId=e3e7d4c2-fab4-43ad-93de-f9dedca8280#####=##########%40gmail.com

When printing the email text as python has retrieved it, I get this bad link:

email_text = list(email_message.walk())[1].get_payload()
print(email_text) # Note(I am not printing the whole email for privacy reasons)

# The link as python has retrieved it appears like this:

https://inreach.garmin.com/textmessage/txtmsg?extId=3De3e7d4c2-fab4-43ad-93=de-f9dedca8280#####=3D##########%40gmail.com

Python is somehow adding an = character between 93 and de and it is also adding several 3D characters.

What is python doing? Ideas on how to fix this?


Solution

  • What you're seeing is quoted-printable encoding. It's one way of encoding arbitrary bytes into ASCII text for transmission through e.g. email. Among the consequences of this encoding are the following:

    • Each '=' character in your message gets encoded into '=3D' (because 0x3d is the character code for '=').
    • Lines are wrapped at 76 characters by inserting the sequence '=\n' (basically an escaped newline which the decoder will strip out). I bet one of these newlines got inserted in the middle of your link.

    You can convert the encoded text back to the original bytes using the quopri module from the standard library. Some parts of Python's email handling library may also do this for you.