I tried to get the text of a received gmail, using the email and imaplib modules in python. After decoding with utf-8 and after getting the payload of the message, all the spaces are still replaced by =20. Can I use another decoding step in order to fix this?
The code is the following: (I got it from a youtube tutorial - https://youtu.be/Jt8LizzxkPU )
``
import email
import imaplib
username = "abc"
password = "123"
mail = imaplib.IMAP4_SSL("imap.gmail.com")
mail.login(username,password)
mail.select("inbox")
result, data = mail.uid("search", None,"ALL")
inbox_item_list = data[0].split()
for item in inbox_item_list:
#most_recent = inbox_item_list[-1]
#oldest = inbox_item_list[0]
result2, email_data = mail.uid('fetch',item,'(RFC822)')
raw_email = email_data[0][1].decode("utf-8")
email_message = email.message_from_string(raw_email)
to_ = email_message['To']
from_ = email_message['From']
subject_ = email_message['Subject']
counter = 1
for part in email_message.walk():
if part.get_content_maintype() == "multipart":
continue
filename = part.get_filename()
if not filename:
ext = ".html"
filename = "msg-part-%08d%s" %(counter, ext)
counter += 1
#save file
content_type = part.get_content_type()
print(subject_)
print (content_type)
if "plain" in content_type:
print(part.get_payload())
elif "html" in content_type:
print("do some beautiful soup")
else:
print(content_type)
``
Try to import quopri
, and then when you get the content of the email body (or whatever text that has the =20s
inside), you can use quopri.decodestring()
I do it like this
quopri.decodestring(part.get_payload())
But do keep in mind that this is if you quite specifically want to decode from quoted-printable
. Normally I would say the answer of @jfs is neater.