I have this code but I don't actually get the email text.
Have I got to decode the email text?
import sys
import imaplib
import getpass
import email
import email.header
from email.header import decode_header
import base64
def read(username, password, sender_of_interest):
    # Login to INBOX
    imap = imaplib.IMAP4_SSL("imap.mail.com", 993)
    imap.login(username, password)
    imap.select('INBOX')
    # Use search(), not status()
    # Print all unread messages from a certain sender of interest
    if sender_of_interest:
        status, response = imap.uid('search', None, 'UNSEEN', 'FROM {0}'.format(sender_of_interest))
    else:
        status, response = imap.uid('search', None, 'UNSEEN')
    if status == 'OK':
        unread_msg_nums = response[0].split()
    else:
        unread_msg_nums = []
    data_list = []
    for e_id in unread_msg_nums:
        data_dict = {}
        e_id = e_id.decode('utf-8')
        _, response = imap.uid('fetch', e_id, '(RFC822)')
        html = response[0][1].decode('utf-8')
        email_message = email.message_from_string(html)
        data_dict['mail_to'] = email_message['To']
        data_dict['mail_subject'] = email_message['Subject']
        data_dict['mail_from'] = email.utils.parseaddr(email_message['From'])
        #data_dict['body'] = email_message.get_payload()[0].get_payload()
        data_dict['body'] = email_message.get_payload()
        data_list.append(data_dict)
    print(data_list)
    # Mark them as seen
    #for e_id in unread_msg_nums:
        #imap.store(e_id, '+FLAGS', '\Seen')
    imap.logout()
    return data_dict
So I do this:
print('Getting the email text bodies ... ')
emailData = read(usermail, pw, sender_of_interest)
print('Got the data!')
for key in emailData.keys():
    print(key, emailData[key])
The output is:
mail_to [email protected]
mail_subject Get json file
mail_from ('Pedro Rodriguez', '[email protected]')
body [<email.message.Message object at 0x7f7d9f928df0>, <email.message.Message object at 0x7f7d9f928f70>]
How to actually get the email text?
Depending on what exactly you mean by "the text", you probably want the get_body method. But you are thoroughly mangling the email before you get to that point. What you receive from the server isn't "HTML", and converting it to a string in order to call message_from_string on it is roundabout and error-prone. What you get are bytes; pass them directly to the message_from_bytes function. (This avoids all kinds of problems when the bytes are not UTF-8; message_from_string only really made sense back in Python 2, which didn't have an explicit bytes type.)
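from email.policy import default
...
_, response = imap.uid('fetch', e_id, '(RFC822)')
# Parse the raw bytes directly; the default policy yields an EmailMessage
email_message = email.message_from_bytes(response[0][1], policy=default)
# Preferences are MIME subtypes ('related', 'html', 'plain'), so use 'plain', not 'text'
body = email_message.get_body(('html', 'plain')).get_content()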
Passing a policy selects the (no longer very) new EmailMessage class. The policy framework was introduced in Python 3.3, but EmailMessage remained provisional until Python 3.6, so use 3.6 or later. The legacy email.message.Message class does not have the get_body method, and should be avoided in new code for many other reasons as well.
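To make the difference concrete, here is a minimal sketch that parses the same raw bytes with and without a policy. The sample message, its addresses, and the Latin-1 body are made up for illustration; they stand in for whatever `response[0][1]` holds in the real code.

```python
import email
from email.message import EmailMessage, Message
from email.policy import default

# Hypothetical raw message, standing in for the bytes returned by the IMAP fetch.
# The body is Latin-1 encoded, so it is deliberately not valid UTF-8.
raw = (
    b"From: sender@example.com\r\n"
    b"To: recipient@example.com\r\n"
    b"Subject: Test\r\n"
    b'Content-Type: text/plain; charset="iso-8859-1"\r\n'
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"Caf\xe9 au lait\r\n"
)

# Decoding the whole message as UTF-8 first breaks on such input.
try:
    raw.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Without a policy you get the legacy Message class, which has no get_body().
legacy = email.message_from_bytes(raw)
# With policy=default you get an EmailMessage.
modern = email.message_from_bytes(raw, policy=default)

# get_content() decodes the body using the charset declared in the part itself.
text = modern.get_body(("plain",)).get_content()

print(decode_failed)
print(type(legacy).__name__, hasattr(legacy, "get_body"))
print(type(modern).__name__, hasattr(modern, "get_body"))
print(text.strip())
```

The parser reads the declared charset from the headers of each part, which is why handing it the raw bytes works where a blanket UTF-8 decode does not.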
This could fail for multipart messages with nontrivial nested structures: get_body returns None when no part matches the preference list, and without arguments it can return a multipart/related part, at which point you have to take it from there. You haven't specified what your messages are expected to look like, so I won't delve further into that.
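As a rough sketch of a more defensive approach, the helper below checks for None before calling get_content. The name `extract_body` and the sample message (plain text, an HTML alternative, and a JSON attachment) are assumptions for illustration, not something taken from the question.

```python
import email
from email.message import EmailMessage
from email.policy import default


def extract_body(msg, preferences=("html", "plain")):
    """Return the text of the best matching body part, or None if there is none."""
    part = msg.get_body(preferencelist=preferences)
    if part is None:
        # Nothing in the message matches the requested subtypes.
        return None
    return part.get_content()


# Build a hypothetical message: plain text, an HTML alternative, and an attachment.
original = EmailMessage()
original["From"] = "sender@example.com"
original["To"] = "recipient@example.com"
original["Subject"] = "Get json file"
original.set_content("Plain text version\n")
original.add_alternative("<p>HTML version</p>\n", subtype="html")
original.add_attachment(
    b'{"key": "value"}',
    maintype="application",
    subtype="json",
    filename="data.json",
)

# Round-trip through bytes, the way the message would arrive from the server.
parsed = email.message_from_bytes(original.as_bytes(), policy=default)

html_body = extract_body(parsed)
plain_body = extract_body(parsed, preferences=("plain",))

# A message that only has a plain text body yields None when only HTML is requested.
plain_only = EmailMessage()
plain_only.set_content("Just text\n")
missing = extract_body(plain_only, preferences=("html",))

# Attachments are reachable separately through iter_attachments().
attachment_names = [part.get_filename() for part in parsed.iter_attachments()]

print(html_body.strip())
print(plain_body.strip())
print(missing)
print(attachment_names)
```

Whether HTML or plain text should come first in the preference list depends on what you plan to do with the body.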
More fundamentally, you probably need a more nuanced picture of how modern email messages are structured. See What are the "parts" in a multipart email?