I am working on an email parser that uses imaplib to connect to my Gmail account and read new emails.
This all works fine and the results are as expected when I run my script with Python 2 (i.e. "python myScript.py").
For example, if I had an email that looked like this:
To: receiver@qwerty.com
From: sender@asdf.com
Subject: Test Subject
Test1
Test2
My script will output as expected:
To: receiver@qwerty.com
From: sender@asdf.com
Subject: Test Subject
Body: Test1
Test2
However, when I run it with Python 3 (i.e. "python3 myScript.py"), the results are not the same: the message body is printed with a b'' prefix and escaped characters, as shown below:
To: receiver@qwerty.com
From: sender@asdf.com
Subject: Test Subject
Body: b'Test1\r\nTest2\r\n'
And below is the code used for this process:
def readMailbox(mail):
    res, data = mail.uid('search', None, 'UNSEEN')
    i = len(data[0].split())
    for x in range(i):
        latestEmailUID = data[0].split()[x]
        result, emailData = mail.uid('fetch', latestEmailUID, '(RFC822)')
        emailMessage = email.message_from_string(emailData[0][1].decode('utf-8'))
        emailFrom = str(email.header.make_header(email.header.decode_header(emailMessage['From'])))
        emailTo = str(email.header.make_header(email.header.decode_header(emailMessage['To'])))
        subject = str(email.header.make_header(email.header.decode_header(emailMessage['Subject'])))
        # Body details
        for part in emailMessage.walk():
            if part.get_content_type() == 'text/plain':
                body = part.get_payload(decode=True)
                print('To: %s' % emailTo)
                print('From: %s' % emailFrom)
                print('Subject: %s' % subject)
                print('Body: %s' % body)
I need to capture the body as a plain string, without the escaped characters, for use later on. Can anyone explain why this is happening, or what I need to do differently in Python 3 so that I can parse the email bodies normally?
Thank you for your time, any guidance in the right direction would be greatly appreciated!
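The 'b' in front of the string means it is still a bytes object and has not yet been decoded. In Python 3, get_payload(decode=True) returns bytes, and formatting bytes with %s prints their repr, which is why you see b'...' and the literal \r\n. In Python 2, str and bytes are the same type, so the body printed directly.
Decode it with 'utf-8', just as your script already does for the raw message, and the line breaks will print as expected: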
...
body = part.get_payload(decode=True)
body = body.decode('utf-8')
print('Body: %s' % body)
...
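Hardcoding 'utf-8' will fail or garble the text for messages sent in another encoding, so it may be worth using the charset the part itself declares. Below is a minimal sketch of that idea, not a drop-in replacement for your function: extract_plain_text and the sample raw message are hypothetical names made up for illustration, and falling back to UTF-8 when no charset is declared is my assumption. It also uses email.message_from_bytes, which avoids decoding the whole raw message by hand.

```python
import email


def extract_plain_text(raw_bytes):
    """Return the first text/plain body of a raw message as a str (hypothetical helper)."""
    # Parse the bytes returned by IMAP directly instead of decoding them first.
    message = email.message_from_bytes(raw_bytes)
    for part in message.walk():
        if part.get_content_type() == 'text/plain':
            payload = part.get_payload(decode=True)  # bytes in Python 3
            if payload is None:
                continue
            # Prefer the charset declared on the part; UTF-8 fallback is an assumption.
            charset = part.get_content_charset() or 'utf-8'
            return payload.decode(charset, errors='replace')
    return ''


# Sample message mirroring the one in the question (made up for this sketch).
raw = (
    b"To: receiver@qwerty.com\r\n"
    b"From: sender@asdf.com\r\n"
    b"Subject: Test Subject\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"\r\n"
    b"Test1\r\nTest2\r\n"
)

body = extract_plain_text(raw)
print('Body: %s' % body)
```

In your loop, emailData[0][1] would take the place of raw. The errors='replace' argument keeps a badly encoded message from raising UnicodeDecodeError, at the cost of substituting replacement characters for the bytes that cannot be decoded.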