I'm trying to search the body of an email, but I'm running into an issue:
#!/usr/local/bin/python3
from email.message import EmailMessage
import email
import imaplib
import re
import sys
import logging
import base64
import os
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
###########log in to mailbox########################
user = 'email@company.com'
pwd = 'pwd'
conn = imaplib.IMAP4_SSL("outlook.office365.com")
conn.login(user,pwd)
conn.select("test")
count = conn.select("test")
resp, items = conn.uid("search", None, '(OR (FROM "some@email") (FROM "some@email"))')
items = items[0].split()
for emailid in items:
    resp, data = conn.uid("fetch", emailid, "(RFC822)")
    if resp == 'OK':
        email_body = data[0][1]#.decode('utf-8')
        mail = email.message_from_bytes(email_body)
        #get all emails with words "PA1" or "PA2" in subject
        if mail["Subject"].find("PA1") > 0 or mail["Subject"].find("PA2") > 0:
            print (mail)
I have a problem with the following line:
body = mail.get_body(preferencelist=('plain', 'html'))
getting:
AttributeError: 'Message' object has no attribute 'get_body'
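You should not convert the MIME structure to a string and then feed that to message_from_string. Instead, keep it as a bytes object and parse it with message_from_bytes. Also pass a modern policy such as email.policy.default: without one, the parser falls back to the legacy compat32 policy and returns a plain Message object, which has no get_body method. With the policy you get an EmailMessage, which does.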
from email.policy import default as default_policy
...
items = items[0].split()
for emailid in items:
    resp, data = conn.uid("fetch", emailid, "(RFC822)")
    if resp == 'OK':
        email_blob = data[0][1]
        mail = email.message_from_bytes(email_blob, policy=default_policy)
        if not any(x in mail['subject'] for x in ('PA1', 'PA2')):
            continue
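To see why the policy argument matters, here is a minimal self-contained check. It assumes a hand-written message: RAW is a hypothetical stand-in for the bytes a fetch would return, not real mailbox data.

```python
import email
from email import policy
from email.message import EmailMessage, Message

# Hypothetical minimal message standing in for what an IMAP fetch returns.
RAW = (
    b"From: some@email\r\n"
    b"Subject: Report PA1\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"MACHINE: host-01\r\n"
)

# No policy given: the parser uses compat32 and builds a legacy Message.
legacy = email.message_from_bytes(RAW)

# Explicit modern policy: the parser builds an EmailMessage.
modern = email.message_from_bytes(RAW, policy=policy.default)

print(type(legacy).__name__, hasattr(legacy, "get_body"))
print(type(modern).__name__, hasattr(modern, "get_body"))
```

Only the second object offers get_body, which is consistent with the AttributeError in the question.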
You don't show how you traverse the MIME structure, so I assume you currently don't do that at all. You probably want something like this:
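        # continuation of the loop body above (inside "if resp == 'OK':")
        body = mail.get_body(preferencelist=('plain', 'html'))
        if body is None:
            continue  # no text/plain or text/html part in this message
        # get_body returns a message part; get_content gives its decoded text
        for line in body.get_content().split('\n'):
            if line.startswith('MACHINE:'):
                result = line[8:].strip()
                break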
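As a rough illustration of how preferencelist picks a part, the sketch below builds a hypothetical multipart/alternative message locally instead of fetching one. The helper find_field and the sample values are assumptions made for this example, not part of the email library.

```python
from email.message import EmailMessage

# Hypothetical message with both a plain-text and an HTML rendering.
msg = EmailMessage()
msg["Subject"] = "Report PA2"
msg.set_content("MACHINE: host-02\nSTATUS: ok\n")
msg.add_alternative("<p>MACHINE: host-02</p>", subtype="html")

# The first entry in preferencelist wins when both parts exist.
plain = msg.get_body(preferencelist=("plain", "html"))
html = msg.get_body(preferencelist=("html", "plain"))
print(plain.get_content_type())
print(html.get_content_type())


def find_field(message, prefix):
    """Return the text after `prefix` on the first matching body line, or None."""
    part = message.get_body(preferencelist=("plain", "html"))
    if part is None:
        return None  # message has no suitable text part
    for line in part.get_content().splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None


print(find_field(msg, "MACHINE:"))
```

Returning None for both "no text part" and "prefix not found" keeps the calling loop simple; split the two cases if you need to tell them apart.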
It looks like you have an email body part encoded using Content-Transfer-Encoding: quoted-printable. The above code is robust against various encodings because the email library decodes the encapsulation transparently for you, which gets rid of any QP-escaped line breaks, like the one in your question. For the record, quoted-printable can break up a long line anywhere, including in the middle of the value you are attempting to extract, so you really do want to decode before attempting to extract anything.
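The effect of a soft line break can be reproduced with a small sketch. The message below is hypothetical and hand-written for this example. The "=" at the end of a body line is the quoted-printable soft break that splits the value in two.

```python
import email
from email import policy

# Hypothetical raw message; the trailing "=" marks a QP soft line break.
RAW = (
    b"Subject: Report PA1\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"MACHINE: build-server-=\r\n"
    b"0042.example.com\r\n"
)

# Searching the undecoded bytes only finds the first half of the value.
raw_line = next(l for l in RAW.split(b"\r\n") if l.startswith(b"MACHINE:"))
print(raw_line)

# Parsing with a modern policy and calling get_content undoes the QP encoding,
# so the soft break disappears and the value is whole again.
mail = email.message_from_bytes(RAW, policy=policy.default)
text = mail.get_body(preferencelist=("plain",)).get_content()
decoded_line = next(l for l in text.splitlines() if l.startswith("MACHINE:"))
print(decoded_line)
```

The raw search yields a truncated value ending in "=", while the decoded text yields the complete host name on a single line.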