I have an .mbox file that represents many messages at location mbox_fname. In Python 3, I have already loaded each of the messages, which are objects of the class email.message.Message.
I'd like to get access to the body content of the message.
For instance, something like:
import mailbox

the_mailbox = mailbox.mbox(mbox_fname)
for message in the_mailbox:
    subject = message["subject"]
    content = <???>
How do I access the body of the message?
I made some progress modifying this answer. This is the best I have so far:
import email.message  # "import email" alone does not load the email.message submodule

def get_body(message: email.message.Message, encoding: str = "utf-8") -> str:
    # start with empty bytes (not str) so .decode() still works if no text part is found
    body_in_bytes = b""
    if message.is_multipart():
        for part in message.walk():
            ctype = part.get_content_type()
            cdispo = str(part.get("Content-Disposition"))
            # skip any text/plain (txt) attachments
            if ctype == "text/plain" and "attachment" not in cdispo:
                body_in_bytes = part.get_payload(decode=True)  # undo base64 / quoted-printable
                break
    # not multipart - i.e. plain text, no attachments, keeping fingers crossed
    else:
        body_in_bytes = message.get_payload(decode=True)
    body = body_in_bytes.decode(encoding)
    return body
So modifying the code in the original question, this gets called like the following:
for message in the_mailbox:
    content = get_body(message)
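One weak spot in the function above is that it decodes every body with a single fixed encoding, while real mail often declares its own charset in the Content-Type header. Below is a minimal sketch of a variant that reads the charset from each part with get_content_charset() and only falls back to UTF-8 when none is declared. The name get_body_with_charset, the fallback parameter, and the sample message are made up for illustration; I have not tried this on a large real-world mailbox.

```python
import email
import email.message


def get_body_with_charset(message: email.message.Message, fallback: str = "utf-8") -> str:
    """Return the first non-attachment text/plain part, decoded with its declared charset."""
    # walk() also yields the message itself when it is not multipart,
    # so a single loop covers both cases.
    for part in message.walk():
        disposition = str(part.get("Content-Disposition"))
        if part.get_content_type() != "text/plain" or "attachment" in disposition:
            continue
        payload = part.get_payload(decode=True)  # bytes, or None if there is nothing to decode
        if payload is None:
            continue
        charset = part.get_content_charset() or fallback
        try:
            return payload.decode(charset, errors="replace")
        except LookupError:
            # The header named a codec Python does not know about.
            return payload.decode(fallback, errors="replace")
    return ""  # no usable text/plain part found


# Hypothetical sample message, used only to exercise the function.
raw = (
    "Subject: hello\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="iso-8859-1"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "caf=E9\n"
)
sample = email.message_from_string(raw)
sample_body = get_body_with_charset(sample)
```

It is called the same way as get_body(message) inside the loop over the_mailbox. Passing errors="replace" means a badly encoded message produces replacement characters instead of raising UnicodeDecodeError, which may or may not be what you want.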