
Extract body from email message objects in Python


I have an .mbox file containing many messages at the path mbox_fname. In Python 3, I have already loaded the messages, each of which is an object of the class email.message.Message.

I'd like to get access to the body content of the message.

For instance, something like:

import mailbox
the_mailbox = mailbox.mbox(mbox_fname)

for message in the_mailbox:
    subject = message["subject"] 
    content = <???>

How do I access the body of the message?


Solution

  • I made some progress by modifying an existing answer. This is the best I have so far:

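    import email.message

    def get_body(message: email.message.Message, encoding: str = "utf-8") -> str:
        """Return the first non-attachment text/plain part of message as a string."""
        body_in_bytes = b""  # bytes, so .decode() works even if no part matches
        charset = None
        if message.is_multipart():
            for part in message.walk():
                ctype = part.get_content_type()
                cdispo = str(part.get("Content-Disposition"))

                # take the first text/plain part that is not an attachment
                if ctype == "text/plain" and "attachment" not in cdispo:
                    # decode=True undoes base64 / quoted-printable and returns bytes
                    body_in_bytes = part.get_payload(decode=True) or b""
                    charset = part.get_content_charset()
                    break
        # not multipart: the payload itself is the body
        else:
            body_in_bytes = message.get_payload(decode=True) or b""
            charset = message.get_content_charset()

        # prefer the charset declared in the message; fall back to `encoding`
        body = body_in_bytes.decode(charset or encoding, errors="replace")

        return body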
    

    Modifying the code from the question, the function is called like this:

    for message in the_mailbox:
        content = get_body(message)
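
As an alternative to walking the parts by hand, the standard library's newer email API may be enough. The following is a minimal sketch, assuming the messages are parsed with `email.policy.default` so that they are `EmailMessage` objects. The names `get_plain_body` and `open_mbox` and the sample message are hypothetical, introduced only for illustration.

```python
import email
import mailbox
from email import policy
from email.message import EmailMessage


def get_plain_body(message: EmailMessage) -> str:
    """Return the text of the text/plain body part, or "" if there is none."""
    # get_body() skips attachments and searches nested multipart containers.
    part = message.get_body(preferencelist=("plain",))
    if part is None:
        return ""
    # get_content() decodes the transfer encoding and the declared charset.
    return part.get_content()


def open_mbox(path: str) -> mailbox.mbox:
    """Open an mbox whose messages are parsed as EmailMessage objects."""
    # Hypothetical helper: the factory receives a binary file-like object
    # for each message and must return the parsed message.
    return mailbox.mbox(
        path,
        factory=lambda f: email.message_from_binary_file(f, policy=policy.default),
    )


# Build a small multipart/alternative message to try it on (sample data).
sample = EmailMessage()
sample["Subject"] = "Hello"
sample.set_content("Plain text body\n")
sample.add_alternative("<p>HTML body</p>", subtype="html")

# Round-trip through bytes, as if the message had been read from a file.
parsed = email.message_from_bytes(sample.as_bytes(), policy=policy.default)
body = get_plain_body(parsed)
print(body)
```

With the question's loop, this would be used as `for message in open_mbox(mbox_fname): content = get_plain_body(message)`. The trade-off is that HTML-only messages yield an empty string here; adding `"html"` to `preferencelist` would return the HTML part in that case.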