Trying to convert html to text in python?


I'm writing an email application in Python. Currently, when I try to display an email that uses HTML, it just shows the raw HTML markup. Is there a simple way to convert an email string to plain text for viewing?

The relevant part of my code:

rsp, data = self.s.uid('fetch', msg_id, '(BODY.PEEK[HEADER])')
raw_header = data[0][1].decode('utf-8')
rsp, data = self.s.uid('fetch', msg_id, '(BODY.PEEK[TEXT])')
raw_body = data[0][1].decode('utf-8')

header_ = email.message_from_string(raw_header)
body_ = email.message_from_string(raw_body)
self.message_box.insert(END, header_)
self.message_box.insert(END, body_)

Here the message box is just a tkinter Text widget used to display the email.

Thanks


Solution

  • Most emails contain both an HTML version and a plain-text (text/plain) version. For those emails you can just take the text/plain part. For emails that only have an HTML version, you have to use an HTML parser like BeautifulSoup to extract the text.

    Something like this:

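    import email
    from bs4 import BeautifulSoup

    # raw_body must include the message headers (e.g. fetched with BODY.PEEK[]),
    # otherwise the parser cannot see the Content-Type and the MIME parts.
    message = email.message_from_string(raw_body)

    plain_text_body = ''
    html_body = ''
    for part in message.walk():  # walk() also yields a non-multipart message itself
        charset = part.get_content_charset() or 'utf-8'
        if part.get_content_type() == 'text/plain':
            # get_payload(decode=True) returns bytes, so decode it to str
            plain_text_body = part.get_payload(decode=True).decode(charset, errors='replace')
            break
        if part.get_content_type() == 'text/html' and not html_body:
            html_body = part.get_payload(decode=True).decode(charset, errors='replace')

    if not plain_text_body:
        # No text/plain part: strip the tags from the HTML part instead
        plain_text_body = BeautifulSoup(html_body, 'html.parser').get_text()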
    

    Note: I have not actually tested my code, so it probably won't work as is.
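
For comparison, here is a minimal standard-library sketch of the same idea: prefer the text/plain part, and fall back to stripping tags from the HTML part. It makes a few assumptions that are not in the original post. It expects the full raw message as bytes (headers plus body, for example the result of fetching BODY.PEEK[] without calling .decode()). It swaps BeautifulSoup for the standard library's html.parser, which is cruder. The names extract_plain_text, html_to_text and _TextExtractor are hypothetical helpers made up for this illustration.

```python
import email
from email import policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Simplification: join text nodes with spaces and collapse whitespace.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of an HTML string (rough, no layout handling)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def extract_plain_text(raw_message):
    """raw_message: bytes of the complete message, headers included."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    # get_body() picks the best matching part: text/plain first, then text/html.
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is None:
        return ""
    content = body.get_content()  # already decoded to str for text parts
    if body.get_content_type() == "text/html":
        return html_to_text(content)
    return content
```

With the question's code, this would be used roughly as self.message_box.insert(END, extract_plain_text(data[0][1])), assuming data came from a fetch of the whole message rather than of BODY.PEEK[TEXT] alone. The tag stripping here is deliberately simple, so a real HTML parser such as BeautifulSoup will usually give cleaner output for complex emails.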