Tags: python, email, encoding

Trouble with encoding in emails


I have a little Python script that pulls emails from a POP mail address and dumps them into files (one email per file).

Then a PHP script runs through the files and displays them.

I am having an issue with ISO-8859-1 (Latin-1) encoded emails.

Here's an example of the text I get: =?iso-8859-1?Q?G=EDsli_Karlsson?= and Sj=E1um hva=F0 =F3li er kl=E1r J

This is the code I use to pull the emails:

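import poplib
import random
import rfc822
import StringIO

# `server` is the POP3 host name, defined elsewhere.
pop = poplib.POP3(server)

mail_list = pop.list()[1]

for m in mail_list:
    mno, size = m.split()
    lines = pop.retr(mno)[1]

    file = StringIO.StringIO("\r\n".join(lines))
    # Parsing the headers leaves the file positioned at the start of the body.
    msg = rfc822.Message(file)

    body = file.readlines()

    f = open(str(random.randint(1,100)) + ".email", "w")
    f.write(msg["From"] + "\n")
    f.write(msg["Subject"] + "\n")
    f.write(msg["Date"] + "\n")

    for b in body:
        f.write(b)

    f.close()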

I have tried probably every combination of encode / decode in both Python and PHP.


Solution

  • You can use the Python email library (Python 2.5+) to avoid these problems:

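    import email
    import poplib
    import random
    from cStringIO import StringIO
    from email.generator import Generator

    # `server` is the POP3 host name; log in with pop.user() / pop.pass_() as your server requires.
    pop = poplib.POP3(server)

    mail_count = len(pop.list()[1])

    # POP3 message numbers start at 1, not 0.
    for message_num in xrange(1, mail_count + 1):
        # retr() returns (response, lines, octets); rejoin the lines into one string.
        message = "\r\n".join(pop.retr(message_num)[1])
        message = email.message_from_string(message)

        # Flatten the parsed message back into text.
        out_file = StringIO()
        message_gen = Generator(out_file, mangle_from_=False, maxheaderlen=60)
        message_gen.flatten(message)
        message_text = out_file.getvalue()

        # Note: random names can collide, so a later message may overwrite an earlier one.
        filename = "%s.email" % random.randint(1,100)
        email_file = open(filename, "w")
        email_file.write(message_text)
        email_file.close()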
    

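    This code fetches every message from your server, parses each one into a Python message object, and then flattens it back into a string for writing to the file. Using the email package from the Python standard library means the MIME parsing is handled for you. Note that flattening keeps headers and bodies in their encoded form (for example =?iso-8859-1?Q?...?=); to get readable text, decode headers with email.header.decode_header and bodies with get_payload(decode=True).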

    DISCLAIMER: I have not tested that code, but it should work just fine.
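
To make the decoding step concrete, here is a minimal sketch that turns the two kinds of text from the question into readable Unicode: the RFC 2047 header form (=?iso-8859-1?Q?...?=) and the quoted-printable body (Sj=E1um ...). The helper names decode_mime_header and decoded_body, the RAW_EXAMPLE message, and the example.invalid address are made up for illustration and are not part of the answer above. The sketch is written for Python 3 and should also run on Python 2.7; it only looks at the first text/plain part and ignores attachments.

```python
import email
from email.header import decode_header

# Hypothetical sample message, built from the strings quoted in the question.
RAW_EXAMPLE = (
    "From: =?iso-8859-1?Q?G=EDsli_Karlsson?= <gisli@example.invalid>\r\n"
    "Subject: test\r\n"
    "Content-Type: text/plain; charset=iso-8859-1\r\n"
    "Content-Transfer-Encoding: quoted-printable\r\n"
    "\r\n"
    "Sj=E1um hva=F0 =F3li er kl=E1r\r\n"
)


def decode_mime_header(raw_value):
    """Decode an RFC 2047 encoded header value into Unicode text."""
    pieces = []
    for part, charset in decode_header(raw_value):
        # Encoded words come back as bytes plus the charset they declared.
        if isinstance(part, bytes):
            part = part.decode(charset or "ascii", "replace")
        pieces.append(part)
    return u"".join(pieces)


def decoded_body(message):
    """Return the first text/plain part of a parsed message as Unicode text."""
    for part in message.walk():
        if part.get_content_type() == "text/plain":
            # decode=True undoes quoted-printable or base64 transfer encoding.
            raw_bytes = part.get_payload(decode=True)
            charset = part.get_content_charset() or "ascii"
            return raw_bytes.decode(charset, "replace")
    return u""
```

With these helpers, something like decode_mime_header(msg["From"]) and decoded_body(msg) could be written to the output file (encoded as UTF-8, say) instead of the raw header and body, so the PHP side would no longer need to deal with MIME encodings at all.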