
Python IMAP - decoding text with BASE64 encoding and KOI8-R charset


I have an IMAP email part which looks like this:

(b'TEXT', b'HTML', (b'CHARSET', b'KOI8-R'), None, None, b'BASE64', 3304, 42, None, None, None)

I am using IMAPClient to parse emails, and I am having trouble decoding the email body into human-readable characters. My code looks like this:

bytes = imap_server.fetch(msgid, "BODY['1']")[msgid][b'BODY[1]']
rs = base64.b64decode(bytes)
rs = rs.decode('KOI8-R')

As a result, I get gibberish like this:

ЪьЪЮJFIFHHЪАюExifMM*

The value of the bytes variable is something like:

b'/9j/4AAQSkZJRgABAQEASABIAAD/4QTARXhpZgAATU0AKgAAAAgABwESAAMAAAABAAEAAAEaAA...

Any idea what I am doing wrong?

BTW, I have # -*- coding: utf-8 -*- at the beginning of the source code file.


Solution

  • You probably fetched the wrong section, or the server misparsed the message. The data is most likely a JPEG image, not KOI8-R text. Base64-decoded, it starts with:

    \xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x01\x00H\x00H\x00\x00\xff\xe1\x04\xc0Exif...

    The leading \xff\xd8\xff bytes and the JFIF and Exif markers are indicative of a JPEG image.
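
A quick way to confirm this before decoding anything as text is to look at the first few bytes of the payload. The sketch below is only an illustration: looks_like_jpeg and iter_leaf_parts are hypothetical helper names, the sample payload is the prefix quoted in the question, and the multipart structure is made-up example data. It assumes the BODYSTRUCTURE value is a nested tuple in which a multipart node carries a list of sub-parts as its first element, and it does not handle message/rfc822 nesting.

```python
import base64

# Prefix of the base64 payload quoted in the question, cut at a multiple of
# four characters so it decodes without padding errors.
SAMPLE_B64 = b"/9j/4AAQSkZJRgABAQEASABIAAD/4QTARXhpZgAA"


def looks_like_jpeg(b64_payload: bytes) -> bool:
    """Return True if the payload decodes to data starting with FF D8 FF."""
    # The first 4 base64 characters decode to exactly 3 bytes.
    head = base64.b64decode(b64_payload[:4])
    return head == b"\xff\xd8\xff"


def iter_leaf_parts(structure, prefix=""):
    """Yield (part_number, type, subtype) for every non-multipart part.

    Assumption: a multipart node has a list of sub-parts as its first
    element; any other node is treated as a leaf part.
    """
    if isinstance(structure[0], list):
        for index, sub in enumerate(structure[0], start=1):
            yield from iter_leaf_parts(sub, f"{prefix}{index}.")
    else:
        # A single-part message has no prefix; its body is part "1".
        yield prefix.rstrip(".") or "1", structure[0], structure[1]


# Hypothetical multipart/mixed message: an image first, then the HTML part.
example = (
    [
        (b"IMAGE", b"JPEG", None, None, None, b"BASE64", 51234),
        (b"TEXT", b"HTML", (b"CHARSET", b"KOI8-R"), None, None, b"BASE64", 3304, 42),
    ],
    b"MIXED",
)

print(looks_like_jpeg(SAMPLE_B64))
for number, main_type, sub_type in iter_leaf_parts(example):
    print(number, main_type, sub_type)
```

If the check returns True for the section you fetched, compare the part numbers against the BODYSTRUCTURE response and fetch the one whose type is TEXT/HTML instead.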