Tags: python, python-3.x, imaplib

Searching for UTF-8 encoded subjects with imaplib


I have some working code to fetch mail bodies, and I want to filter on the subject using a non-ASCII string. Other forums suggest using the .uid method to do so, but its behavior does not seem logical to me.

Current code:

import imaplib
import email

username = "secret"  # placeholder credentials
password = "secret"

imap = imaplib.IMAP4_SSL("imap.gmail.com")
imap.login(username, password)

status, messages = imap.select("INBOX", readonly=True)

# Only works as long as the search string is plain ASCII
res, msg = imap.search(None, 'HEADER Subject "string to be encoded with UTF-8"')

Suggested code:

import imaplib
import email

username = "secret"  # placeholder credentials
password = "secret"

imap = imaplib.IMAP4_SSL("imap.gmail.com")
imap.login(username, password)

status, messages = imap.select("INBOX", readonly=True)

# The UTF-8 bytes are sent as an IMAP literal after the SUBJECT key
imap.literal = "string to be encoded with UTF-8".encode('utf-8')
res, msg = imap.uid('SEARCH', 'CHARSET', 'UTF-8', 'SUBJECT')

The suggested code runs without errors, but the returned list (msg[0]) contains indices that are out of bounds for the mailbox. By contrast, the .search method returns valid indices, but only as long as I search for ASCII strings; it accepts non-ASCII strings neither as plain str nor as UTF-8 encoded bytes. Because of this, I don't quite understand the behaviour and logic of .uid. I'd be grateful if someone could point me in the right direction.

How can I filter the subject with a UTF-8 string?


Solution

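  • I managed to solve this with the following, using the recommended .uid method instead of .search. Note that .uid('SEARCH', ...) returns message UIDs rather than sequence numbers, which is why the values look out of bounds; they have to be fetched with .uid('fetch', ...) as well: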

    imap = imaplib.IMAP4_SSL("server_to_connect_to")
    imap.login(username, password)
    
    status, messages = imap.select("INBOX", readonly=True)
    # The search string is sent as an IMAP literal, so it may contain non-ASCII characters
    imap.literal = u'"Subject to be searched"'.encode('utf-8')
    res, msg = imap.uid('SEARCH', 'CHARSET', 'UTF-8', 'SUBJECT')
    uids = msg[0].decode('utf-8').split()
    
    for uid in uids:
        res, msg = imap.uid('fetch', uid, '(RFC822)')
        # parsing logic
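
The steps above could be wrapped into two small helpers, roughly as sketched below. This is only an illustration and not part of the original answer: the function names search_subject_uids and fetch_subject are made up, and the sketch relies on the same imap.literal attribute the solution uses, which to my knowledge is not described in the imaplib documentation. The "parsing logic" shown here simply decodes the Subject header with the standard email package. The imap argument is assumed to be a connection that is already logged in and has a mailbox selected.

```python
import email
from email.header import decode_header, make_header


def search_subject_uids(imap, subject):
    """Return the UIDs (as str) of messages whose subject matches `subject`.

    `imap` is assumed to be a logged-in imaplib connection with a mailbox selected.
    """
    # The search text goes out as an IMAP literal, so it may contain non-ASCII bytes.
    imap.literal = subject.encode("utf-8")
    status, data = imap.uid("SEARCH", "CHARSET", "UTF-8", "SUBJECT")
    if status != "OK":
        raise RuntimeError(f"UID SEARCH failed: {data!r}")
    # data[0] is a space-separated byte string of UIDs; it is empty when nothing matches.
    if not data or not data[0]:
        return []
    return data[0].decode("ascii").split()


def fetch_subject(imap, uid):
    """Fetch one message by UID and return its decoded Subject header, or None."""
    # UIDs are only valid with UID commands; a plain fetch() expects sequence numbers.
    status, msg_data = imap.uid("FETCH", uid, "(RFC822)")
    if status != "OK":
        raise RuntimeError(f"UID FETCH failed: {msg_data!r}")
    for part in msg_data:
        # The raw message is the second element of the tuple in the response.
        if isinstance(part, tuple):
            message = email.message_from_bytes(part[1])
            raw_subject = message.get("Subject", "")
            # Turn RFC 2047 encoded words (=?utf-8?...?=) back into readable text.
            return str(make_header(decode_header(raw_subject)))
    return None
```

With a connection set up as in the solution, this might be used as: for uid in search_subject_uids(imap, "Subject to be searched"): print(uid, fetch_subject(imap, uid)).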