Tags: python-3.x, imaplib, yahoo-mail

No MESSAGE-ID, and getting imap_tools to work with imap.mail.yahoo.com


The question is twofold: getting the MESSAGE-ID, and using imap_tools. For a "handmade" email client in Python I need to reduce the amount of data read from the server (at present it takes 2 minutes to read the whole mbox folder of ~170 messages on Yahoo), and I believe that having the MESSAGE-ID will help.

imap_tools supports the IDLE command, which is essential for keeping the Yahoo server connection alive, and has other features that I believe will simplify the code.

To learn about MESSAGE-ID I started with the following code (file fetch_ssl.py):

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import imaplib
import email
import os
import ssl
import conf
# Why does message number 1 have no MESSAGE-ID?
if __name__ == '__main__':
    args = conf.parser.parse_args()
    host, port, env_var = conf.config[args.host]
    if 0 < args.verbose:
        print(host, port, env_var)
    with imaplib.IMAP4_SSL(host, port,
                           ssl_context=ssl.create_default_context()) as mbox:
        user, pass_ = os.getenv('USER_NAME_EMAIL'), os.getenv(env_var)
        mbox.login(user, pass_)
        mbox.select()
        typ, data = mbox.search(None, 'ALL')
        for num in data[0].split():
            typ, data = mbox.fetch(num, '(RFC822)')
            msg = email.message_from_bytes(data[0][1])
            print(f'num={int(num)}, MESSAGE-ID={msg["MESSAGE-ID"]}')
            ans = input('Continue[Y/n]? ')
            if ans.upper() in ('', 'Y'):
                continue
            else:
                break

Where conf.py is:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import argparse

HOST = 'imap.mail.yahoo.com'
PORT = 993
config = {'gmail': ('imap.gmail.com', PORT, 'GMAIL_APP_PWD'),
          'yahoo': ('imap.mail.yahoo.com', PORT, 'YAHOO_APP_PWD')}
parser = argparse.ArgumentParser(description="""\
Fetch MESSAGE-ID from imap server""")
parser.add_argument('host', choices=config)
parser.add_argument('-verbose', '-v', action='count', default=0)

fetch_ssl.py outputs:

$ python fetch_ssl.py yahoo
num=1, MESSAGE-ID=None
Continue[Y/n]? 
num=2, MESSAGE-ID=<[email protected]>
Continue[Y/n]? n

I'd like to understand why the first message (num == 1) has no MESSAGE-ID. Does this happen from time to time, i.e., are there messages with no MESSAGE-ID at all? How should such cases be handled? I haven't found any such cases on Gmail.
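If skipping such messages is not an option, one hypothetical way to handle them is to fall back to a synthetic key built from headers the message does carry. This is an assumption, not a Yahoo or RFC convention: message_key and FALLBACK_HEADERS are invented names, and hashing Date, From, To and Subject (all present in the Yahoo welcome mail shown in the EDIT) is an arbitrary choice. Two different messages with identical values in those headers would collide.

```python
import hashlib
from email.message import Message

# Hypothetical choice of headers to hash when Message-ID is missing.
FALLBACK_HEADERS = ('Date', 'From', 'To', 'Subject')


def message_key(msg: Message) -> str:
    """Return the Message-ID if present, otherwise a deterministic synthetic key."""
    msg_id = msg['Message-ID']
    if msg_id is not None and str(msg_id).strip():
        return str(msg_id).strip()
    digest = hashlib.sha256()
    for name in FALLBACK_HEADERS:
        value = str(msg.get(name, ''))
        # surrogateescape keeps undecodable header bytes from raising here
        digest.update(f'{name}: {value}\n'.encode('utf-8', 'surrogateescape'))
    return f'<synthetic-{digest.hexdigest()[:32]}>'
```

In fetch_ssl.py, printing message_key(msg) instead of msg["MESSAGE-ID"] would then never show None.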

Then I attempted to do the same with imap_tools (version 0.56.0) in file fetch_tools.py:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import ssl
from imap_tools import MailBoxTls
import conf

# https://github.com/ikvk/imap_tools/blob/master/examples/tls.py
# suggests:
# ctx.load_cert_chain(certfile="./one.crt", keyfile="./one.key")
if __name__ == '__main__':
    args = conf.parser.parse_args()
    host, port, env_var = conf.config[args.host]
    if 0 < args.verbose:
        print(host, port, env_var)
    user, pass_ = os.getenv('USER_NAME_EMAIL'), os.getenv(env_var)
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.options &= ~ssl.OP_NO_SSLv3
    # imaplib.abort: socket error: EOF
    with MailBoxTls(host=host, port=port, ssl_context=ctx) as mbox:
        mbox.login(user, pass_, 'INBOX')
        for msg in mbox.fetch():
            print(msg.subject, msg.date_str)

The command

$ python fetch_tools.py yahoo

outputs:

Traceback (most recent call last):
  File "/home/vlz/Documents/python-scripts/programming_python/Internet/Email/ymail/imap_tools_lab/fetch_tools.py", line 20, in <module>
    with MailBoxTls(host=host, port=port, ssl_context=ctx) as mbox:
  File "/home/vlz/Documents/.venv39/lib/python3.9/site-packages/imap_tools/mailbox.py", line 322, in __init__
    super().__init__()
  File "/home/vlz/Documents/.venv39/lib/python3.9/site-packages/imap_tools/mailbox.py", line 35, in __init__
    self.client = self._get_mailbox_client()
  File "/home/vlz/Documents/.venv39/lib/python3.9/site-packages/imap_tools/mailbox.py", line 328, in _get_mailbox_client
    client = imaplib.IMAP4(self._host, self._port, self._timeout)  # noqa
  File "/usr/lib/python3.9/imaplib.py", line 205, in __init__
    self._connect()
  File "/usr/lib/python3.9/imaplib.py", line 247, in _connect
    self.welcome = self._get_response()
  File "/usr/lib/python3.9/imaplib.py", line 1075, in _get_response
    resp = self._get_line()
  File "/usr/lib/python3.9/imaplib.py", line 1185, in _get_line
    raise self.abort('socket error: EOF')
imaplib.abort: socket error: EOF

The command

$ python fetch_tools.py gmail

produces the same traceback. What are my mistakes?

Using Python 3.9.2, Debian GNU/Linux 11 (bullseye), imap_tools (Version: 0.56.0)

EDIT

Headers of the message with no MESSAGE-ID:

X-Apparently-To: [email protected]; Sun, 25 Oct 2015 20:54:21 +0000
Return-Path: <[email protected]>
Received-SPF: fail (domain of product.communications.yahoo.com does not designate 216.39.62.96 as permitted sender)
...
X-Originating-IP: [216.39.62.96]
Authentication-Results: mta1029.mail.bf1.yahoo.com  from=product.communications.yahoo.com; domainkeys=neutral (no sig);  from=product.communications.yahoo.com; dkim=pass (ok)
Received: from 127.0.0.1  (EHLO n3-vm4.bullet.mail.gq1.yahoo.com) (216.39.62.96)
  by mta1029.mail.bf1.yahoo.com with SMTPS; Sun, 25 Oct 2015 20:54:21 +0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=product.communications.yahoo.com; s=201402-std-mrk-prd; t=1445806460; bh=5PTgF8Jghm92xeMD5mSHp6A3eRVV70PWo1oQ15K7Tfk=; h=Date:From:Reply-To:To:Subject:From:Subject; b=D7ItgOiuLbiexJGHvORgbpRi22X+sYso6gwZKDXVca79DxMMy2R1dUtZTIg7tcft1lovVJUDw/7fC51orDltRidlfnpayeY8lT+94DRlSBwopuxgOqqR9oTTjTBZ0oEvdxUcXl/q54N2GxuBFvmg8UO0OZoCnFPpUVYo9x4arMjt/0TOW1Q5d/yjdmO7iwiued/rliP/Bsq0TaZYcb0oCAT7Q50tb1fB7wcXLYNSC1OCQ1l1LajbUqmU1LWWNse36mUUTBieO2sZT0ERFrHaCTaTNQSXKQG2AxYF7Dd/8i0Iq3xqdcS0bDpjmWE25uoKvCdtXtUbylsuQSChuLFMTw==
Received: from [216.39.60.185] by n3.bullet.mail.gq1.yahoo.com with NNFMP; 25 Oct 2015 20:54:20 -0000
Received: from [98.137.101.84] by t1.bullet.mail.gq1.yahoo.com with NNFMP; 25 Oct 2015 20:54:20 -0000
Date: 25 Oct 2015 20:54:20 +0000
Received: from [127.0.0.1] by nu-repl01.direct.gq1.yahoo.com with NNFMP; 25 Oct 2015 20:54:20 -0000
X-yahoo-newman-expires: 1445810060
From: "Yahoo Mail" <[email protected]>
Reply-To: [email protected]
To: <ME>@yahoo.com
Subject: Welcome to Yahoo! Vladimir
X-Yahoo-Newman-Property: ydirect
Content-Type: text/html
Content-Length: 25180

The only header I omitted is X-YMailISG.

EDIT II

Of 167 messages, 21 have no MESSAGE-ID header.
fetch_ssl.py takes 4m12.342s and fetch_tools.py takes 3m41.965s.
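Much of that time is likely spent downloading every message in full (RFC822) just to read one header. A minimal sketch, assuming only a UID to Message-ID map is needed, asks the server for that header alone with BODY.PEEK[HEADER.FIELDS (MESSAGE-ID)] in a single UID FETCH. It uses UIDs because the numbers returned by plain search/fetch in fetch_ssl.py are message sequence numbers, which shift when messages are expunged. parse_message_ids and fetch_message_ids are hypothetical names, and the parsing assumes imaplib's usual layout of (envelope, literal) tuples separated by plain bytes.

```python
import email.parser
import imaplib
import re
from typing import Dict, Optional

# Request only the Message-ID header; BODY.PEEK does not set the \Seen flag.
HEADER_QUERY = '(UID BODY.PEEK[HEADER.FIELDS (MESSAGE-ID)])'
UID_RE = re.compile(rb'\bUID (\d+)')


def parse_message_ids(fetch_data: list) -> Dict[int, Optional[str]]:
    """Map UID -> Message-ID (None when the header is absent)."""
    parser = email.parser.BytesHeaderParser()
    result: Dict[int, Optional[str]] = {}
    for i, item in enumerate(fetch_data):
        # imaplib yields (b'<seq> (UID <n> BODY[...] {size}', b'<header bytes>')
        # tuples, separated by plain bytes such as b')'.
        if not isinstance(item, tuple):
            continue
        match = UID_RE.search(item[0])
        nxt = fetch_data[i + 1] if i + 1 < len(fetch_data) else None
        if match is None and isinstance(nxt, bytes):
            # Some servers send the UID after the literal, e.g. b' UID 13)'.
            match = UID_RE.search(nxt)
        if match is None:
            continue
        msg_id = parser.parsebytes(item[1])['Message-ID']
        result[int(match.group(1))] = str(msg_id).strip() if msg_id else None
    return result


def fetch_message_ids(conn: imaplib.IMAP4) -> Dict[int, Optional[str]]:
    """One UID FETCH for the selected mailbox instead of one RFC822 fetch per message."""
    typ, data = conn.uid('FETCH', '1:*', HEADER_QUERY)
    if typ != 'OK':
        raise imaplib.IMAP4.error(f'UID FETCH failed: {data!r}')
    return parse_message_ids(data)
```

With the logged-in, selected connection mbox from fetch_ssl.py, fetch_message_ids(mbox) replaces the whole per-message loop; entries mapped to None are the messages without a Message-ID.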


Solution

  • It simply looks like your email without a Message-ID legitimately doesn't have one: the welcome email Yahoo sent you really does lack it. RFC 5322 only says a message SHOULD have a Message-ID field, not that it MUST, and since this is a system-generated email, its absence is not that unexpected. You'll just have to skip over such messages.

    The second problem is that you need to use imap_tools.MailBox instead of MailBoxTls. Looking at the documentation and the source in the repo, the relevant classes appear to be:

    • MailBox: for a normal encrypted connection, aka IMAPS (IMAP over SSL/TLS, conventionally on port 993). This is what most email servers use these days.
    • MailBoxTls: for a STARTTLS connection. It opens a plaintext connection (conventionally on port 143) and later upgrades it with the STARTTLS command. Pointed at port 993, it waits for a plaintext IMAP greeting that a TLS-only port never sends, which is consistent with the "socket error: EOF" in your traceback. The internet has mostly moved to the "always encrypted" rather than the "upgrade" paradigm, so this is not the class to use.
    • MailBoxUnencrypted: standard IMAP without SSL/TLS. You should not use this on the public internet.

    The naming is a bit confusing. MailBox corresponds to imaplib.IMAP4_SSL; MailBoxTls corresponds to imaplib.IMAP4 followed by starttls() on the resulting connection; and MailBoxUnencrypted corresponds to imaplib.IMAP4 with no security applied. I imagine it's this way so that the most common one (MailBox) is a safe default.
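Applied to fetch_tools.py, the fix amounts to swapping MailBoxTls for MailBox. This sketch assumes the imap_tools 0.56 API as used in the question (MailBox(host=..., port=..., ssl_context=...) and login(user, password, folder) returning the mailbox) plus the headers_only and mark_seen arguments of fetch(); check these against your installed version. It also calls ssl.create_default_context() with its default purpose. Purpose.CLIENT_AUTH, as in the question, builds a context meant for the server side that does not verify the server's certificate, and clearing OP_NO_SSLv3 is not needed. open_mailbox and list_message_ids are hypothetical helper names.

```python
import ssl
from typing import Dict, Optional


def open_mailbox(host: str, user: str, password: str,
                 port: int = 993, folder: str = 'INBOX'):
    """Connect with implicit TLS (IMAPS) and log in; returns an imap_tools MailBox."""
    # Lazy import so list_message_ids() can be used without imap_tools installed.
    from imap_tools import MailBox  # wraps imaplib.IMAP4_SSL, unlike MailBoxTls
    # Default purpose (SERVER_AUTH) verifies the server certificate and hostname.
    ctx = ssl.create_default_context()
    return MailBox(host=host, port=port, ssl_context=ctx).login(user, password, folder)


def list_message_ids(mailbox) -> Dict[str, Optional[str]]:
    """Map UID -> Message-ID, downloading headers only and not marking mail as seen."""
    ids: Dict[str, Optional[str]] = {}
    for msg in mailbox.fetch(headers_only=True, mark_seen=False):
        # imap_tools exposes headers as {lowercase name: tuple of values}
        values = msg.headers.get('message-id', ())
        ids[msg.uid] = values[0].strip() if values else None
    return ids
```

With the question's conf.py and environment variables, usage might look like: with open_mailbox(host, user, pass_, port) as mbox: print(list_message_ids(mbox)).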