I'm trying to scrape data from a specific folder in a Gmail account I have access to.
I recently ran the code below using Python 2.7 on Windows 7 while logged into the Gmail account in question. It runs for a long time (I left it for as long as 40 minutes) without completing or raising an error.
The folder I'm targeting currently holds only about 50 plain-text emails with no attachments or pictures, so nothing suggests the process should take this long. Has anyone come across an issue like this when doing something similar with IMAP?
Code for completeness:
#!/usr/bin/env python
#
# Very simple Python script to dump all emails in an IMAP folder to files.
# This code is released into the public domain.
#
# RKI Nov 2013
#
import sys
import imaplib
import getpass

IMAP_SERVER = 'imap.gmail.com'
EMAIL_ACCOUNT = "notatallawhistleblowerIswear@gmail.com"
EMAIL_FOLDER = "Top Secret/PRISM Documents"
OUTPUT_DIRECTORY = 'C:/src/tmp'
PASSWORD = getpass.getpass()

def process_mailbox(M):
    """
    Dump all emails in the folder to files in output directory.
    """
    rv, data = M.search(None, "ALL")
    if rv != 'OK':
        print "No messages found!"
        return
    for num in data[0].split():
        rv, data = M.fetch(num, '(RFC822)')
        if rv != 'OK':
            print "ERROR getting message", num
            return
        print "Writing message ", num
        f = open('%s/%s.eml' %(OUTPUT_DIRECTORY, num), 'wb')
        f.write(data[0][1])
        f.close()

def main():
    M = imaplib.IMAP4_SSL(IMAP_SERVER)
    M.login(EMAIL_ACCOUNT, PASSWORD)
    rv, data = M.select(EMAIL_FOLDER)
    if rv == 'OK':
        print "Processing mailbox: ", EMAIL_FOLDER
        process_mailbox(M)
        M.close()
    else:
        print "ERROR: Unable to open mailbox ", rv
    M.logout()

if __name__ == "__main__":
    main()
The code works fine for me. Below, I have added some debug prints to your code (using pprint) to view the attributes of the IMAP4_SSL object M. My Gmail account uses two-factor authentication, so I needed to set up a Gmail app password.
from pprint import pprint
# ....
M = imaplib.IMAP4_SSL(IMAP_SERVER)
print('---- Attributes of the IMAP4_SSL connection before login ----')
pprint(vars(M))
M.login(EMAIL_ACCOUNT, PASSWORD)
print('\n \n')
print('---- Attributes of the IMAP4_SSL connection after login ----')
pprint(vars(M))
# open specific folder
rv, data = M.select(EMAIL_FOLDER)
print('\n \n')
print('---- Data returned from select of folder = {}'.format(data))
Things to check in the output:

In the output of pprint(vars(M)), look for:
    'welcome': '* OK Gimap ready for requests from ...
    'port': 993,

In the output of pprint(vars(M)) after login, look in _cmd_log for a successful login:
    6: ('< PJIL1 OK **@gmail.com authenticated (Success)

The data returned from M.select(EMAIL_FOLDER) should be the number of emails available to download.
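
If you want to automate those checks, here is a minimal sketch. It is not a confirmed fix for the hang. The helper names message_count and open_folder and the 30-second timeout are my own choices for illustration; socket.setdefaulttimeout and imaplib.Debug are standard library features. The timeout is there so that a stalled connection raises an error instead of blocking forever, and a debug level of 4 should make imaplib print each command and response so you can see where it stops.

```python
import imaplib
import socket


def message_count(select_data):
    """Return the message count from the data list returned by M.select()."""
    # select() returns a one-item list such as ['50'] (or [b'50'] on Python 3).
    return int(select_data[0])


def open_folder(server, account, password, folder, timeout=30, debug=0):
    """Hypothetical helper: connect, log in, select a folder, return (M, count)."""
    # Process-wide default for new sockets: a stalled connection raises
    # socket.timeout instead of hanging indefinitely.
    socket.setdefaulttimeout(timeout)
    # Module-level debug level, picked up by connections created afterwards.
    imaplib.Debug = debug
    M = imaplib.IMAP4_SSL(server)
    M.login(account, password)
    rv, data = M.select(folder)
    if rv != 'OK':
        M.logout()
        raise RuntimeError('Unable to open mailbox: %r' % (data,))
    return M, message_count(data)
```

You would call it as M, count = open_folder(IMAP_SERVER, EMAIL_ACCOUNT, PASSWORD, EMAIL_FOLDER, debug=4) and compare count against the roughly 50 messages you expect before starting the fetch loop. Note that setdefaulttimeout affects every socket created afterwards in the process, which is fine for a one-off script but worth knowing about in a larger program.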