python-3.x exchangelib

MemoryError when fetching email with exchangelib


I have a question about saving email data in batches using exchangelib. It currently takes a long time when there are many emails, and after a few minutes it throws this error:

    ERROR:    MemoryError:
    Retry: 0
    Waited: 10
    Timeout: 120
    Session: 25999
    Thread: 28148
    Auth type: <requests.auth.HTTPBasicAuth object at 0x1FBFF1F0>
    URL: https://outlook.office365.com/EWS/Exchange.asmx
    HTTP adapter: <requests.adapters.HTTPAdapter object at 0x1792CE68>
    Allow redirects: False
    Streaming: False
    Response time: 411.93799999996554
    Status code: 503
    Request headers: {'X-AnchorMailbox': 'myworkemail@workdomain.com'}
    Response headers: {}

Here is the code that I use for connecting and reading:

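from bs4 import BeautifulSoup as bs
from exchangelib import (
    DELEGATE, Account, Configuration, Credentials, EWSDateTime, EWSTimeZone,
)

def connect_mail():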
    config = Configuration(
        server="outlook.office365.com",
        credentials=Credentials(
            username="myworkemail@workdomain.com", password="*******"
        ),
    )
    return Account(
        primary_smtp_address="myworkemail@workdomain.com",
        config=config,
        access_type=DELEGATE,
    )

def import_email(account):
    tz = EWSTimeZone.localzone()
    start = EWSDateTime(2020, 10, 26, 22, 15, tzinfo=tz)
    for item in account.inbox.filter(
        datetime_received__gt=start, is_read=False
    ).order_by("-datetime_received"):
        email_body = item.body
        email_subject = item.subject
        soup = bs(email_body, "html.parser")
        tables = soup.find_all("table")
        item.is_read = True
        item.save()
        # Some code here for saving the email to a database
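
The question asks about saving in batches. Below is a minimal sketch of that part only, assuming an SQLite database through the standard-library sqlite3 module and a hypothetical `emails(subject, body)` table (the real schema isn't shown). Rows are pulled from any iterable a fixed number at a time, so only one batch is held in memory, provided the source itself streams results instead of caching them all.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# (subject, body) pairs; this schema is hypothetical.
Row = Tuple[str, str]


def batched(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield lists of at most `size` rows without materializing the whole input."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def save_in_batches(conn: sqlite3.Connection, rows: Iterable[Row], size: int = 100) -> int:
    """Insert rows `size` at a time, committing after each batch; return the row count."""
    # Hypothetical table; replace with the real schema.
    conn.execute("CREATE TABLE IF NOT EXISTS emails (subject TEXT, body TEXT)")
    total = 0
    for batch in batched(rows, size):
        conn.executemany("INSERT INTO emails (subject, body) VALUES (?, ?)", batch)
        conn.commit()
        total += len(batch)
    return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Fake rows stand in for real emails; a generator keeps them streamed.
    fake_rows = ((f"subject {i}", f"<p>body {i}</p>") for i in range(250))
    print(save_in_batches(conn, fake_rows, size=100))  # 250
```

In real use, `rows` could be a generator such as `((item.subject, item.body) for item in query)`, where `query` is the exchangelib query from the question. That pairing is untested here. Committing once per batch keeps each transaction small. The batch size of 100 is an arbitrary choice.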

Solution

  • You're getting a MemoryError, which means that Python is not able to allocate any more memory on your machine.

    There are a couple of things you can do to reduce the memory consumption of your script. One is to use .iterator(), which disables the internal caching of your query results. Another is to fetch only the fields you actually need, using .only().

    When you use .only(), the other fields will be None, so remember to save only the one field you actually changed: item.save(update_fields=['is_read'])

    Here's an example of how to use the two improvements:

    for item in account.inbox.filter(
            datetime_received__gt=start, is_read=False,
        ).only(
            'is_read', 'subject', 'body',
        ).order_by('-datetime_received').iterator():