
AttributeError: 'str' object has no attribute 'copy' when parsing Multipart email message


Python 3.6 email module crashes with this error:

Traceback (most recent call last):
  File "empty-eml.py", line 9, in <module>
    for part in msg.iter_attachments():
  File "/usr/lib/python3.6/email/message.py", line 1055, in iter_attachments
    parts = self.get_payload().copy()
AttributeError: 'str' object has no attribute 'copy'

The crash can be reproduced with this EML file:

From: "[email protected]" <[email protected]>
To: <[email protected]>
Subject: COURRIER EMIS PAR PACIFICA 
MIME-Version: 1.0
Content-Type: multipart/mixed;
    boundary="----=_Part_3181_1274694650.1556805728023"
Date: Thu, 2 May 2019 16:02:08 +0200

and this piece of minimal code:

from email import policy
from email.parser import Parser
from sys import argv


with open(argv[1]) as eml_file:
    msg = Parser(policy=policy.default).parse(eml_file)

for part in msg.iter_attachments():
    pass

I believe it has something to do with the Content-Type being multipart/mixed while the message body is empty, which causes get_payload to return a str. However, I am not sure whether such an EML file is forbidden by the standard (I have many such samples), whether this is a bug in the email module, or whether I am using the code incorrectly.


Solution

  • If you change the policy to strict:

    Parser(policy=policy.strict).parse(eml_file)
    

    the parser raises email.errors.StartBoundaryNotFoundDefect, described in the docs as:

    StartBoundaryNotFoundDefect – The start boundary claimed in the Content-Type header was never found.

    If you parse the message with policy.default and inspect its defects afterwards, it contains two defects:

    [StartBoundaryNotFoundDefect(), MultipartInvariantViolationDefect()]
    

    MultipartInvariantViolationDefect – A message claimed to be a multipart, but no subparts were found. Note that when a message has this defect, its is_multipart() method may return false even though its content type claims to be multipart.

    A consequence of the StartBoundaryNotFoundDefect is that the parser terminates parsing and sets the message payload to the body that has been captured so far - in this case, nothing, so the payload is an empty string, causing the exception that you are seeing when you run your code.

    Arguably, the fact that iter_attachments doesn't check whether the payload is a list before calling copy() on it is a bug in the email module.

    In practice, you have to handle these messages either by wrapping the iteration of attachments in a try/except, conditioning iteration on the contents of msg.defects, or parsing with policy.strict and discarding all messages that report defects.
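
As a rough illustration of the second option, here is a minimal sketch that parses a message like the one in the question and only iterates attachments when the payload really is a list of parts. The RAW string, the example.com addresses, and the safe_attachments helper are assumptions made up for this example; only message_from_string, policy.default, msg.defects, is_multipart() and iter_attachments() come from the email package.

```python
from email import message_from_string, policy

# Hypothetical stand-in for the EML file in the question: a multipart/mixed
# header followed by an empty body, so the declared boundary never appears.
RAW = (
    "From: sender@example.com\n"
    "To: recipient@example.com\n"
    "Subject: COURRIER EMIS PAR PACIFICA\n"
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/mixed;\n"
    '    boundary="----=_Part_3181_1274694650.1556805728023"\n'
    "Date: Thu, 2 May 2019 16:02:08 +0200\n"
    "\n"
)


def safe_attachments(msg):
    """Return the attachments of msg, or [] if it has no real subparts.

    Hypothetical helper: is_multipart() is true only when the payload is a
    list, so a message that merely claims to be multipart is skipped here
    instead of reaching iter_attachments().
    """
    if not msg.is_multipart():
        return []
    return list(msg.iter_attachments())


msg = message_from_string(RAW, policy=policy.default)

# Defects recorded by the parser; log them or use them to reject the message.
for defect in msg.defects:
    print(type(defect).__name__)

for part in safe_attachments(msg):
    print(part.get_filename())
```

The guard checks the payload type rather than catching AttributeError, so it does not depend on whether iter_attachments itself raises for such messages. If you would rather discard broken messages entirely, testing whether msg.defects is non-empty is an alternative to the is_multipart() check.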