
Parsing broken multipart/mixed email


I'm trying to use Perl's Email::MIME to capture the multipart/mixed portion of an email into a buffer. I have a script that I've been using for a long time, but it now has a problem with one particular email, and I think that's because the email is not formatted properly.

use Email::MIME;
my $buf;
while (<STDIN>) {
    $buf .= $_;
}
my @mailData;
my $msg = Email::MIME->new($buf);
my @parts = $msg->parts;
my $desc = $msg->debug_structure;
print "descr: $desc\n";
$msg->walk_parts(sub {
    my ($part) = @_;
    #warn($part->content_type . ": " . $part->subparts);
    my $content_type = $msg->content_type;
    if (($content_type =~ m/multipart\/mixed/i) && !@mailData) {
        @mailData = split( '\n', $part->body);
        print $part->body;
    }
    elsif (($part->content_type =~ m/text\/plain$/i) && !@mailData) {
        @mailData = split( '\n', $part->body);
    }
});

Here are the relevant sections of the email, exactly as displayed, including the two hyphen lines:

Content-Type: multipart/mixed; boundary="===============7958309719180706421=="
Content-Length: 8034
Status: RO
X-Status: 
X-Keywords: NonJunk         
X-UID: 2

--===============7958309719180706421==

--------------------------------------------------------------------------------
Sending command: "Execute SMART Extended self-test routine immediately in off-line 
Drive command "Execute SMART Extended self-test routine immediately in off-line
    
--------------------------------------------------------------------------------

system security tool that allows system administrators to set authentication
policy without having to recompile programs that handle authentication.

The above code adds body text to @mailData, but only the text after the second line of hyphens. It skips everything between the two hyphen lines entirely and only collects the text starting with "system security tool".

It also prints this description showing the parts:

descr: + multipart/mixed; boundary="===============7958309719180706421=="
     + 
     + text/plain; charset="utf-8"

The text/plain part, as it displays in an email client, appears to be only the mailing list info, not the actual body content.

Is this a bug in Email::MIME? Or am I not processing the parts correctly?

Edit: Here is a link to the full original email https://pastebin.com/F5MbSfYm


Solution

  • This looks to me like a problem with Email::MIME, but also with the mail itself:

    • The first MIME part has no MIME header and thus should be treated as text/plain with charset US-ASCII (see RFC 2046), but its content is obviously UTF-8. This is a problem in the mail.
    • It looks like Email::MIME cannot properly deal with MIME parts that have no MIME header.

    The latter problem has actually been known for 10 years, and proposed patches seem to have existed for 8 years - see https://github.com/rjbs/Email-MIME/issues/14
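
Until that issue is resolved, one possible workaround is to split the multipart body by hand instead of relying on Email::MIME to find the header-less part. The following is only a minimal sketch using core Perl. The helper name split_multipart is hypothetical and not part of Email::MIME, and the inline sample message is made up for illustration. It assumes a single, non-nested multipart level, does not decode any Content-Transfer-Encoding, and falls back to text/plain for parts without a header, as RFC 2046 prescribes.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Email::MIME): split a raw multipart
# message into its top-level parts, keeping parts that have no MIME header.
sub split_multipart {
    my ($raw) = @_;

    # The top-level header block ends at the first blank line.
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    return unless defined $body;

    # Unfold continuation lines, then look for the boundary parameter.
    $head =~ s/\r?\n[ \t]+/ /g;
    return unless $head =~
        /^Content-Type:\s*multipart\/.*?\bboundary=(?:"([^"]+)"|([^;\s]+))/mi;
    my $boundary = defined $1 ? $1 : $2;

    # Collect the lines between delimiter lines; $current stays undef
    # while we are in the preamble or the epilogue.
    my (@raw_parts, $current);
    for my $line (split /^/, $body) {
        if ($line =~ /^--\Q$boundary\E(--)?\s*$/) {
            push @raw_parts, $current if defined $current;
            $current = $1 ? undef : '';    # "--boundary--" closes the message
            next;
        }
        $current .= $line if defined $current;
    }
    push @raw_parts, $current if defined $current;   # tolerate a missing closing delimiter

    my @parts;
    for my $part (@raw_parts) {
        my ($phead, $pbody);
        if ($part =~ /^\r?\n/) {
            # The part starts with a blank line, so it has no MIME header.
            ($phead, $pbody) = ('', $part);
            $pbody =~ s/^\r?\n//;
        }
        else {
            ($phead, $pbody) = split /\r?\n\r?\n/, $part, 2;
            $pbody = '' unless defined $pbody;
        }
        $pbody =~ s/\r?\n\z//;    # the final line break belongs to the delimiter

        # RFC 2046 default when no Content-Type is given: text/plain.
        my ($type) = $phead =~ /^Content-Type:\s*([^;\s]+)/mi;
        push @parts, { type => lc($type // 'text/plain'), body => $pbody };
    }
    return @parts;
}

# Made-up sample message whose first part has no MIME header.
my $raw = <<'EOT';
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ

first part, no MIME header
--XYZ
Content-Type: text/plain; charset="utf-8"

second part
--XYZ--
EOT

for my $p (split_multipart($raw)) {
    print "[$p->{type}] $p->{body}\n";
}
```

In the original script, the body of the first returned part could then be split on newlines into @mailData, in place of the walk_parts callback. For real mail it would still be safer to let Email::MIME handle decoding of any parts it does recognize.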