email parsing jakarta-mail

Extracting From and To addresses from the body of an email with reply chains (JavaMail API)


I am trying to extract content from the Enron dataset. I thought I would try the JavaMail API because it should make the messages easy to parse. However, I am new to JavaMail, so I referred to some material online.

I was able to create a MimeMessage object from the file and extract various fields. Calling getContent() on the object gave me the content of the body.

What I want to do is extract the From and To addresses that appear in the body, and I am not sure how to do that.

I read about creating a Multipart object and extracting the parts from it:

  1. Use javax.mail.Message.getContent() to get the message's content. This should return the entire message's content, in an object of type javax.mail.Multipart.

  2. Use the methods on javax.mail.Multipart to retrieve a particular part of the message. This should be encapsulated in an object of type javax.mail.BodyPart.

  3. Use the methods on javax.mail.BodyPart to retrieve the content of the particular part of the message that you're interested in.

The MIME type specified in my case is not multipart. However, when I try the above method, I get: Exception in thread "main" java.lang.ClassCastException: java.lang.String cannot be cast to javax.mail.Message

What should I do?


Below is the content of the file that I am trying to parse.

Message-ID: <16159836.1075855377439.JavaMail.evans@thyme>
Date: Fri, 7 Dec 2001 10:06:42 -0800 (PST)
From: [email protected]
To: [email protected]
Subject: RE: West Position
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-From: Dunton, Heather </O=ENRON/OU=NA/CN=RECIPIENTS/CN=HDUNTON>
X-To: Allen, Phillip K. </O=ENRON/OU=NA/CN=RECIPIENTS/CN=Pallen>
X-cc: 
X-bcc: 
X-Folder: \Phillip_Allen_Jan2002_1\Allen, Phillip K.\Inbox
X-Origin: Allen-P
X-FileName: pallen (Non-Privileged).pst


Please let me know if you still need Curve Shift.

Thanks,
Heather
 -----Original Message-----
From:   Allen, Phillip K.  
Sent:   Friday, December 07, 2001 5:14 AM
To: Dunton, Heather
Subject:    RE: West Position

Heather,

Did you attach the file to this email?

 -----Original Message-----
From:   Dunton, Heather  
Sent:   Wednesday, December 05, 2001 1:43 PM
To: Allen, Phillip K.; Belden, Tim
Subject:    FW: West Position

Attached is the Delta position for 1/16, 1/30, 6/19, 7/13, 9/21


 -----Original Message-----
From:   Allen, Phillip K.  
Sent:   Wednesday, December 05, 2001 6:41 AM
To: Dunton, Heather
Subject:    RE: West Position

Heather,

This is exactly what we need.  Would it possible to add the prior day for each of the dates below to the pivot table.  In order to validate the curve shift on the dates below we also need the prior days ending positions.

Thank you,

Phillip Allen

 -----Original Message-----
From:   Dunton, Heather  
Sent:   Tuesday, December 04, 2001 3:12 PM
To: Belden, Tim; Allen, Phillip K.
Cc: Driscoll, Michael M.
Subject:    West Position


Attached is the Delta position for 1/18, 1/31, 6/20, 7/16, 9/24



 << File: west_delta_pos.xls >> 

Let me know if you have any questions.


Heather

This is the code I use:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Arrays;
import java.util.Properties;
import javax.mail.Address;
import javax.mail.BodyPart;
import javax.mail.Message;
import javax.mail.MessagingException;
import javax.mail.Multipart;
import javax.mail.Session;
import javax.mail.internet.MimeMessage;

private void mailParser() throws IOException, MessagingException {
    File mailFiles = new File("/xxx/xx/xx/x/x/inbox/1");
    String host = "host.com";
    Properties properties = System.getProperties();

    properties.setProperty("mail.smtp.host", host);
    Session session = Session.getDefaultInstance(properties);

    // try-with-resources closes the stream even if parsing fails
    try (FileInputStream fis = new FileInputStream(mailFiles)) {
        MimeMessage email = new MimeMessage(session, fis);

        //Message ID
        System.out.println("message id: " + email.getMessageID());

        //Date
        System.out.println("sent date : " + email.getSentDate());

        //From
        Address[] add = email.getFrom();
        if (add != null) {
            for (int i = 0; i < add.length; i++) {
                System.out.println("FROM  : " + add[i].toString());
            }
        }

        //Subject
        System.out.println("\nsubject: " + email.getSubject());

        //TO (getRecipients returns null when the header is absent)
        Address[] to = email.getRecipients(Message.RecipientType.TO);
        if (to != null) {
            System.out.println("\nrecipients to: " + Arrays.asList(to));
        }

        //CC
        Address[] cc = email.getRecipients(Message.RecipientType.CC);
        if (cc != null) {
            System.out.println("\nrecipients cc: " + Arrays.asList(cc));
        }

        //BCC
        Address[] bcc = email.getRecipients(Message.RecipientType.BCC);
        if (bcc != null) {
            System.out.println("\nrecipients bcc: " + Arrays.asList(bcc));
        }

        //Content type
        System.out.println("content type: " + email.getContentType());

        //Content Encoding
        System.out.println("encoding: " + email.getEncoding());

        //Content of email
        // NOTE: for this text/plain message getContent() returns a String,
        // so this cast is the line that throws the ClassCastException.
        Message message = (Message) email.getContent();

        if (message instanceof MimeMessage) {
            MimeMessage m = (MimeMessage) message;
            Object contentObject = m.getContent();
            if (contentObject instanceof Multipart) {
                BodyPart clearTextPart = null;
                Multipart content = (Multipart) contentObject;
                int count = content.getCount();
                // Takes the first body part only
                for (int i = 0; i < count; i++) {
                    BodyPart part = content.getBodyPart(i);
                    clearTextPart = part;
                    break;
                }

                if (clearTextPart != null) {
                    String result = (String) clearTextPart.getContent();
                    System.out.println(result);
                }
            }
        }

        System.out.println("Content of email" + email.getContent().toString());
    } catch (MessagingException e) {
        throw new IllegalStateException("illegal state issue", e);
    } catch (FileNotFoundException e) {
        throw new IllegalStateException("file not found issue: " + mailFiles.getAbsolutePath(), e);
    }
}

Solution

  • What you're seeing is a reply to a reply to a reply to a message, where the original message text and some header information are included as new text in the reply message. As far as MIME is concerned, the text of the original message appears in the reply message just as if you had typed it in yourself, like any other part of the text of the reply message. The "Original Message" separator is not something that's known to MIME. The top-level message is just a plain text message, not a multipart message, and has no MIME structure. That is why getContent() returns a String here, and why casting it to Message fails.

    Because JavaMail parses only the MIME structure of the message, it doesn't handle the message content specially. I'm afraid you're pretty much on your own to parse the content of the message to extract the included/replied message text.

    You'll also notice that the From and To addresses in the message body are just names, not email addresses, and not at all in RFC 2822 format. Nor are the dates in the correct format. The mail reader (most likely Outlook) just included the text from the original message in the reply in a "human readable format" for your convenience.
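
Since the quoted messages have to be parsed by hand, here is a minimal sketch of one way to do it with plain Java and no JavaMail calls. It assumes the body has already been obtained as a String (for this message, what getContent() returns), that every quoted message is introduced by a "-----Original Message-----" line, and that the quoted header lines look like the ones in the sample above. The class name ReplyChainParser and its methods are hypothetical names chosen for illustration, and the date pattern is a guess based on the sample's "Sent:" lines; other mail clients format these blocks differently.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits a plain-text body into quoted replies and reads their header lines. */
class ReplyChainParser {

    // Separator line that Outlook-style clients put before each quoted message.
    private static final Pattern SEPARATOR =
            Pattern.compile("(?m)^[ \\t]*-----Original Message-----[ \\t]*$");

    // A quoted header line such as "From:   Allen, Phillip K.  ".
    private static final Pattern HEADER =
            Pattern.compile("^(From|Sent|To|Cc|Subject):\\s*(.*?)\\s*$");

    // Assumed format of the "Sent:" value, e.g. "Friday, December 07, 2001 5:14 AM".
    private static final DateTimeFormatter SENT_FORMAT =
            DateTimeFormatter.ofPattern("EEEE, MMMM dd, yyyy h:mm a", Locale.ENGLISH);

    /** Returns one header map per quoted message, in the order they appear in the body. */
    static List<Map<String, String>> parse(String body) {
        List<Map<String, String>> result = new ArrayList<>();
        String[] segments = SEPARATOR.split(body);
        // segments[0] is the newest text; it has no quoted header block.
        for (int i = 1; i < segments.length; i++) {
            Map<String, String> headers = new LinkedHashMap<>();
            boolean started = false;
            for (String line : segments[i].split("\\r?\\n")) {
                Matcher m = HEADER.matcher(line);
                if (m.matches()) {
                    headers.put(m.group(1), m.group(2));
                    started = true;
                } else if (started) {
                    break; // first non-header line ends the quoted header block
                }
                // blank lines before the first header are skipped
            }
            result.add(headers);
        }
        return result;
    }

    /** Splits a "To:" or "Cc:" value on ';' because the names themselves contain commas. */
    static List<String> names(String value) {
        List<String> out = new ArrayList<>();
        if (value == null) {
            return out;
        }
        for (String name : value.split(";")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out;
    }

    /** Parses a "Sent:" value; throws DateTimeParseException if the format differs. */
    static LocalDateTime sentDate(String value) {
        return LocalDateTime.parse(value.trim(), SENT_FORMAT);
    }
}
```

Calling ReplyChainParser.parse(body) on the sample message would be expected to yield four header maps, one per "Original Message" block, with display names such as "Allen, Phillip K." rather than email addresses. Mapping those names back to real addresses (for example via the X-From and X-To headers of other messages in the dataset) is a separate problem that this sketch does not attempt.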