java email text eml apache-commons-email

What is the best way to get text from .eml file?


I'm trying to get the To, From, Subject and message body from several .eml files on my local drive. I've tried Apache Commons Email, but sometimes it loops without throwing any errors. Here is my code, which is supposed to extract the text from an .eml file and save it to a .txt file:

MimeMessage mimeMessage = MimeMessageUtils.createMimeMessage(null, file);
// parse() walks the whole MIME tree, so call it once and reuse the parsed result
MimeMessageParser parser = new MimeMessageParser(mimeMessage).parse();

if (parser.hasPlainContent()) {
    // Prefer the plain-text body when the message has one
    try (FileWriter writer = new FileWriter(txtName)) {
        writeHeaders(writer, parser);
        writer.write(parser.getPlainContent());
    } catch (IOException e) {
        e.printStackTrace();
    }
} else if (parser.hasHtmlContent()) {
    // Otherwise fall back to the HTML body, reduced to plain text with Jsoup
    try (FileWriter writer = new FileWriter(txtName)) {
        writeHeaders(writer, parser);
        String text = Jsoup.parse(parser.getHtmlContent()).text();
        writer.write(text);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

Here is the writeHeaders method:

private void writeHeaders(FileWriter writer, MimeMessageParser parser) throws Exception {
    writer.write("From: " + parser.getFrom() + "\n");
    // getTo() returns a List<Address>, so it is written like [a@x.com, b@y.com]
    writer.write("To: " + parser.getTo() + "\n");
    writer.write("Subject: " + parser.getSubject() + "\n");
    writer.write("Message:\n\n");
}
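
If all you need is the top-level From, To and Subject lines, they can also be read without any mail library. The following is a minimal standard-library sketch, not part of Commons Email; the class EmlHeaders and the method readHeaders are hypothetical names. It reads header lines up to the first empty line, unfolds continuation lines (those starting with a space or tab, as RFC 5322 allows) and keeps only the first occurrence of a repeated header. It does not decode RFC 2047 encoded words such as =?UTF-8?B?...?= and ignores the MIME body entirely, so for real-world messages a MIME parser like MimeMessageParser is still the safer choice.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;

public class EmlHeaders {

    // Reads the header block of a raw message, stopping at the first empty line.
    // Hypothetical helper: no RFC 2047 decoding, no MIME handling.
    public static Map<String, String> readHeaders(BufferedReader reader) throws IOException {
        // Header names are case-insensitive, so "subject" and "Subject" share a key
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        String current = null; // header being unfolded, or null if it is being skipped
        String line;
        while ((line = reader.readLine()) != null && !line.isEmpty()) {
            if (line.startsWith(" ") || line.startsWith("\t")) {
                // Folded continuation line: append it to the header it belongs to
                if (current != null) {
                    headers.put(current, headers.get(current) + " " + line.trim());
                }
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                current = null; // malformed line: ignore it and any continuation
                continue;
            }
            String name = line.substring(0, colon).trim();
            if (headers.containsKey(name)) {
                current = null; // repeated header (e.g. Received): keep only the first
                continue;
            }
            headers.put(name, line.substring(colon + 1).trim());
            current = name;
        }
        return headers;
    }

    public static void main(String[] args) throws IOException {
        String eml = "From: Alice <alice@example.com>\r\n"
                + "To: Bob <bob@example.com>,\r\n"
                + "\tCarol <carol@example.com>\r\n"
                + "Subject: Quarterly report\r\n"
                + "\r\n"
                + "Body text here.\r\n";
        Map<String, String> h = readHeaders(new BufferedReader(new StringReader(eml)));
        System.out.println(h.get("From"));    // Alice <alice@example.com>
        System.out.println(h.get("to"));      // Bob <bob@example.com>, Carol <carol@example.com>
        System.out.println(h.get("Subject")); // Quarterly report
    }
}
```

To read an actual file, pass Files.newBufferedReader(path, StandardCharsets.ISO_8859_1); ISO-8859-1 maps every byte to a character, so stray 8-bit bytes in a header will not cause a decoding error.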

And here is the code that extracts the attachments:

// Reuse the parser that was already parsed above instead of calling parse() again
if (parser.hasAttachments()) {
    // Getting and saving attachments from the eml
    List<DataSource> attachments = parser.getAttachmentList();
    for (DataSource attachment : attachments) {
        String name = attachment.getName();
        if (name != null && !name.isEmpty()) {
            // Keep only the file-name part so a name like "../x" cannot escape saveDir
            File save = new File(saveDir, new File(name).getName());
            try (InputStream is = attachment.getInputStream()) {
                // Files.copy writes and closes the target file before returning
                Files.copy(is, save.toPath(), StandardCopyOption.REPLACE_EXISTING);
                // Attached messages are parsed recursively
                if (save.getName().toLowerCase().endsWith(".eml")) {
                    parseEml(save, count);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
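
Saving every attachment under its own name also means that two messages carrying an attachment with the same name (image001.png is a common one) overwrite each other in saveDir. One way to avoid that, sketched here with only the standard library, is to pick a free name before copying. The class UniqueNames and the method uniquePath are hypothetical, and the sketch assumes the name has already been reduced to a bare file name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class UniqueNames {

    // Returns a path in dir that does not exist yet, inserting " (1)", " (2)", ...
    // before the extension when needed: image001.png -> image001 (1).png.
    // Hypothetical helper; fileName must not contain a directory part.
    public static Path uniquePath(Path dir, String fileName) {
        int dot = fileName.lastIndexOf('.');
        // A leading dot (".profile") is part of the base name, not an extension
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;
        String ext = dot > 0 ? fileName.substring(dot) : "";
        Path candidate = dir.resolve(fileName);
        for (int i = 1; Files.exists(candidate); i++) {
            candidate = dir.resolve(base + " (" + i + ")" + ext);
        }
        return candidate;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("attachments");
        Path first = uniquePath(dir, "image001.png");
        Files.createFile(first);
        Path second = uniquePath(dir, "image001.png");
        System.out.println(first.getFileName());  // image001.png
        System.out.println(second.getFileName()); // image001 (1).png
    }
}
```

The existence check and the later write are not atomic. If several threads save into the same directory, copying with Files.copy(in, target) and no REPLACE_EXISTING option is a reasonable extra guard: it throws FileAlreadyExistsException instead of silently overwriting a file that appeared in the meantime.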

So, is there an easier way to get the text and attachments?


Solution

  • Yes, much easier. Simple Java Mail (GitHub) can read .eml files and makes their content very accessible. If you run into something like a looping error there too (unlikely), I'll be happy to help (I actively maintain Simple Java Mail):

    Email email = EmailConverter.emlToEmail(emlFile);
    
    email.getFromRecipient();
    email.getSubject();
    email.getPlainText();
    email.getHTMLText();
    email.getAttachments();
    email.getEmbeddedImages();
    email.getHeaders();
    // etc. etc.
    

    It also supports S/MIME-encrypted emails (provided you have the certificates needed to decrypt them).