I am trying to use the JavaMail API to read the body of a message and store it in a text file if it has any content.
I am able to read the body of the message, but it comes with some HTML elements.
Below is the code I have used.
Properties props = System.getProperties();
props.setProperty("mail.store.protocol", "imaps");
Session session = Session.getDefaultInstance(props, null);
Store store = session.getStore("imaps");
store.connect("hostname", "username", "password");
String result = null;
Folder inbox = store.getFolder("Inbox");
inbox.open(Folder.READ_ONLY);
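// Fetch only the unread messages
Message[] messages = inbox.search(new FlagTerm(new Flags(Flag.SEEN), false));
for (Message message : messages) {
    // Jsoup.parse expects a String, so read the body first
    // (this only handles single-part text or HTML messages)
    Object content = message.getContent();
    if (content instanceof String) {
        System.out.println(Jsoup.parse((String) content).text());
    }
}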
How can I remove those HTML elements from the retrieved message?
Can anyone please help me solve this?
To remove all HTML tags from your mail, use jsoup's text() method.
Example Code
String htmlString = "<div class=\"WordSection1\"> <p class=\"MsoNormal\">Hi<br> <br> <br> <br> Data is written in this mail.<br> <br> <br> <br> <o:p></o:p></p> </div>";
System.out.println(Jsoup.parse(htmlString).text());
Output
Hi Data is written in this mail.
If specific elements should result in line breaks, similar to the rendered HTML, you can insert line breaks into the source and then disable pretty-printing when you call jsoup's clean method.
prettyPrint
If disabled, the HTML output methods will not re-format the output, and the output will generally look like the input.
Example Code
String htmlString = "<div class=\"WordSection1\"> <p class=\"MsoNormal\">Hi<br> <br> <br> <br> Data is written in this mail.<br> <br> <br> <br> <o:p></o:p></p> </div>";
// Insert a line separator before every tag that should produce a line break
htmlString = htmlString.replaceAll("<br>", System.getProperty("line.separator") + "<br>");
Document.OutputSettings settings = new Document.OutputSettings();
settings.prettyPrint(false); // keep the inserted line breaks
// Whitelist was renamed to Safelist in jsoup 1.14.2; use Safelist.none() on newer versions
String cleanedSource = Jsoup.clean(htmlString, "", Whitelist.none(), settings);
System.out.println(cleanedSource);
Output
Hi
Data is written in this mail.
[... four more empty lines]
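The replacement step above only matches the literal string "<br>". If the mail also contains variants such as "<br/>" or "<BR />", or if closing block tags should break the line too, a regular expression can cover them in one pass. The following is a minimal sketch using only the JDK; the class name LineBreakMarker, the method name markLineBreaks, and the choice of tags (br, closing p, closing div) are assumptions for illustration, not part of jsoup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineBreakMarker {
    // Tags assumed to end a line when rendered; adjust the list to your mails.
    private static final Pattern BREAKING_TAGS =
            Pattern.compile("<(br|/p|/div)\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Inserts the platform line separator in front of every matching tag.
    // The tags themselves are kept, so jsoup can still strip them afterwards.
    public static String markLineBreaks(String html) {
        String replacement = Matcher.quoteReplacement(System.lineSeparator()) + "$0";
        return BREAKING_TAGS.matcher(html).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String html = "<p>Hi<br/>Data is written in this mail.</p>";
        System.out.println(markLineBreaks(html));
    }
}
```

The returned string can then be passed to Jsoup.clean with prettyPrint(false), as in the example above, in place of the replaceAll("<br>", ...) call.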