
How can I remove those HTML elements while retaining the formatting?

I have tried to use the JavaMail API to read the body of a message and store it in a text file if it has any content.

I am able to read the body of the message, but it comes with some HTML elements.

Below is the code I have used:

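    import java.util.Properties;
    import javax.mail.*;
    import javax.mail.Flags.Flag;
    import javax.mail.search.FlagTerm;

    Properties props = System.getProperties();
    props.setProperty("mail.store.protocol", "imaps");

    Session session = Session.getDefaultInstance(props, null);
    Store store = session.getStore("imaps");
    store.connect("hostname", "username", "password");
    String result = null;
    Folder inbox = store.getFolder("Inbox");
    inbox.open(Folder.READ_ONLY); // a folder must be open before it can be searched
    // fetch only unread (not SEEN) messages
    Message[] messages = inbox.search(new FlagTerm(new Flags(Flag.SEEN), false));
    for (Message message : messages) {
        Object content = message.getContent(); // a String or a Multipart; here it contains HTML
        // ... write the body to a text file ...
    }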

How can I remove those HTML elements from the retrieved message?

Can anyone help me solve this?


  • To remove all HTML tags from your mail, use jsoup's text() method.

    Example Code

    String htmlString = "<div class=\"WordSection1\"> <p class=\"MsoNormal\">Hi<br> <br> <br> <br> Data is written in this mail.<br> <br> <br> <br> <o:p></o:p></p> </div>";
    String text = Jsoup.parse(htmlString).text(); // strips all tags and collapses whitespace


    Output

    Hi Data is written in this mail.

    If specific elements should produce line breaks, as they do in the rendered HTML, you can insert line breaks before those tags and then disable pretty-printing when you call jsoup's clean method. From the documentation of Document.OutputSettings.prettyPrint():

    "If disabled, the HTML output methods will not re-format the output, and the output will generally look like the input."

    Example Code

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.safety.Safelist;

    String htmlString = "<div class=\"WordSection1\"> <p class=\"MsoNormal\">Hi<br> <br> <br> <br> Data is written in this mail.<br> <br> <br> <br> <o:p></o:p></p> </div>";
    // insert a line break before every tag that should produce one (here only <br>)
    htmlString = htmlString.replaceAll("<br>", System.getProperty("line.separator") + "<br>");
    Document.OutputSettings settings = new Document.OutputSettings();
    settings.prettyPrint(false); // keep the inserted line breaks
    String cleanedSource = Jsoup.clean(htmlString, "", Safelist.none(), settings); // Safelist was named Whitelist before jsoup 1.14.1


    Output

     Hi



     Data is written in this mail.
    [... four more empty lines]
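If adding jsoup is not an option, a rough dependency-free alternative is sketched below, assuming Java 11+. It swaps jsoup's parser for regular expressions, so it is only suitable for simple mail HTML like the sample above: it does not understand comments, <script>/<style> blocks, or '>' inside attribute values, and decodes only a handful of entities. It also covers the question's last step, saving the text only when it has content. The class HtmlToText and its methods convert and writeIfHasContent are hypothetical names, not part of any library.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

/** Rough HTML-to-text conversion for simple mail bodies; NOT a real HTML parser. */
final class HtmlToText {
    // Runs of source whitespace (including raw newlines), which a browser renders as one space
    private static final Pattern SOURCE_WHITESPACE = Pattern.compile("\\s+");
    // Tags that should become line breaks: <br>, <br/>, <br />, </p>, </div> (any case)
    private static final Pattern LINE_BREAK_TAGS =
            Pattern.compile("(?i)<br\\s*/?>|</p\\s*>|</div\\s*>");
    // Any remaining tag, including Office namespaced tags such as <o:p>
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");
    // Spaces or tabs left on either side of an inserted line break
    private static final Pattern SPACE_AROUND_NEWLINE = Pattern.compile("[ \\t]*\n[ \\t]*");

    private HtmlToText() {}

    static String convert(String html) {
        String text = SOURCE_WHITESPACE.matcher(html).replaceAll(" ");
        text = LINE_BREAK_TAGS.matcher(text).replaceAll("\n");
        text = ANY_TAG.matcher(text).replaceAll("");
        // Decode a few common entities; &amp; goes last so "&amp;lt;" becomes "&lt;", not "<"
        text = text.replace("&nbsp;", " ")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");
        text = SPACE_AROUND_NEWLINE.matcher(text).replaceAll("\n");
        return text.strip();
    }

    /** Writes text to file only if it has non-whitespace content; returns whether it wrote. */
    static boolean writeIfHasContent(String text, Path file) throws IOException {
        if (text == null || text.isBlank()) {
            return false;
        }
        Files.writeString(file, text, StandardCharsets.UTF_8);
        return true;
    }
}
```

For the sample HTML above, convert returns "Hi", three empty lines, and then "Data is written in this mail.", so the four <br> tags survive as line breaks. Inside the question's loop, one could pass the HTML string obtained from message.getContent() to convert and then call writeIfHasContent with a per-message Path; for multipart messages the HTML part has to be located first.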