Tags: java, html-parsing, jsoup, html-parser

How do I convert a document made in Jsoup (the Java HTML parser) into a string?


I have a document that was created with jsoup like this:

Document doc = Jsoup.connect("http://en.wikipedia.org/").get();

How do I convert that doc into a string?


Solution

  • Have you tried:

    Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
    String htmlString = doc.toString();
    

    Since Document extends Element, it also has the method html(), which "Retrieves the element's inner HTML" according to the API. So this should work as well:

    Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
    String htmlString = doc.html();
    

    Additional Info:

    Each Document object holds a reference to an instance of the nested class Document.OutputSettings, which can be accessed via the Document method outputSettings(). There you can enable or disable pretty-printing with the setter prettyPrint(true/false). See the API for Document and Document.OutputSettings for further information.
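
To make the points above concrete, here is a minimal sketch that can be run without network access. It assumes jsoup is on the classpath; the class name JsoupToStringSketch, the helper render, and the sample markup are made up for illustration and are not part of jsoup. It compares toString() with html() on a Document, shows how html() and outerHtml() differ on an ordinary element, and turns pretty-printing off via outputSettings().

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupToStringSketch {

    // Hypothetical helper (not part of jsoup): parses the markup and
    // serializes the whole document with pretty-printing on or off.
    static String render(String html, boolean pretty) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings().prettyPrint(pretty);
        return doc.html();
    }

    public static void main(String[] args) {
        // Made-up sample markup, parsed locally instead of fetched from a URL.
        String html = "<html><head><title>Test</title></head>"
                + "<body><p>Hello <b>world</b></p></body></html>";

        Document doc = Jsoup.parse(html);

        // For a Document, toString() and html() should yield the same string.
        System.out.println(doc.toString().equals(doc.html()));

        // For an ordinary element, html() is the inner markup only,
        // while outerHtml() also includes the element's own tags.
        Element p = doc.select("p").first();
        System.out.println(p.html());
        System.out.println(p.outerHtml());

        // With pretty-printing off, no extra line breaks or indentation are added.
        System.out.println(render(html, false));
    }
}
```

If you only need the string once, doc.html() or doc.toString() is enough. Changing outputSettings() is only worth it when the added whitespace from pretty-printing gets in the way, for example when comparing or storing the markup.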