
Extract innerHtml out of body tag using jsoup


I am parsing HTML with jsoup and want to extract the inner HTML of the body tag.

So far I have tried document.body().children().outerHtml(), but it returns only the child elements and skips floating text (text not wrapped in any HTML tag) inside the body.

private String getBodyTag(final Document document) {
        return document.body().children().outerHtml();
}

Input:

<!DOCTYPE html>
<html lang="de">
    <head>
        <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
        <link rel="stylesheet" type="text/css" href="assets/style.css">
    </head>
    <body>
       <div>questions to improve formatting and clarity.</div>
       <h3>Guided Mode</h3> 
       some sample raw/floating text
    </body>
</html>

Expected:

<div>questions to improve formatting and clarity.</div>
<h3>Guided Mode</h3> 
some sample raw/floating text

Actual:

<div>questions to improve formatting and clarity.</div>
<h3>Guided Mode</h3>

Solution

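  • Use html() instead. children() returns only the child elements, so text nodes that sit directly inside body are dropped; html() returns the element's inner HTML, text nodes included: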

    private String getBodyTag(final Document document) {
        return document.body().html();
    }
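
To see the difference side by side, here is a minimal sketch that runs both calls on a trimmed version of the input from the question. The class name BodyInnerHtmlDemo and the helper names elementsOnly and innerHtml are made up for illustration; it assumes jsoup is on the classpath. The exact whitespace of the printed output depends on jsoup's pretty-printing settings, so none is shown here.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class BodyInnerHtmlDemo {

    // Hypothetical helper: element children only, so text nodes
    // placed directly under <body> are not part of the result.
    static String elementsOnly(Document document) {
        return document.body().children().outerHtml();
    }

    // Hypothetical helper: full inner HTML of <body>,
    // covering child elements and bare text nodes alike.
    static String innerHtml(Document document) {
        return document.body().html();
    }

    public static void main(String[] args) {
        // Trimmed version of the input from the question.
        String html = "<html><body>"
                + "<div>questions to improve formatting and clarity.</div>"
                + "<h3>Guided Mode</h3>"
                + "some sample raw/floating text"
                + "</body></html>";
        Document document = Jsoup.parse(html);

        System.out.println("children().outerHtml():");
        System.out.println(elementsOnly(document));
        System.out.println("html():");
        System.out.println(innerHtml(document));
    }
}
```

If the text nodes need to be handled one at a time rather than as one string, document.body().childNodes() returns elements and text nodes together as a list.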