I am parsing HTML with jsoup and want to extract the inner HTML of the body tag.
So far I have tried document.body().children().outerHtml(), but it returns only the child elements and skips floating text (text not wrapped in any HTML tag) inside the body:
private String getBodyTag(final Document document) {
    return document.body().children().outerHtml();
}
Input:
<!DOCTYPE html>
<html lang="de">
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<link rel="stylesheet" type="text/css" href="assets/style.css">
</head>
<body>
<div>questions to improve formatting and clarity.</div>
<h3>Guided Mode</h3>
some sample raw/floating text
</body>
</html>
Expected:
<div>questions to improve formatting and clarity.</div>
<h3>Guided Mode</h3>
some sample raw/floating text
Actual:
<div>questions to improve formatting and clarity.</div>
<h3>Guided Mode</h3>
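children() returns only the child elements, so text nodes that sit directly under body are dropped. Call html() on the body instead; it returns the body's inner HTML, including those text nodes: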
private String getBodyTag(final Document document) {
    return document.body().html();
}
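
To see the difference side by side, here is a minimal sketch, assuming jsoup is on the classpath. The class name BodyInnerHtml and the helper names elementsOnly and innerHtml are made up for illustration; the input string is a shortened version of the HTML from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class BodyInnerHtml {

    // The approach from the question: children() yields only element nodes,
    // so bare text directly under <body> is not part of the result.
    static String elementsOnly(final Document document) {
        return document.body().children().outerHtml();
    }

    // The approach from the answer: html() serializes every child node of
    // <body>, text nodes included.
    static String innerHtml(final Document document) {
        return document.body().html();
    }

    public static void main(String[] args) {
        // Shortened version of the question's input (illustrative only).
        String input = "<!DOCTYPE html><html lang=\"de\"><head></head><body>"
                + "<div>questions to improve formatting and clarity.</div>"
                + "<h3>Guided Mode</h3>"
                + "some sample raw/floating text"
                + "</body></html>";

        Document document = Jsoup.parse(input);

        System.out.println("children().outerHtml():");
        System.out.println(elementsOnly(document));
        System.out.println();
        System.out.println("body().html():");
        System.out.println(innerHtml(document));
    }
}
```

The exact whitespace and line breaks in the output depend on jsoup's pretty-printing, which can vary by version. If you need the markup with less reformatting, calling document.outputSettings().prettyPrint(false) before serializing is one option to try.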