I am trying to traverse an HTML body in order to find all the <h1> tags:
Element body = docJSoup.body();
Elements mainCmp = body.select("h1");
So, given this fragment of the body:
<h1><span style='mso-bookmark:_Toc283737133'><span
style='mso-spacerun:yes'></span><span style='mso-spacerun:yes'></span><a
name="_Toc35343186"></a><a name="_Toc264704629"></a><span style='mso-bookmark:
_Toc35343186'>3<span style='mso-tab-count:1'></span>Aspetti metodologici</span></span></h1>
I get this:
<span style="mso-bookmark:_Toc283737133"><span style="mso-spacerun:yes"></span><span style="mso-spacerun:yes"></span><a name="_Toc35343186"></a><a name="_Toc264704629"></a><span style="mso-bookmark:
_Toc35343186">3<span style="mso-tab-count:1"></span>Aspetti metodologici</span></span>
However, I would also like to keep the <h1> tag itself in the result.
The <h1> tag could also carry other attributes, so I cannot just concatenate "<h1>" to the resulting string.
Is there a way to keep it using JSoup methods?
Thanks for any insights.
outerHtml() will give you the node's markup, including its own opening and closing tags (and any attributes on the tag), whereas html() returns only the inner markup.
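
A minimal sketch of the difference, assuming jsoup is on the classpath. The class name OuterHtmlExample, the helper firstH1OuterHtml, and the sample HTML are made up for illustration and are not part of the question's code:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class OuterHtmlExample {

    // Hypothetical helper: returns the first <h1> of the document,
    // including the tag itself, or an empty string if there is none.
    static String firstH1OuterHtml(String html) {
        Document doc = Jsoup.parse(html);
        Element h1 = doc.body().select("h1").first();
        return h1 == null ? "" : h1.outerHtml();
    }

    public static void main(String[] args) {
        // Made-up sample input; the attribute shows that it is preserved.
        String html = "<html><body>"
                + "<h1 class=\"title\"><span>3</span>Aspetti metodologici</h1>"
                + "</body></html>";

        Document doc = Jsoup.parse(html);
        Elements headings = doc.body().select("h1");

        for (Element h1 : headings) {
            // Inner markup only: the <h1> tag is not included.
            System.out.println(h1.html());
            // Full markup: opening tag with its attributes, content, closing tag.
            System.out.println(h1.outerHtml());
        }
    }
}
```

If you want the markup of all matches in one string, Elements also has an outerHtml() method that combines the outer HTML of every matched element.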