java html pdf html-parsing itext

iText PDF generation fails to parse some HTML tags


I have HTML stored in a database and I want to render it into a PDF. I am using iText for PDF generation. Here is the HTML from the database:

<p>no note.</p><br>
<ul><br>
<li><strong>section</strong></li><br>
</ul><br>
<ol><br>
<li>first</li><br>
<li><em>second</em></li><br>
<li><span style="text-decoration: underline;">third</span></li><br>
</ol><br>

Here is what actually gets parsed and inserted into the PDF:

<p>no note.</p><br>
<strong>section</strong><br>
first<br>
<em>second</em><br>
<span style="text-decoration: underline;">third</span><br>

Here is my code that parses the HTML and adds it to the PDF:

// Normalize the stored HTML with jsoup, then let iText's HTMLWorker convert it to elements.
org.jsoup.nodes.Document doc = Jsoup.parse(text);
List<Element> objects = HTMLWorker.parseToList(new StringReader(doc.outerHtml()), null);
// Add each parsed iText element to the open PDF document.
for (Element object : objects) {
    document.add(object);
}

As you can see, the list numbers and bullets are not shown: the "ul", "ol" and "li" tags from the HTML are dropped and only their contents remain. How can I solve this?

Edit

To clarify, here is how the text looks when rendered as HTML:

enter image description here

And here is the note as it appears in the PDF:

enter image description here


Solution

  • A friend just solved it by using XMLWorkerHelper instead of HTMLWorker:

    XMLWorkerHelper.getInstance().parseXHtml(new XHtmlElementHandler(document), new StringReader(text));

    simple :)
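
The one-liner above relies on an XHtmlElementHandler class that the answer does not show. What follows is only a minimal sketch of what such a handler might look like, assuming iText 5 with the XML Worker add-on on the classpath. The handler body, the XhtmlToPdfSketch class and the render method are hypothetical names made up for illustration, not the answerer's actual code. The idea is that XMLWorkerHelper passes each parsed chunk to the handler, which forwards the resulting iText elements (including List objects built from "ul" and "ol") to the open document.

```java
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Element;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.ElementHandler;
import com.itextpdf.tool.xml.Writable;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import com.itextpdf.tool.xml.pipeline.WritableElement;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;

public class XhtmlToPdfSketch {

    /** Hypothetical stand-in for the XHtmlElementHandler named in the answer. */
    static class XHtmlElementHandler implements ElementHandler {
        private final Document document;

        XHtmlElementHandler(Document document) {
            this.document = document;
        }

        @Override
        public void add(Writable w) {
            // XML Worker delivers parsed content wrapped in WritableElement.
            if (!(w instanceof WritableElement)) {
                return;
            }
            for (Element e : ((WritableElement) w).elements()) {
                try {
                    document.add(e);
                } catch (DocumentException ex) {
                    throw new IllegalStateException("Could not add element to PDF", ex);
                }
            }
        }
    }

    /** Renders an XHTML string to an in-memory PDF (assumed helper, not from the post). */
    static byte[] render(String xhtml) throws DocumentException, IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Document document = new Document();
        PdfWriter.getInstance(document, out);
        document.open();
        XMLWorkerHelper.getInstance()
                .parseXHtml(new XHtmlElementHandler(document), new StringReader(xhtml));
        document.close();
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body>"
                + "<p>no note.</p>"
                + "<ul><li><strong>section</strong></li></ul>"
                + "<ol><li>first</li><li><em>second</em></li></ol>"
                + "</body></html>";
        System.out.println(render(xhtml).length + " bytes written");
    }
}
```

In the question's setup the Document is already open, so only the handler class and the parseXHtml call would be needed. The in-memory PdfWriter is there just to keep the sketch self-contained.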