java, itext, xmlworker

Parsing HTML snippets and adding to PdfPTable


I am creating a PDF made of several PdfPTables, where some PdfPCells contain simple Phrases and others need to contain parsed HTML snippets. To make sure the parsed HTML is added with the right styling and in the right place, I have been collecting it in a Paragraph and then adding that Paragraph to a PdfPCell. However, this breaks for some HTML tags, such as lists and quotes. Below is a rough example of what I am doing. What can I do to handle HTML lists, quotes, and similar elements properly?

For example, iText handles an HTML list properly and converts it to an iText List with ListItems. I need to add that List to my PdfPTable. I know that putting the List element in a Paragraph cancels out its styling (the entire list ends up on one line with no numbering), and I would like to know the proper way of handling this.

PdfPTable table = new PdfPTable(1);
table.addCell(parseHtmlToParagraph(htmlString));
table.addCell(new Phrase("Name" + user.getName()));

public Paragraph parseHtmlToParagraph(String str) throws IOException {
    StringReader body = new StringReader(str);
    final Paragraph para = new Paragraph();

    XMLWorkerHelper.getInstance().parseXHtml(new ElementHandler() {
        @Override
        public void add(Writable w) {
            if (w instanceof WritableElement) {
                List<Element> elements = ((WritableElement) w).elements();
                for (Element e : elements) {
                    para.add(e);
                }
            }
        }
    }, body);

    return para;
}

Solution

  • The answer is simple: you are throwing away all structure (such as the list structure) by creating a cell in text mode instead of a cell in composite mode.

    Create your cell like this:

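    // w is the Writable passed to ElementHandler.add()
    PdfPCell cell = new PdfPCell();
    List<Element> elements = ((WritableElement) w).elements();
    for (Element e : elements) {
        cell.addElement(e); // addElement() puts the cell in composite mode
    }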

    When you call addCell() with a Paragraph, you create a PdfPCell instance implicitly. The Paragraph you pass is treated as a Phrase (Paragraph is a subclass of Phrase), and a PdfPCell created from a Phrase is in text mode: all content in that Phrase is downgraded to mere text elements, which is why the list loses its structure.
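
Putting the pieces together, the method from the question could be reworked along these lines so that it returns a composite-mode cell instead of a Paragraph. This is a minimal sketch, assuming iText 5 with XML Worker on the classpath; the class name HtmlCells and the method name parseHtmlToCell are made up for illustration and are not part of either library.

```java
import com.itextpdf.text.Element;
import com.itextpdf.text.pdf.PdfPCell;
import com.itextpdf.tool.xml.ElementHandler;
import com.itextpdf.tool.xml.Writable;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import com.itextpdf.tool.xml.pipeline.WritableElement;

import java.io.IOException;
import java.io.StringReader;

// Hypothetical helper class; the name is an assumption for this sketch.
public class HtmlCells {

    // Parses an (X)HTML snippet and returns a PdfPCell in composite mode.
    public static PdfPCell parseHtmlToCell(String html) throws IOException {
        // The no-argument constructor creates an empty cell; content is
        // added with addElement(), which keeps each element's structure.
        final PdfPCell cell = new PdfPCell();

        XMLWorkerHelper.getInstance().parseXHtml(new ElementHandler() {
            @Override
            public void add(Writable w) {
                if (w instanceof WritableElement) {
                    for (Element e : ((WritableElement) w).elements()) {
                        // Lists, paragraphs, etc. are added as-is instead of
                        // being flattened into a Paragraph first.
                        cell.addElement(e);
                    }
                }
            }
        }, new StringReader(html));

        return cell;
    }
}
```

With a helper like this, the table code from the question would become table.addCell(parseHtmlToCell(htmlString)); which uses the addCell() overload that takes a PdfPCell, so no implicit text-mode cell is created.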