c# pdf-generation itext section508

iTextSharp parse HTML to 508 compliant PDF table


The code below creates a PDF from HTML. The problem is that when the document is tagged, the TH cells are written to the PDF as TD. Is there any way to get them tagged as TH in the PDF?

    using System.IO;
    using iTextSharp.text;
    using iTextSharp.text.pdf;
    using iTextSharp.tool.xml;

    string html = @"<table>
                      <tr>
                        <th>header1</th>
                        <th>header2</th>
                        <th>header3</th>
                      </tr>
                      <tr>
                        <td>col 1</td>
                        <td>col 2</td>
                        <td>col 3</td>
                      </tr>
                    </table>";

    // In a verbatim string (@"..."), a single backslash is enough.
    using (FileStream fs = new FileStream(@"C:\test.pdf", FileMode.Create))
    using (TextReader reader = new StringReader(html))
    {
        Document document = new Document(PageSize.A4, 30, 30, 30, 30);

        PdfWriter writer = PdfWriter.GetInstance(document, fs);
        writer.SetTagged();                               // produce a tagged (structured) PDF
        writer.SetPdfVersion(PdfWriter.PDF_VERSION_1_7);

        document.Open();
        XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, reader);
        document.Close();                                 // also closes the underlying stream
    }

Solution

  • We have added correct tagging for the TH element, and the change will be included in the next iText XMLWorker release. In general, XMLWorker is not designed to generate correctly tagged PDF: it relies on the basic tagging logic of iText Core, in which TD is the default role for every kind of table cell. That is why TH cells from the HTML currently end up tagged as TD.
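Until that release is available, one possible workaround is to build the table with iText Core directly instead of going through XMLWorker, and override the default TD role on the header cells yourself. This is a minimal sketch, assuming iTextSharp 5.5.x, where `PdfPCell` implements `IAccessibleElement` and exposes a `Role` property; the class name `TaggedTableSketch` and the output file `table-th.pdf` are hypothetical, chosen only for illustration.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical example class: builds the same 3x2 table without XMLWorker.
public static class TaggedTableSketch
{
    public static void Main()
    {
        string[] headers = { "header1", "header2", "header3" };
        string[] values = { "col 1", "col 2", "col 3" };

        using (var fs = new FileStream("table-th.pdf", FileMode.Create))
        {
            var document = new Document(PageSize.A4, 30, 30, 30, 30);
            PdfWriter writer = PdfWriter.GetInstance(document, fs);
            writer.SetTagged();                               // tagged PDF, as in the question
            writer.SetPdfVersion(PdfWriter.PDF_VERSION_1_7);
            document.Open();

            var table = new PdfPTable(headers.Length);
            // Treat the first row as a header row (repeated on page breaks).
            table.HeaderRows = 1;

            foreach (string h in headers)
            {
                var cell = new PdfPCell(new Phrase(h));
                // Assumption: PdfPCell defaults to the TD role; override it for header cells.
                cell.Role = PdfName.TH;
                table.AddCell(cell);
            }

            foreach (string v in values)
            {
                // Body cells keep the default TD role.
                table.AddCell(new PdfPCell(new Phrase(v)));
            }

            document.Add(table);
            document.Close();                                 // also closes fs
        }
    }
}
```

The trade-off is that the HTML-to-PDF conversion is no longer automatic, so this only makes sense for tables whose structure is known in code.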