
iTextSharp - XMLWorker PDF - &#160; visible in PDF


I am converting HTML to PDF using the iTextSharp XMLWorker class. Everything works fine except that whenever there is an empty HTML table, it puts "&#160;" in it, which is then clearly visible in the PDF.

I tried replacing it with an empty space or <br/>, but that gave the error "table width must be greater than zero".

Can anyone suggest what I should do?


Solution

  • I doubt iTextSharp is what puts &#160; in the PDF. On the contrary, iTextSharp correctly recognizes it as a non-breaking space. Here's proof:

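    using System.IO;
    using iTextSharp.text;
    using iTextSharp.text.pdf;
    using iTextSharp.tool.xml;

    // Where the test PDF is written (any writable path works).
    string outputFile = "nbsp-test.pdf";

    // Table 1: encoded entity, rendered as the literal text "&#160;"
    // Table 2: real entity, rendered as a non-breaking space
    // Table 3: empty cell, for comparison
    string HTML = @"
    <div>
    <h1>HTML Encoded non breaking space</h1><table border='1'><tr><td>&amp;#160;</td></tr></table>
    <h1>HTML non breaking space</h1><table border='1'><tr><td>&#160;</td></tr></table>
    <div style='background-color:yellow;'><h1>Empty Table</h1><table><tr><td></td></tr></table></div>
    </div>
    ";

    using (var stringReader = new StringReader(HTML))
    using (var stream = new FileStream(outputFile, FileMode.Create, FileAccess.Write))
    using (var document = new Document())
    {
        PdfWriter writer = PdfWriter.GetInstance(document, stream);
        document.Open();
        // Parse the XHTML and add the resulting elements to the document.
        XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, stringReader);
    } // Disposing the Document closes it, which also finishes writing the PDF.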
    


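    In the resulting PDF, the first table shows the text &#160; literally, while the second renders an ordinary (invisible) non-breaking space. So the more likely cause is that the HTML sent to the parser has &#160; encoded as &amp;#160;. The simple fix is to replace the encoded entity before the HTML reaches the parser: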

    HTML = HTML.Replace("&amp;#160;", "\u00A0");
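If the HTML comes from several sources, the same replacement can be wrapped in a small helper and called right before ParseXHtml. This is only a sketch under assumptions: the names HtmlPreprocessor and FixEncodedNbsp are hypothetical (they are not part of iTextSharp), and the extra "&amp;nbsp;" replacement only matters if your HTML source also double-encodes the named entity.

```csharp
using System;

// Hypothetical helper, not part of iTextSharp: undoes the double-encoding of
// the non-breaking space before the HTML is handed to XMLWorker.
public static class HtmlPreprocessor
{
    public static string FixEncodedNbsp(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));

        // "&amp;#160;" and "&amp;nbsp;" display as the literal text "&#160;" / "&nbsp;".
        // Replace them with the real U+00A0 character, which renders as a
        // normal non-breaking space.
        return html
            .Replace("&amp;#160;", "\u00A0")
            .Replace("&amp;nbsp;", "\u00A0");
    }

    public static void Main()
    {
        string html = "<table border='1'><tr><td>&amp;#160;</td></tr></table>";
        string fixedHtml = FixEncodedNbsp(html);

        // The only "&amp;" in the input is the encoded entity, so none should remain.
        Console.WriteLine(fixedHtml.Contains("&amp;") ? "still encoded" : "fixed");
    }
}
```

Use it when creating the reader, for example new StringReader(HtmlPreprocessor.FixEncodedNbsp(HTML)). Targeting just these entities, rather than decoding the whole string with WebUtility.HtmlDecode, avoids turning legitimately escaped text such as &amp;lt; back into markup.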