Tags: c#, itext, xmlworker

Conversion from HTML to pdf generates an exception


I have a small C# desktop application that creates a PDF file from HTML extracted from an *.eml file. Here is a sample:

<html>
<head>
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
 <div style="font: normal 13px Arial; color:#000000;">
  <p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><font size="3"><font face="Calibri">Some text<o:p></o:p></font></font><br />
  </p>
  <p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><o:p><font size="3" face="Calibri">&nbsp;</font></o:p><br />
   <span style="FONT-SIZE: 11pt; FONT-FAMILY: &quot;Calibri&quot;,&quot;sans-serif&quot;; mso-fareast-font-family: Calibri; mso-fareast-theme-font: minor-latin; mso-bidi-font-family: &quot;Times New Roman&quot;; mso-fareast-language: EN-US; mso-ascii-theme-font: minor-latin; mso-hansi-theme-font: minor-latin; mso-bidi-theme-font: minor-bidi; mso-ansi-language: IT; mso-bidi-language: AR-SA">Some other text</span>
  </p>
 </div>
</body>
</html>

Everything works fine on my machine (Windows 10 x64). However, when I run the same code on the client's machine (Windows Server 2008 R2 x64), iTextSharp throws an exception with the message "document has no pages".

This happens only sometimes, for specific HTML strings like the one above. I can't run a debugging session on the client's machine, but I verified that the program receives well-formed HTML (it is parsed with the HTML Agility Pack).

Could this be a font-related issue? I have no clue, because the fonts used in the HTML seem to be installed on the client's machine.

Here is the code I use to create the PDF document. It uses a custom image tag processor, but that should not be the issue, since the sample HTML above contains no images:

using (var document = new Document())
{
    var writer = PdfWriter.GetInstance(document, new FileStream(destinationPath, FileMode.Create));
    writer.CompressionLevel = PdfStream.BEST_COMPRESSION;
    document.Open();

    var tagProcessors = (DefaultTagProcessorFactory)Tags.GetHtmlTagProcessorFactory();
    tagProcessors.RemoveProcessor(HTML.Tag.IMG);
    tagProcessors.AddProcessor(HTML.Tag.IMG, new CustomImageTagProcessor());
    CssFilesImpl cssFiles = new CssFilesImpl();
    cssFiles.Add(XMLWorkerHelper.GetInstance().GetDefaultCSS());
    var cssResolver = new StyleAttrCSSResolver(cssFiles);
    cssResolver.AddCss(@"code { padding: 2px 4px; }", "utf-8", true);
    var charset = Encoding.UTF8;
    var hpc = new HtmlPipelineContext(new CssAppliersImpl(new XMLWorkerFontProvider()));
    hpc.SetAcceptUnknown(true).AutoBookmark(true).SetTagFactory(tagProcessors);
    var htmlPipeline = new HtmlPipeline(hpc, new PdfWriterPipeline(document, writer));
    var pipeline = new CssResolverPipeline(cssResolver, htmlPipeline);
    var worker = new XMLWorker(pipeline, true);
    var xmlParser = new XMLParser(true, worker, charset);
    xmlParser.Parse(new StringReader(fixedMarkup));
}

Solution

  • Found the issue. As I suspected, it was related to the font.

    On my machine, the Calibri font can be embedded in the *.pdf document, whereas on the other machines its "Font embeddability" property is set to "Restricted".

    I guess I'll have to parse the HTML and replace every font named in a font-family declaration or a font face attribute with one that is not restricted.
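
The font replacement described above could look roughly like the following. This is only a minimal sketch, not tested against iTextSharp itself: `FontSanitizer`, `ReplaceFonts` and the "Arial" fallback are hypothetical names chosen for illustration. It uses plain regular expressions instead of the HTML Agility Pack to stay self-contained, and it assumes that attributes are double-quoted, with quotes inside `style` values encoded as entities (as in the sample HTML). It does not touch the `font:` shorthand, `<style>` blocks, or the `mso-*-font-family` properties.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; not part of iTextSharp or the HTML Agility Pack.
public static class FontSanitizer
{
    // A double-quoted style attribute: style="...".
    private static readonly Regex StyleAttribute = new Regex(
        @"(\bstyle\s*=\s*"")([^""]*)("")", RegexOptions.IgnoreCase);

    // A font-family declaration inside a style value. The lookbehind skips
    // prefixed properties such as mso-fareast-font-family, and the value part
    // accepts entities like &quot; so that their ';' does not end the match.
    private static readonly Regex FontFamilyDeclaration = new Regex(
        @"(?<![\w-])(font-family\s*:\s*)(?:&#?\w+;|[^;])*", RegexOptions.IgnoreCase);

    // The double-quoted face attribute of a <font> tag.
    private static readonly Regex FontFaceAttribute = new Regex(
        @"(<font\b[^>]*?\bface\s*=\s*"")[^""]*("")", RegexOptions.IgnoreCase);

    // Returns the markup with every matched font list replaced by fallbackFont.
    public static string ReplaceFonts(string html, string fallbackFont)
    {
        // Step 1: rewrite font-family declarations, but only inside style attributes.
        string result = StyleAttribute.Replace(html, style =>
            style.Groups[1].Value
            + FontFamilyDeclaration.Replace(
                  style.Groups[2].Value,
                  declaration => declaration.Groups[1].Value + fallbackFont)
            + style.Groups[3].Value);

        // Step 2: rewrite <font face="..."> attributes.
        return FontFaceAttribute.Replace(result, face =>
            face.Groups[1].Value + fallbackFont + face.Groups[2].Value);
    }
}
```

With the code from the question, the call would go just before parsing, for example `fixedMarkup = FontSanitizer.ReplaceFonts(fixedMarkup, "Arial");`, assuming the chosen fallback font is one that can be embedded on the target machine.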