IronPDF EAP doesn't interpret C# string as UTF-16


I'm attempting to convert a bit of HTML to a PDF document with IronPDF EAP 2021.6.3135. After creating a new ChromePdfRenderer, I call RenderHtmlAsPdfAsync on it, passing the HTML string as the only argument. The HTML is a single <div> with several nested <div>s, one of which contains CJK text. IronPDF appears to interpret that text as either ASCII or UTF-8; in any case, it renders it as nonsense. This works properly—without the workaround mentioned below—with the current release of IronPDF (2021.3.1).

Inserting a byte-order mark (\uFEFF) at the beginning of the string fixes the problem, but I shouldn't need to do that. Is there a new setting/option in the EAP branch's API that I've overlooked? Or is this a known issue that will get addressed before release?
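
For reference, the workaround amounts to something like the sketch below. `HtmlBom` and `EnsureBom` are hypothetical names made up for illustration, not part of IronPDF. The check compares the first `char` directly because a culture-sensitive `StartsWith("\uFEFF")` ignores the zero-width character and would report a match on any string.

```csharp
using System;

// Hypothetical helper (not part of IronPDF): prepends a byte-order mark
// to an HTML string unless one is already present.
public static class HtmlBom
{
    private const char Bom = '\uFEFF';

    public static string EnsureBom(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));

        // Compare the first char ordinally; culture-sensitive string
        // comparison treats U+FEFF as ignorable.
        bool hasBom = html.Length > 0 && html[0] == Bom;
        return hasBom ? html : Bom + html;
    }
}
```

The result of `HtmlBom.EnsureBom(html)` is then what gets passed to `RenderHtmlAsPdfAsync` in place of the raw string.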


Solution

  • Chrome's encoding autodetection fails with very long HTML strings.

    It is recommended to include:

    <meta charset="utf-16"/>
    

    at the beginning of any HTML file that contains UTF-16 characters. (This is a reasonable requirement: without an explicit declaration, the renderer can only guess which decoding is intended.)

    Iron Software is reviewing the possibility of IronPDF automatically defaulting to UTF-16 encoding if no other encoding is specified, to help alleviate these kinds of issues.
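
A minimal sketch of applying the recommendation above to an HTML string before rendering. `HtmlCharset` and `EnsureCharsetMeta` are hypothetical names, not IronPDF API. The sketch assumes that prepending the tag to an HTML fragment is acceptable, and it has not been checked against the EAP build.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of IronPDF): adds a charset declaration
// to an HTML string when it does not already contain one.
public static class HtmlCharset
{
    // Matches both <meta charset="..."> and the older
    // <meta http-equiv="Content-Type" content="...; charset=..."> form.
    private static readonly Regex CharsetMeta =
        new Regex(@"<meta\b[^>]*\bcharset\s*=", RegexOptions.IgnoreCase);

    public static string EnsureCharsetMeta(string html, string charset = "utf-16")
    {
        if (html == null) throw new ArgumentNullException(nameof(html));

        // Leave the markup alone if the author already declared an encoding.
        if (CharsetMeta.IsMatch(html)) return html;

        return "<meta charset=\"" + charset + "\"/>" + html;
    }
}
```

With this in place, the call from the question would look something like `await renderer.RenderHtmlAsPdfAsync(HtmlCharset.EnsureCharsetMeta(html))`, where `renderer` is the `ChromePdfRenderer` instance.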