
iTextSharp: parsing HTML with Cyrillic/international text


I am trying to parse an HTML file and generate a PDF from it. I use the following code:

document.Open();
HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
htmlContext.SetTagFactory(Tags.GetHtmlTagProcessorFactory());
ICSSResolver cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);
IPipeline pipeline =
    new CssResolverPipeline(cssResolver,
        new HtmlPipeline(htmlContext,
                new PdfWriterPipeline(document, writer)));


XMLWorker worker = new XMLWorker(pipeline, true);
XMLParser p = new XMLParser(true, worker, Encoding.Unicode);

p.Parse((TextReader)File.OpenText(@"Template.html"));
document.Close();

How can I define the base font if I want to use Cyrillic/international text?


Solution

  • You should register a font that covers the characters you need, for example Arial Unicode MS:

    string arialuniTtf = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "ARIALUNI.TTF");
    FontFactory.Register(arialuniTtf);
    

    and modify the page's body element:

    <body face='Arial' encoding='koi8-r'>
    ...
    </body>
    

    For anyone who can read Russian, this article may be useful.
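
The registration step above can be wrapped in a small helper. This is only a rough sketch, assuming iTextSharp 5.x and that Arial Unicode MS is installed in the system fonts folder. The names `FontSetup` and `GetUnicodeFont` are hypothetical and do not come from the answer. The family name "Arial Unicode MS" is an assumption about the name under which the font file registers itself.

```csharp
using System;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper, not part of iTextSharp.
static class FontSetup
{
    // Registers ARIALUNI.TTF with FontFactory and returns an embedded
    // Unicode font of the requested size.
    public static Font GetUnicodeFont(float size)
    {
        string fontPath = Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.Fonts),
            "ARIALUNI.TTF");

        // Fail early with a clear message if the font is not installed.
        if (!File.Exists(fontPath))
            throw new FileNotFoundException("ARIALUNI.TTF was not found.", fontPath);

        FontFactory.Register(fontPath);

        // Assumption: the file registers under the family name "Arial Unicode MS".
        // IDENTITY_H requests Unicode encoding; EMBEDDED embeds the font in the PDF.
        return FontFactory.GetFont(
            "Arial Unicode MS", BaseFont.IDENTITY_H, BaseFont.EMBEDDED, size);
    }
}
```

A quick way to check that the font renders Cyrillic is to add a paragraph directly, for example `document.Add(new Paragraph("Привет", FontSetup.GetUnicodeFont(12f)));`. This only covers text added directly to the document. The parsed HTML still relies on the body attributes shown above.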