c# .net-6.0 itext7

iText7 RegexBasedLocationExtractionStrategy: how to get the font name and font size of the found text?


I am trying to replace text with iText7 in C#. With RegexBasedLocationExtractionStrategy I can only get the content and the rectangle of the matched text, but I also need its font and font size. Any suggestions? Thank you.


Solution

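  • You can implement ITextExtractionStrategy to do this. Besides EventOccurred, the interface requires GetSupportedEvents and GetResultantText, which are kept minimal here.

    using System.Collections.Generic;
    using iText.Kernel.Font;
    using iText.Kernel.Pdf.Canvas.Parser;
    using iText.Kernel.Pdf.Canvas.Parser.Data;
    using iText.Kernel.Pdf.Canvas.Parser.Listener;

    class CustomTextExtractionStrategy : ITextExtractionStrategy
    {
        public void EventOccurred(IEventData data, EventType type)
        {
            // Only text-rendering events carry a TextRenderInfo.
            if (type == EventType.RENDER_TEXT && data is TextRenderInfo renderInfo)
            {
                if (renderInfo.GetText().Equals("here I can match the text"))
                {
                    PdfFont font = renderInfo.GetFont();        // font of this text chunk
                    float fontSize = renderInfo.GetFontSize();  // size from the graphics state
                    // Store or report font and fontSize here.
                }
            }
        }

        // Ask the processor to deliver text-rendering events only.
        public ICollection<EventType> GetSupportedEvents() => new HashSet<EventType> { EventType.RENDER_TEXT };

        // This strategy does not accumulate any text.
        public string GetResultantText() => string.Empty;
    }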
    

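    GetFont() returns a PdfFont, and GetFontSize() returns the font size as a float. Note that this is the size set in the graphics state; if the text matrix or the page transformation scales the text, the size rendered on the page will differ.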
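
    Since the question asks for the font name, here is a minimal sketch of reading it from the PdfFont. The class name FontDescriber and the method GetName are hypothetical helpers, not part of iText. It assumes the name stored in the font program is what you want; for subset fonts that name may carry a six-letter tag followed by '+', which the sketch strips.

    ```csharp
    using iText.Kernel.Font;

    // Hypothetical helper (not an iText class): turns a PdfFont into a display name.
    static class FontDescriber
    {
        public static string GetName(PdfFont font)
        {
            // The font program holds the names recorded for the font.
            string name = font.GetFontProgram().GetFontNames().GetFontName();
            if (string.IsNullOrEmpty(name))
            {
                return "(unnamed font)";
            }

            // Subset fonts are conventionally named "ABCDEF+RealName"; drop the tag.
            int plus = name.IndexOf('+');
            return plus == 6 ? name.Substring(7) : name;
        }
    }
    ```

    Inside EventOccurred you could then call FontDescriber.GetName(renderInfo.GetFont()) next to renderInfo.GetFontSize().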

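    You can also do your pattern matching against the text returned by TextRenderInfo.GetText(). Keep in mind that a single RENDER_TEXT event may carry only a fragment of a word or line, so a pattern that spans several chunks will not match within one event.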
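
    As a rough sketch of that idea, the listener below applies a regular expression to each text chunk and records the font name and size of every chunk that matches. RegexFontListener and its Matches list are hypothetical names chosen for illustration. The sketch assumes each match fits inside a single chunk, which is not guaranteed for every PDF.

    ```csharp
    using System.Collections.Generic;
    using System.Text.RegularExpressions;
    using iText.Kernel.Pdf.Canvas.Parser;
    using iText.Kernel.Pdf.Canvas.Parser.Data;
    using iText.Kernel.Pdf.Canvas.Parser.Listener;

    // Hypothetical listener: collects font details for chunks matching a regex.
    class RegexFontListener : IEventListener
    {
        private readonly Regex pattern;

        // One entry per matching chunk: its text, font name, and graphics-state font size.
        public List<(string Text, string FontName, float FontSize)> Matches { get; }
            = new List<(string Text, string FontName, float FontSize)>();

        public RegexFontListener(string regex)
        {
            pattern = new Regex(regex);
        }

        public void EventOccurred(IEventData data, EventType type)
        {
            // Ignore everything except text-rendering events.
            if (type != EventType.RENDER_TEXT || !(data is TextRenderInfo info))
            {
                return;
            }

            string text = info.GetText();
            if (text == null || !pattern.IsMatch(text))
            {
                return;
            }

            string fontName = info.GetFont().GetFontProgram().GetFontNames().GetFontName();
            Matches.Add((text, fontName, info.GetFontSize()));
        }

        // Ask the processor to deliver text-rendering events only.
        public ICollection<EventType> GetSupportedEvents()
        {
            return new HashSet<EventType> { EventType.RENDER_TEXT };
        }
    }
    ```

    Pass an instance to PdfCanvasProcessor in the same way as any other listener and read Matches once the page has been processed.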

    This is one of several ways to register the custom extraction strategy. It only processes the first page, so remember to iterate over all pages of the document:

    CustomTextExtractionStrategy customTextExtractionStrategy = new CustomTextExtractionStrategy();
    // DEST is the path of the PDF file to read.
    using (PdfDocument pdfDocument = new PdfDocument(new PdfReader(DEST)))
    {
        PdfCanvasProcessor parser = new PdfCanvasProcessor(customTextExtractionStrategy);
        parser.ProcessPageContent(pdfDocument.GetFirstPage());
    }
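
    To cover the whole document rather than just the first page, a loop along these lines should work. It reuses the CustomTextExtractionStrategy class from the answer above; AllPagesExample, Run and the path parameter are hypothetical names for this sketch.

    ```csharp
    using iText.Kernel.Pdf;
    using iText.Kernel.Pdf.Canvas.Parser;

    // Hypothetical wrapper: runs the custom strategy over every page of a PDF.
    static class AllPagesExample
    {
        public static void Run(string path)
        {
            var strategy = new CustomTextExtractionStrategy();

            // The using block closes the document when processing is done.
            using (var pdfDocument = new PdfDocument(new PdfReader(path)))
            {
                // Page numbers in iText start at 1.
                for (int i = 1; i <= pdfDocument.GetNumberOfPages(); i++)
                {
                    // A fresh processor per page starts from a clean graphics state.
                    var parser = new PdfCanvasProcessor(strategy);
                    parser.ProcessPageContent(pdfDocument.GetPage(i));
                }
            }
        }
    }
    ```

    The same strategy instance is shared across pages here, so anything it collects accumulates over the whole document.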
    

    I should add that I am not a C# developer, so sorry if this looks ugly/there are more elegant ways to use the language.