
Find the location or page where a font is not embedded in a PDF using iText


I am using the iText library to manipulate my PDF.

I am using this example http://developers.itextpdf.com/examples/itext-action-second-edition/chapter-16#616-listusedfonts.java to find the fonts that are not embedded in the PDF.

Does the library provide any option to check exactly where in the PDF a non-embedded font is used?


Solution

  • The sample referenced by the OP only inspects the pages and the form XObjects referenced from them, and it outputs information on the fonts listed in the resources of these entities.

    To pinpoint exactly where a given kind of font is used, one has to use a different mechanism: the parser package classes with a custom render listener. This listener can then react to every text drawing operation that uses a non-embedded font.

    The parser framework

    To find out where some resource actually is used on a page, you have to parse the page content stream and check the PDF instructions therein.

    iText helps you in doing so by providing a parser framework which reads the content stream and pre-analyzes it. The results of this first analysis are forwarded to a render listener you provide.

    You use the parser framework like this:

    PdfReader reader = new PdfReader(SOURCE);
    // One parser can be reused for all pages of the reader
    PdfReaderContentParser parser = new PdfReaderContentParser(reader);
    // from and to are the first and last page to inspect (page numbers start at 1)
    for (int page = from; page <= to; page++)
    {
        RenderListener renderListener = YOUR_RENDER_LISTENER_IMPLEMENTATION;
        parser.processContent(page, renderListener);
        // after the page has been processed, probably
        // some render listener related post-processing
    }
    reader.close();
    

    For text extraction, for example, you usually use one of the render listener implementations LocationTextExtractionStrategy or SimpleTextExtractionStrategy (both come with iText). After the page has been processed, you retrieve from the strategy the String of text it has extracted from the page's events.
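As a minimal sketch of the text extraction case just described, assuming iText 5 is on the classpath: the class name PageTextDump and the helper extractPageText are made up for illustration, and the file path is taken from the command line.

```java
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;

import java.io.IOException;

// Hypothetical helper class, not part of iText.
class PageTextDump {

    // Runs the parser over one page and returns what the strategy collected.
    static String extractPageText(PdfReader reader, int page) throws IOException {
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        // processContent returns the listener it was given, here the strategy
        TextExtractionStrategy strategy =
                parser.processContent(page, new SimpleTextExtractionStrategy());
        return strategy.getResultantText();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path of the PDF to read
        PdfReader reader = new PdfReader(args[0]);
        try {
            for (int page = 1; page <= reader.getNumberOfPages(); page++) {
                System.out.println("--- page " + page + " ---");
                System.out.println(extractPageText(reader, page));
            }
        } finally {
            reader.close();
        }
    }
}
```

The strategy here plays the role of the render listener: the parser feeds it text events, and the post-processing step is simply the call to getResultantText().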

    The render listener to customize

    Render listeners in iText 5 have to implement the interface RenderListener:

    public interface RenderListener {
        /**
         * Called when a new text block is beginning (i.e. BT)
         */
        public void beginTextBlock();
    
        /**
         * Called when text should be rendered
         * @param renderInfo information specifying what to render
         */
        public void renderText(TextRenderInfo renderInfo);
    
        /**
         * Called when a text block has ended (i.e. ET)
         */
        public void endTextBlock();
    
        /**
         * Called when image should be rendered
         * @param renderInfo information specifying what to render
         */
        public void renderImage(ImageRenderInfo renderInfo);
    }
    

    or ExtRenderListener which declares some additional listener methods.

    A render listener for your task, i.e. a render listener to find where exactly a given font is used to draw text, only needs to implement renderText non-trivially, e.g. like this:

    public void renderText(TextRenderInfo renderInfo)
    {
        DocumentFont documentFont = renderInfo.getFont();
        PdfDictionary font = documentFont.getFontDictionary();
        // Check the font dictionary like in your example code
        if (font FULFILLS SOME CRITERIA)
        {
            // The text
            String text = renderInfo.getText();
            // is rendered on the current page on the base line
            LineSegment baseline = renderInfo.getBaseline();
            // using a font fulfilling the given criteria
            ...
        }
    }
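Putting the pieces together, a complete listener could look roughly like the following sketch. It assumes iText 5 and borrows its embedding check from the idea behind the ListUsedFonts sample (look for FontFile, FontFile2 or FontFile3 in the font descriptor, falling back to the descendant font for composite fonts). The class name NonEmbeddedFontFinder, the helper isEmbedded, and the output format are illustrative choices, not iText API. Treating Type 3 fonts as embedded is also an assumption made here, since their glyphs are defined inside the PDF itself.

```java
import com.itextpdf.text.pdf.DocumentFont;
import com.itextpdf.text.pdf.PdfArray;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.ImageRenderInfo;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.RenderListener;
import com.itextpdf.text.pdf.parser.TextRenderInfo;
import com.itextpdf.text.pdf.parser.Vector;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical listener: records each text chunk drawn with a non-embedded font.
class NonEmbeddedFontFinder implements RenderListener {
    private final int page;
    private final List<String> findings = new ArrayList<String>();

    NonEmbeddedFontFinder(int page) { this.page = page; }

    List<String> getFindings() { return findings; }

    // Assumed criterion: a font counts as embedded if its descriptor has a font file.
    static boolean isEmbedded(PdfDictionary font) {
        // Assumption: Type 3 glyphs live in the PDF, so treat them as embedded
        if (PdfName.TYPE3.equals(font.getAsName(PdfName.SUBTYPE))) {
            return true;
        }
        PdfDictionary desc = font.getAsDict(PdfName.FONTDESCRIPTOR);
        if (desc == null) {
            // Composite fonts keep the descriptor in their descendant font
            PdfArray descendants = font.getAsArray(PdfName.DESCENDANTFONTS);
            if (descendants != null && descendants.size() > 0) {
                PdfDictionary descendant = descendants.getAsDict(0);
                desc = descendant == null ? null : descendant.getAsDict(PdfName.FONTDESCRIPTOR);
            }
        }
        return desc != null
                && (desc.get(PdfName.FONTFILE) != null
                 || desc.get(PdfName.FONTFILE2) != null
                 || desc.get(PdfName.FONTFILE3) != null);
    }

    @Override public void beginTextBlock() { }
    @Override public void endTextBlock() { }
    @Override public void renderImage(ImageRenderInfo renderInfo) { }

    @Override
    public void renderText(TextRenderInfo renderInfo) {
        DocumentFont documentFont = renderInfo.getFont();
        if (isEmbedded(documentFont.getFontDictionary())) {
            return;
        }
        // Start of the baseline tells where on the page the text is drawn
        Vector start = renderInfo.getBaseline().getStartPoint();
        findings.add(String.format(Locale.ROOT, "page %d, font %s, at (%.1f, %.1f): %s",
                page, documentFont.getPostscriptFontName(),
                start.get(Vector.I1), start.get(Vector.I2), renderInfo.getText()));
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path of the PDF to inspect
        PdfReader reader = new PdfReader(args[0]);
        try {
            PdfReaderContentParser parser = new PdfReaderContentParser(reader);
            for (int page = 1; page <= reader.getNumberOfPages(); page++) {
                NonEmbeddedFontFinder finder =
                        parser.processContent(page, new NonEmbeddedFontFinder(page));
                for (String finding : finder.getFindings()) {
                    System.out.println(finding);
                }
            }
        } finally {
            reader.close();
        }
    }
}
```

A fresh listener per page keeps the bookkeeping simple: each finding already knows its page number, and the baseline start point narrows the location down to a position on that page.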