java pdf ocr pdfbox

Detect if a PDF is created from a scanned document using OCR [pdfbox]


I would like to know if a PDF was created from a scanned document using OCR.

To make the text from the scanned document selectable, I assume the recognized text is drawn over the page image using a transparent color, a special font, or something similar.

I'm using PDFBox and have looked at the font, the color, and many other properties, but I didn't find anything special.


Solution

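  • In my case the text rendering mode was set to "Neither fill nor stroke text" (rendering mode 3 in the PDF specification, set by the "3 Tr" operator). Text drawn in this mode is invisible but still selectable.

    pdfbox code, called from inside a PDFStreamEngine subclass (for example PDFTextStripper) while the text is being processed:

    getGraphicsState().getTextState().getRenderingMode() == PDTextState.RENDERING_MODE_NEITHER_FILL_NOR_STROKE_TEXT

    This constant is from PDFBox 1.8. In PDFBox 2.x the getter returns a RenderingMode enum, and the equivalent value is RenderingMode.NEITHER.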
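To show what the rendering-mode check above is looking for, here is a rough sketch that works directly on a content stream. It is not PDFBox code and uses only the Java standard library. It assumes you already have a page's decoded (uncompressed) content stream as a string. The names InvisibleTextScan, renderingModes and usesInvisibleText are made up for this illustration. It skips string literals and comments, but it ignores inline images and graphics-state save/restore, so treat it as an illustration of the "3 Tr" operator rather than a robust detector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: scans a decoded PDF content stream for Tr operators.
final class InvisibleTextScan {

    // Text rendering mode 3 in the PDF specification: neither fill nor stroke.
    static final int MODE_INVISIBLE = 3;

    // PDF delimiter characters that end a regular token.
    private static final String DELIMITERS = "()<>[]{}/%";

    // Returns the integer operand of every Tr operator, in order of appearance.
    static List<Integer> renderingModes(String content) {
        List<Integer> modes = new ArrayList<>();
        String previous = null; // last regular token, the candidate operand
        int i = 0;
        int n = content.length();
        while (i < n) {
            char c = content.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '(') {
                // Literal string: its contents must not be read as operators.
                i = skipLiteralString(content, i);
                previous = null;
            } else if (c == '%') {
                // Comment: runs to the end of the line.
                while (i < n && content.charAt(i) != '\n' && content.charAt(i) != '\r') {
                    i++;
                }
            } else {
                int start = i;
                while (i < n && !Character.isWhitespace(content.charAt(i))
                        && DELIMITERS.indexOf(content.charAt(i)) < 0) {
                    i++;
                }
                if (i == start) {
                    // A lone delimiter such as '/', '[' or '<': step over it.
                    i++;
                    previous = null;
                    continue;
                }
                String token = content.substring(start, i);
                if (token.equals("Tr") && previous != null) {
                    try {
                        modes.add(Integer.parseInt(previous));
                    } catch (NumberFormatException e) {
                        // Operand was not an integer: ignore this operator.
                    }
                }
                previous = token;
            }
        }
        return modes;
    }

    // True if any Tr operator in the stream selects rendering mode 3.
    static boolean usesInvisibleText(String content) {
        return renderingModes(content).contains(MODE_INVISIBLE);
    }

    // Returns the index just past the ')' that closes the string opened at 'open'.
    private static int skipLiteralString(String s, int open) {
        int depth = 0;
        int i = open;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\') {
                i += 2; // skip the escaped character
                continue;
            }
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth == 0) {
                    return i + 1;
                }
            }
            i++;
        }
        return s.length(); // unterminated string: consume the rest
    }

    public static void main(String[] args) {
        String ocrLayer = "BT 3 Tr /F1 12 Tf 72 700 Td (Scanned text) Tj ET";
        String typedText = "BT 0 Tr /F1 12 Tf 72 700 Td (Typed 3 Tr text) Tj ET";
        System.out.println(usesInvisibleText(ocrLayer));  // true
        System.out.println(usesInvisibleText(typedText)); // false
    }
}
```

Invisible text is only a hint that an OCR layer is present. A page with mode 3 text on top of a full-page image is a stronger signal than the rendering mode alone.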