java pdf ocr tesseract tess4j

How to extract text from an image inside a PDF


I want to extract data from a PDF that contains an image. The image is a form in which each letter sits inside its own small square box, for example "name : t e s t".

I tried Tesseract OCR but could not get the desired result.

I also tried the commercial ABBYY product, which worked, but I want to use a free, Java-based API.

Below is an example:


Solution

  • Nicomsoft OCR SDK, which is a free SDK, extracted the text from my PDF, and the results are satisfactory.

    It supports a wide range of technologies. I am now trying to integrate it into my application.

    Link: https://www.nicomsoft.com/
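
For readers who would rather stay with Tesseract/Tess4J: a common reason OCR struggles on forms like this is that the box borders get mixed up with the letters. One possible workaround, which is not what the answer above used, is to erase long straight lines from the page image before running OCR. The following is only a minimal sketch using the standard `java.awt.image.BufferedImage` class. The class name `BoxLineRemover`, the method `removeLongLines`, and the parameters `minRun` and `threshold` are hypothetical names chosen for illustration, and the values that work will depend on the scan resolution and the box size.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper: whitens long horizontal/vertical dark runs (box borders)
// while leaving short strokes (letters) untouched.
class BoxLineRemover {

    // True if the average of R, G and B is below the threshold (0-255).
    static boolean isDark(BufferedImage img, int x, int y, int threshold) {
        int rgb = img.getRGB(x, y);
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        return (r + g + b) / 3 < threshold;
    }

    // Returns a copy of src in which every straight dark run of at least
    // minRun pixels (horizontal or vertical) is painted white.
    public static BufferedImage removeLongLines(BufferedImage src, int minRun, int threshold) {
        int w = src.getWidth(), h = src.getHeight();
        boolean[][] erase = new boolean[h][w];

        // Mark long horizontal runs, row by row.
        for (int y = 0; y < h; y++) {
            int start = -1;
            for (int x = 0; x <= w; x++) {
                boolean dark = x < w && isDark(src, x, y, threshold);
                if (dark && start < 0) start = x;
                if (!dark && start >= 0) {
                    if (x - start >= minRun) {
                        for (int i = start; i < x; i++) erase[y][i] = true;
                    }
                    start = -1;
                }
            }
        }

        // Mark long vertical runs, column by column.
        for (int x = 0; x < w; x++) {
            int start = -1;
            for (int y = 0; y <= h; y++) {
                boolean dark = y < h && isDark(src, x, y, threshold);
                if (dark && start < 0) start = y;
                if (!dark && start >= 0) {
                    if (y - start >= minRun) {
                        for (int i = start; i < y; i++) erase[i][x] = true;
                    }
                    start = -1;
                }
            }
        }

        // Copy the image, replacing marked pixels with white.
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                out.setRGB(x, y, erase[y][x] ? 0xFFFFFF : src.getRGB(x, y));
            }
        }
        return out;
    }
}
```

The cleaned image could then be handed to Tess4J, for example through `doOCR(BufferedImage)`, after rendering the PDF page to an image with a library such as Apache PDFBox. Note that this simple approach also erases any part of a letter that lies exactly on a border line, and that `minRun` should be chosen longer than the longest stroke of a letter but shorter than a box side.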