java ocr tesseract

Windows Tesseract TESSDATA_PREFIX problem


I am making an OCR program with Tesseract, but it throws the following exception:

    Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
    Failed loading language 'eng'
    Tesseract couldn't load any languages!

My tessdata folder and the traineddata files are inside my project's root folder. Here is the part of my program that reads the image:

    import java.io.File;
    import java.util.Scanner;

    import net.sourceforge.tess4j.Tesseract;
    import net.sourceforge.tess4j.TesseractException;

    public class textRecognizer {

        static Scanner scan = new Scanner(System.in);
        static String mrzText;
        // Folder that holds the *.traineddata files
        static String tesseractData = "/TextRecog/tessdata";
        static Tesseract tesObj = new Tesseract();

        static {
            tesObj.setDatapath(tesseractData);
        }

        public static void main(String[] args) {

            // currentTimeMillis() returns a long; storing it in a float loses precision
            long startTime = System.currentTimeMillis();

            try {
                mrzText = tesObj.doOCR(new File("textimage.png"));
                System.out.println("Text in file is: " + mrzText);
            } catch (TesseractException e) {
                e.printStackTrace();
            }
            System.out.println("Time taken by program is: " + (System.currentTimeMillis() - startTime) + " ms");
        }
    }

textimage.png is also in the project folder.

I tried:

  • Running set TESSDATA_PREFIX = C:\Users\Ege\eclipse-workspace\TextRecog\tessdata in cmd (from C:\Users\Ege\eclipse-workspace\TextRecog).

  • Redownloading Tesseract.


Solution

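  • Solved. The problem was the path in the tesseractData string. I changed it to

    static String tesseractData = "tessdata";

    The tessdata folder is in my project folder, so there is no need to write the full path to it. The leading slash in "/TextRecog/tessdata" made the path absolute, so it no longer pointed inside the project; the relative path "tessdata" is resolved against the working directory, which is the project folder here. The program is working fine now.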
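
To see where a given data path actually ends up, a small check like the following may help. It is only a sketch using the standard java.nio.file API, not part of Tess4J; the class name TessdataPathCheck and its two helper methods are made up for illustration. It assumes the language file is named eng.traineddata and sits directly in the folder passed to setDatapath.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Tess4J: shows where a data path points
// and whether the expected language file is there.
public class TessdataPathCheck {

    // Relative paths are resolved against the current working directory;
    // a path with a leading slash is treated as rooted instead.
    static Path resolve(String dataPath) {
        return Paths.get(dataPath).toAbsolutePath().normalize();
    }

    // True if the folder directly contains <language>.traineddata.
    static boolean hasLanguage(Path tessdataDir, String language) {
        return Files.isRegularFile(tessdataDir.resolve(language + ".traineddata"));
    }

    public static void main(String[] args) {
        // Pass the same string you would give to setDatapath; defaults to "tessdata".
        String dataPath = args.length > 0 ? args[0] : "tessdata";
        Path resolved = resolve(dataPath);

        System.out.println("Working directory:     " + System.getProperty("user.dir"));
        System.out.println("Data path resolves to: " + resolved);
        System.out.println("eng.traineddata found: " + hasLanguage(resolved, "eng"));
    }
}
```

Running it with "/TextRecog/tessdata" and then with "tessdata" from the project folder should show the two strings resolving to different locations. If the second line does not point at the folder that holds the traineddata files, setDatapath will most likely fail in the same way.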