android maven tesseract tess4j

Android: Tesseract couldn't load any languages


Hi guys, I am trying to run Tesseract to extract the text from an image, but I get the following error:

Exception in thread "main" java.lang.Error: Invalid memory access
at com.sun.jna.Native.invokePointer(Native Method)
at com.sun.jna.Function.invokePointer(Function.java:477)
at com.sun.jna.Function.invoke(Function.java:411)
at com.sun.jna.Function.invoke(Function.java:323)
at com.sun.jna.Library$Handler.invoke(Library.java:236)
at com.sun.proxy.$Proxy0.TessBaseAPIGetUTF8Text(Unknown Source)
at net.sourceforge.tess4j.Tesseract.getOCRText(Tesseract.java:436)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:291)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:212)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:196)
at Crop_Image.main(Crop_Image.java:98)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Error opening data file ./tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!

I am loading a JPG image file containing English text. This is how I load the file and try to get the text from it:

 public static void main(String[] args){

    String result = "";

    File imageFile = new File("C:\\Users\\user\\Desktop\\Untitled.jpg");
    Tesseract instance = new Tesseract();

    try {
         result = instance.doOCR(imageFile);
         System.out.println(result);

    } catch (Exception e) {
        e.printStackTrace();
        System.err.println(e.getMessage());
    }
}

I am also using Maven in my project; here are the dependencies from my pom file:

<dependencies>

    <dependency>
        <groupId>nu.pattern</groupId>
        <artifactId>opencv</artifactId>
        <version>2.4.9-4</version>
    </dependency>

    <dependency>
        <groupId>net.sourceforge.tess4j</groupId>
        <artifactId>tess4j</artifactId>
        <version>3.1.0</version>
    </dependency>

</dependencies>

What could be the cause of this error?


Solution

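  • Looking at your code, the problem appears to be in how you initialize Tesseract. The error output shows it looking for ./tessdata/eng.traineddata relative to the working directory, and that file is not there. Since you are using Maven, as nguyenq suggested, you need to point Tesseract at the exact location of the tessdata directory. Here is what you should do: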

      // Imports needed at the top of the class:
      // import java.io.File;
      // import net.sourceforge.tess4j.Tesseract;
      // import net.sourceforge.tess4j.util.LoadLibs;

      public static String Image_To_Text(String image_path){

          String result = "";

          // Use the path passed in by the caller
          File imageFile = new File(image_path);

          // Tesseract.getInstance() is deprecated in Tess4J 3.x; use the constructor
          Tesseract instance = new Tesseract();

          // In case you don't have your own tessdata, let it be extracted for you
          File tessDataFolder = LoadLibs.extractTessResources("tessdata");

          // Set the tessdata path
          instance.setDatapath(tessDataFolder.getAbsolutePath());

          try {
              result = instance.doOCR(imageFile);
          } catch (Exception e) {
              e.printStackTrace();
          }

          return result;
      }
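
    If it still fails, it may help to confirm that the folder you pass to setDatapath really contains eng.traineddata before calling doOCR. Below is a minimal sketch of such a check using only the standard library. The class TessdataCheck and the method hasLanguageData are hypothetical names made up for this illustration and are not part of Tess4J; the default directory "tessdata" is an assumption taken from the path in the error message.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not part of Tess4J.
final class TessdataCheck {

    // Returns true if <tessdataDir>/<lang>.traineddata exists as a regular file.
    static boolean hasLanguageData(Path tessdataDir, String lang) {
        return Files.isRegularFile(tessdataDir.resolve(lang + ".traineddata"));
    }

    public static void main(String[] args) {
        // Assumption: default to ./tessdata, the location named in the error message.
        Path dir = Paths.get(args.length > 0 ? args[0] : "tessdata");
        if (hasLanguageData(dir, "eng")) {
            System.out.println("Found eng.traineddata in " + dir.toAbsolutePath());
        } else {
            System.err.println("Missing " + dir.resolve("eng.traineddata").toAbsolutePath());
        }
    }
}
```

    You could call hasLanguageData(tessDataFolder.toPath(), "eng") right after extractTessResources and log the absolute path when it returns false, which makes a wrong data path easier to spot than the native "Invalid memory access" error.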