Search code examples
javamavendirectorypom.xml

OCR tessdata directory is incorrect


I've been following this tutorial for trying to create an OCR and I've copy and pasted all of the necessary code and followed the steps but I keep receiving this error when I run OCRDemo.java:

Error opening data file ./eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language 'eng' Tesseract couldn't load any languages!

So I'm assuming the issue is that TESSDATA_PREFIX has the wrong directory. Currently it is "C:\CodeRepository\OCR\tessdata" and I got that directory and confirmed that directory by literally going into file explorer and copying and pasting it. But I keep getting this error message. I've also tried "OCR\tessdata", "tessdata" but none of them work. Help?

Here's my pom.xml code that has the TESSDATA_PREFIX:

    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">


<modelVersion>4.0.0</modelVersion>
  <groupId>OCR</groupId>
  <artifactId>OCR</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <properties>
    <TESSDATA_PREFIX>C:\CodeRepository\OCR\tessdata</TESSDATA_PREFIX>
  </properties>
  <dependencies>
    <dependency>
        <groupId>net.sourceforge.tess4j</groupId>
        <artifactId>tess4j</artifactId>
        <version>4.3.1</version>
    </dependency>
  </dependencies>
</project>

Solution

  • From the given link, it looks like it points the readers to incompatible language data files. Try https://github.com/tesseract-ocr/tessdata_fast.