c++ air ocr tesseract

Tesseract - change language file location


I am making an AIR project that needs some OCR capabilities, so I decided to use Tesseract (for now I am trying to get it working on Windows).

My problem is that I cannot change the location of the language file: Tesseract always looks in its installation directory (Program Files (x86)\Tesseract-OCR\tessdata\mylang.traineddata).

Is there a way to configure Tesseract to look for this file in a location I specify, for example in the same folder as tesseract.exe? I don't want to (or perhaps even can't) install an application with the AIR installer. I've tried this with version 3.0 and with the latest SVN version.

Thanks


Solution

  • I solved the problem by modifying the Tesseract source code (I'm using SVN 597). As nguyenq said, Tesseract looks for the data at the path set by the TESSDATA_PREFIX environment variable. If that variable is not set, it falls back to a compile-time TESSDATA_PREFIX define or tries to work out the path from the executable's location (argv0). So if anyone needs a portable version of Tesseract (one that does not depend on a Tesseract installation), edit mainblk.cpp around line 60. This is my version:

    // Original lookup logic, commented out so that Tesseract no longer
    // tries to find its installation path:
    /*
    if (!getenv("TESSDATA_PREFIX")) {
    #ifdef TESSDATA_PREFIX
    #define _STR(a) #a
    #define _XSTR(a) _STR(a)
      datadir = _XSTR(TESSDATA_PREFIX);
    #undef _XSTR
    #undef _STR
    #else
      if (argv0 != NULL) {
        if (getpath(argv0, dll_module_name, datadir) < 0)
    #ifdef __UNIX__
          CANTOPENFILE.error("main", ABORT, "%s to get path", argv0);
    #else
          NO_PATH.error("main", DBG, NULL);
    #endif
      } else {
        datadir = "./";
      }
    #endif
    } else {
      datadir = getenv("TESSDATA_PREFIX");
    }
    */
    // Always look for the data under "./". Note that "./" is the current
    // working directory, so this matches the executable's folder only when
    // tesseract is started from that folder.
    datadir = "./";
    

    Now you can put the language files in the tessdata directory next to the Tesseract executable ("tesseract executable location"\tessdata).
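
If rebuilding Tesseract is not an option, the TESSDATA_PREFIX variable mentioned above can instead be set by whatever process launches tesseract. The following is only a minimal sketch and not part of the patch: it assumes, as in the patched code, that the prefix is the folder that contains tessdata, and the helper names tessdata_prefix_for and export_tessdata_prefix are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <filesystem>
#include <string>

// Hypothetical helper: return the directory part of the path an executable
// was started with, using forward slashes and a trailing '/'.
// Falls back to "./" when the path has no directory component.
std::string tessdata_prefix_for(const std::string& exe_path) {
    namespace fs = std::filesystem;
    fs::path dir = fs::path(exe_path).parent_path();
    if (dir.empty()) {
        dir = ".";
    }
    std::string prefix = dir.generic_string();
    if (prefix.back() != '/') {
        prefix += '/';
    }
    return prefix;
}

// Hypothetical helper: set TESSDATA_PREFIX for the current process, so that
// child processes started afterwards inherit it. Returns true on success.
bool export_tessdata_prefix(const std::string& prefix) {
    assert(!prefix.empty());
#ifdef _WIN32
    return _putenv_s("TESSDATA_PREFIX", prefix.c_str()) == 0;
#else
    return setenv("TESSDATA_PREFIX", prefix.c_str(), 1) == 0;
#endif
}
```

A launcher could call export_tessdata_prefix(tessdata_prefix_for(argv[0])) before starting tesseract.exe as a child process. Whether argv[0] carries a full path depends on how the launcher itself was started, so a real application may need a more reliable way to find its own location.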