ocr tesseract hindi language-model

Unable to open Cube language model params for the Hindi language in Tesseract


Tesseract is unable to read the Cube language model. I ran this command:

    tesseract 1.png output.txt -l hin

After running it, the following error occurs:

Cube ERROR (CubeRecoContext::Load): unable to read cube language model params from /usr/share/tesseract-ocr/tessdata/hin.cube.lm
Cube ERROR (CubeRecoContext::Create): unable to init CubeRecoContext object
init_cube_objects(false, &tessdata_manager):Error:Assert failed:in file tessedit.cpp, line 207
Segmentation fault

Where can I get the hin.cube.lm file, and what do I need to do with it?


Solution

  • I fixed this error by installing the correct versions of the following files:

    • hin.cube.bigrams
    • hin.cube.fold
    • hin.cube.lm
    • hin.cube.nn
    • hin.cube.params
    • hin.cube.word-freq
    • hin.tesseract_cube.nn

    I also installed the correct versions of the Hindi and English training data.

    All of the above files are available in the 3.04/3.05 data files section of the wiki: https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#data-files-for-version-304305

    I put these files under: /usr/local/share/tessdata

    This was on CentOS 7.2.
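
    To confirm that every file listed above actually landed in the tessdata directory, a small check along these lines may help. It is only a sketch: the function name `missing_cube_files` is made up for illustration, the file list is copied from this answer, and the directory is the one used here. Note that the error in the question points at /usr/share/tesseract-ocr/tessdata, so the directory your Tesseract build reads from may differ.

```python
from pathlib import Path

# Cube file suffixes copied from the list in the answer above.
CUBE_SUFFIXES = [
    "cube.bigrams",
    "cube.fold",
    "cube.lm",
    "cube.nn",
    "cube.params",
    "cube.word-freq",
    "tesseract_cube.nn",
]


def missing_cube_files(tessdata_dir, lang="hin"):
    """Return the expected Cube file names that are absent from tessdata_dir.

    Hypothetical helper: it only checks that the files exist, not that
    their versions match the installed Tesseract.
    """
    root = Path(tessdata_dir)
    expected = [f"{lang}.{suffix}" for suffix in CUBE_SUFFIXES]
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Directory used in this answer; adjust it if your installation
    # looks elsewhere (for example /usr/share/tesseract-ocr/tessdata).
    for name in missing_cube_files("/usr/local/share/tessdata"):
        print("missing:", name)
```

    If the script prints nothing, all seven Cube files are present. A version mismatch between the files and Tesseract would still have to be ruled out separately.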