java lstm ocr tesseract

Tesseract initialization with an LSTM-based model only


I'm trying to build an app that recognizes Hungarian text in an image. I found out that the Hungarian traineddata file only works with LSTM-based recognition. My code is:

    AssetHelper.Init(context);
    AssetHelper.extractAssets(context);
    TessBaseAPI tessBaseAPI = new TessBaseAPI();
    tessBaseAPI.init(AssetHelper.tessDataPath, "hun");
    tessBaseAPI.setPageSegMode(TessBaseAPI.PageSegMode.PSM_SINGLE_BLOCK);
    tessBaseAPI.setImage(AssetHelper.getImageBitmap(context));
    String data = tessBaseAPI.getUTF8Text();
    dataOutput.setText(data);
    tessBaseAPI.clear();
    tessBaseAPI.end();

I'm using com.rmtheis:tess-two:9.1.0.

I cannot find any information on how exactly to set the engine to LSTM mode. I just get the error:

    2024-01-23 08:02:06.221 8422-8422 Tesseract(native) hu.androidtest.ocrproject E Could not initialize Tesseract API with language=hun!

How can I put the engine into LSTM mode, or otherwise run the app with the Hungarian data?

Edit: I tried all 3 versions of the trained data (normal, fast, best).


Solution

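  • In the end, I found that downloading the older version of the trained data works. This is consistent with tess-two 9.1.0 being built on Tesseract 3.05, which predates the LSTM engine introduced in Tesseract 4: the library has no LSTM mode to switch on, so it needs the legacy-format hun.traineddata rather than the files published for Tesseract 4 and later.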
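
The same "Could not initialize Tesseract API" error can also appear when the data file is simply not where the library looks for it, so it may help to rule that out before swapping data files. The sketch below is plain Java and assumes the usual tess-two layout, in which the path passed to init() is the parent of a tessdata directory holding <lang>.traineddata. TessDataCheck and its methods are hypothetical helper names, not part of tess-two, and the check says nothing about whether the file format is compatible with the engine.

```java
import java.io.File;

// Hypothetical helper (not part of tess-two): verifies the on-disk layout
// that init(dataPath, lang) is assumed to expect.
final class TessDataCheck {
    private TessDataCheck() {}

    // Builds <dataPath>/tessdata/<lang>.traineddata.
    static File trainedDataFile(String dataPath, String lang) {
        return new File(new File(dataPath, "tessdata"), lang + ".traineddata");
    }

    // True only if the file exists, is a regular file, and is not empty.
    // This does not validate the file format (legacy vs. LSTM).
    static boolean isTrainedDataPresent(String dataPath, String lang) {
        File f = trainedDataFile(dataPath, lang);
        return f.isFile() && f.length() > 0;
    }
}
```

In the question's code this could be called as TessDataCheck.isTrainedDataPresent(AssetHelper.tessDataPath, "hun") before init(). It is also worth checking the boolean that init() returns instead of ignoring it, so that a failed initialization is caught before setImage() is called.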