Tags: .net, initialization, tesseract, emgucv

Tesseract 5 within EMGU: exception when initialising with the Tesseract method


Using Tesseract within EMGU, a Tesseract object

Emgu.CV.OCR.Tesseract

is created. It is then initialised with Init(dataPath As String, language As String, mode As Emgu.CV.OCR.OcrEngineMode), which takes three parameters:

  1. Path to the training data
  2. Language
  3. Engine mode
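
As a minimal sketch of that call, assuming the Emgu CV OCR package is referenced and that a folder named "tessdata-fast" containing eng.traineddata sits next to the executable (both names are placeholders, not requirements):

```vb
Imports Emgu.CV.OCR

Module InitSketch
    Sub Main()
        ' Assumed values: adjust the folder and language to your own setup.
        Dim dataPath As String = "tessdata-fast"
        Dim language As String = "eng"

        ' Tesseract is disposable, so Using releases the native engine afterwards.
        Using ocr As New Tesseract()
            ' Parameter 1: path to the training data
            ' Parameter 2: language
            ' Parameter 3: engine mode
            ocr.Init(dataPath, language, OcrEngineMode.LstmOnly)
        End Using
    End Sub
End Module
```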

For parameter 3, the engine mode, the enumeration offers these options:

 Emgu.CV.OCR.OcrEngineMode.TesseractOnly = 0
 Emgu.CV.OCR.OcrEngineMode.LstmOnly = 1
 Emgu.CV.OCR.OcrEngineMode.TesseractLstmCombined = 2
 Emgu.CV.OCR.OcrEngineMode.Default = 3

Options 0, 2 and 3 all involve the Tesseract method and, on initialisation, throw the exception

System.ArgumentException: 'Unable to create ocr model using Path 'tessdata-fast', language 'eng' and OcrEngineMode 'TesseractOnly'.'

Why is this the case? Why can I only use the neural network model?


Solution

  • The legacy engine (Tesseract) components are present only in models from this location, marked main. Models marked fast and best contain only LSTM (neural network) components, so any engine mode that needs the legacy engine (TesseractOnly, TesseractLstmCombined and Default in the question) fails to initialise against them, while LstmOnly works.
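
If the application has to cope with either kind of model, one option is to try the preferred engine mode and fall back to LstmOnly when initialisation throws. The following is only a sketch: CreateOcr is a hypothetical helper, not part of Emgu CV, and it assumes that the ArgumentException quoted in the question is the only failure worth recovering from.

```vb
Imports System
Imports Emgu.CV.OCR

Module EngineModeFallbackSketch
    ' Hypothetical helper: returns an initialised Tesseract object, using the
    ' preferred engine mode if the traineddata supports it and LstmOnly otherwise.
    Function CreateOcr(dataPath As String, language As String,
                       preferred As OcrEngineMode) As Tesseract
        Dim ocr As New Tesseract()
        Try
            ocr.Init(dataPath, language, preferred)
            Return ocr
        Catch ex As ArgumentException
            ' Raised when the model cannot be created for this mode, for example
            ' when a fast or best model is asked for the legacy engine.
            ocr.Dispose()
        End Try

        ' Second attempt with the neural network engine only.
        Dim fallback As New Tesseract()
        Try
            fallback.Init(dataPath, language, OcrEngineMode.LstmOnly)
            Return fallback
        Catch ex As ArgumentException
            ' Neither mode worked (wrong path or missing language file): clean up and rethrow.
            fallback.Dispose()
            Throw
        End Try
    End Function
End Module
```

A call such as CreateOcr("tessdata-fast", "eng", OcrEngineMode.TesseractLstmCombined) would then end up on LstmOnly with fast or best models. Pointing dataPath at a folder holding the main models avoids the fallback altogether.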