Tags: ubuntu, ocr, tesseract, python-tesseract

How to use osd, equ.traineddata, and other trained data files (Bengali, Hindi) with pytesseract (commands, and where to put equ.traineddata)


I want the Tesseract engine to use equ.traineddata to handle some mathematics as well as Bengali and Hindi text. When I go to /usr/share/tesseract-ocr/4.00/tessdata, I see only a handful of *.traineddata files. In the official documentation I found links to these data files, and I have downloaded osd.traineddata and all the other files from the tessdata link on GitHub.

What do I have to do now? Where do I put these files, and which command enables these languages?

I am using Ubuntu 18 and a Conda environment.


Solution

  • Copy your *.traineddata files to /usr/share/tesseract-ocr/4.00/tessdata. When you run tesseract, select the trained data with the -l option, giving the file name without the .traineddata extension.

    E.g. tesseract inputpath output -l osd
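
Since the question asks about pytesseract, here is a minimal sketch of the same idea from Python. It assumes the files sit in the directory named above and that the Bengali, Hindi, and equation files are named ben.traineddata, hin.traineddata, and equ.traineddata, as in the tessdata repository. The helper names (build_lang_arg, missing_traineddata, ocr_image) are made up for illustration; only pytesseract.image_to_string and its lang argument come from the library. Several languages can be combined with "+", which is the same form the -l option accepts.

```python
from pathlib import Path

# Directory from the answer above; adjust it if your install differs.
TESSDATA_DIR = Path("/usr/share/tesseract-ocr/4.00/tessdata")


def build_lang_arg(langs):
    """Join language codes with '+', the form Tesseract's -l option accepts."""
    return "+".join(langs)


def missing_traineddata(langs, tessdata_dir=TESSDATA_DIR):
    """Return the codes whose <code>.traineddata file is not in tessdata_dir."""
    return [
        code
        for code in langs
        if not (Path(tessdata_dir) / f"{code}.traineddata").is_file()
    ]


def ocr_image(image_path, langs):
    """Run OCR on one image with the given languages via pytesseract."""
    # Imported lazily so the helpers above work without pytesseract installed.
    import pytesseract
    from PIL import Image

    missing = missing_traineddata(langs)
    if missing:
        raise FileNotFoundError(f"No traineddata for: {', '.join(missing)}")
    return pytesseract.image_to_string(
        Image.open(image_path), lang=build_lang_arg(langs)
    )
```

For example, ocr_image("page.png", ["ben", "hin", "equ"]) would pass lang="ben+hin+equ" to pytesseract, where page.png is a placeholder file name. In a Conda environment the tesseract binary may look in a different tessdata directory; running tesseract --list-langs shows which languages it actually finds.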