python-2.7 windows-7 config tesseract

Path error with Tesseract


I thought I'd got Tesseract to work on my Win 7 machine:

from PIL import Image
import pytesseract

pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'

tessdata_dir_config = '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'

myFile = r"D:\temp\OCR\rightness_of_rendering.tif"

print(pytesseract.image_to_string(Image.open(myFile)))

tesseract.exe is located in C:\Program Files (x86)\Tesseract-OCR\tesseract.exe

eng.traineddata is located in C:\Program Files (x86)\Tesseract-OCR\tessdata

The error I get is

D:\LearnPython>D:\LearnPython\ocr_test.py
Traceback (most recent call last):
  File "D:\LearnPython\ocr_test.py", line 14, in <module>
    print(pytesseract.image_to_string(Image.open(myFile)))
  File "C:\Python27\lib\site-packages\pytesseract\pytesseract.py", line 125, in
image_to_string
    raise TesseractError(status, errors)
pytesseract.pytesseract.TesseractError: (1, u'Error opening data file \\Program
Files (x86)\\Tesseract-OCR\\eng.traineddata')

D:\LearnPython>

The path in the error is one directory above tessdata, so I'm a little confused about how to set this up so that it works properly.


Solution

  • From the pytesseract GitHub page:

    tessdata_dir_config = '--tessdata-dir "<replace_with_your_tessdata_dir_path>"'
    # Example config: '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'
    # It's important to add double quotes around the dir path.
    
    pytesseract.image_to_string(image, lang='chi_sim', config=tessdata_dir_config)
    

    Note that you need to pass config=tessdata_dir_config to your image_to_string call; defining the variable on its own has no effect.

    So if you're using the eng data, it would be:

    print(pytesseract.image_to_string(Image.open(myFile), lang='eng', config=tessdata_dir_config))
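If you would rather not type the tessdata path twice, the config string can be derived from the tesseract.exe path. This is only a minimal sketch: build_tessdata_config is a hypothetical helper name, not part of pytesseract, and it assumes the tessdata folder sits next to tesseract.exe, as it does in the question.

```python
import ntpath  # Windows path rules, regardless of the OS running this code


def build_tessdata_config(tesseract_cmd):
    """Return a --tessdata-dir option for the tessdata folder beside tesseract.exe.

    Hypothetical helper; assumes the traineddata files live in a 'tessdata'
    folder in the same directory as the executable.
    """
    install_dir = ntpath.dirname(tesseract_cmd)
    tessdata_dir = ntpath.join(install_dir, "tessdata")
    # The double quotes keep the space in "Program Files (x86)" from
    # splitting the path into separate arguments.
    return '--tessdata-dir "{}"'.format(tessdata_dir)


tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
tessdata_dir_config = build_tessdata_config(tesseract_cmd)
print(tessdata_dir_config)
```

The resulting string still has to be passed as config=tessdata_dir_config in the image_to_string call, exactly as in the line above.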