python, jupyter-notebook, tesseract, python-tesseract

Trouble getting Tesseract to work in Python


I'm having trouble running code that uses Tesseract, both in Jupyter Notebook and in PyCharm. I suspect it's a problem with the installation on Windows 7, but I'm not sure what I'm doing wrong.

I've tried many different things, from pip install tesseract and pytesseract to installing Tesseract OCR itself (at first I thought it was just a library, which is why I got the order wrong), following this: https://github.com/tesseract-ocr/tesseract/wiki. I also downloaded Cygwin and MSYS2, although I've seen some YouTube videos in which they didn't install those. I even added the right directory to my system PATH.

Here is a simple piece of code, just to illustrate:

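from PIL import Image
import pytesseract

# Open the image and print its summary (mode, size) to confirm it loaded
img = Image.open("teste.png")
print(img)

# Run OCR on the image; this is the call that needs the tesseract executable
text = pytesseract.image_to_string(img)
print('Image text:', text)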

The error message is pretty long, but I think it boils down to this line:

TesseractNotFoundError: C:\Program Files\Tesseract-OCR is not installed or it's not in your path

The problem is that I did add it to the PATH, under: Environment variables - path - edit - %SystemRoot%\system32;%SystemRoot%; %SystemRoot%\System32\Wbem; %SYSTEMROOT%\System32\WindowsPowerShell\v1.0\; C:\Program Files\Tesseract-OCR

And I know it is installed because I can run it from cmd...
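
For what it's worth, a quick way to see what the Python process itself sees (as opposed to cmd) might look like the sketch below. It only uses the standard library; the helper name tesseract_path_entries is made up for illustration. A Jupyter or PyCharm session started before the PATH was edited keeps the old value, so its view can differ from a freshly opened cmd window.

```python
import os
import shutil


def tesseract_path_entries(path_value=None):
    """Return the PATH entries that mention 'tesseract' (case-insensitive).

    Hypothetical helper for illustration; path_value defaults to the PATH
    of the current Python process.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Entries are separated by os.pathsep (';' on Windows, ':' elsewhere).
    entries = [entry.strip() for entry in path_value.split(os.pathsep)]
    return [entry for entry in entries if "tesseract" in entry.lower()]


# shutil.which returns the full path of the executable, or None if this
# process cannot find it through PATH.
print("tesseract resolved to:", shutil.which("tesseract"))
print("PATH entries mentioning Tesseract:", tesseract_path_entries())
```

If the first line prints None while cmd can still run tesseract, the notebook or IDE process most likely has a stale PATH and needs to be restarted.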


Solution

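  • I always have trouble with pytesseract on Windows unless I tell it explicitly where the executable is. Use the full path to tesseract.exe itself (not just the install folder), adjusted to wherever it is installed on your machine: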

    pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
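
    If you are not sure which folder the installer used, a small lookup like the one below could pick the path for you. This is only a rough sketch: find_tesseract and the DEFAULT_CANDIDATES list are illustrative names and guesses based on the two folders mentioned in this thread, not part of pytesseract.

```python
import os
import shutil

# Assumed install locations (taken from the paths mentioned in this thread);
# adjust them to match your own machine.
DEFAULT_CANDIDATES = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
]


def find_tesseract(candidates=DEFAULT_CANDIDATES):
    """Return the full path to the tesseract executable, or None if not found."""
    # Explicit candidates win; each must be an existing file, not a folder.
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    # Fall back to whatever the current process can resolve through PATH.
    return shutil.which("tesseract")


tesseract_path = find_tesseract()
if tesseract_path is None:
    print("tesseract executable not found; check the install folder")
else:
    print("tesseract executable:", tesseract_path)
    # With pytesseract installed, the result can then be assigned:
    # import pytesseract
    # pytesseract.pytesseract.tesseract_cmd = tesseract_path
```

    The point of checking with os.path.isfile is that a folder such as C:\Program Files\Tesseract-OCR is rejected; only the .exe itself is accepted.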