Tags: python, python-3.x, numpy, tesseract

How can I get pytesseract to work in PyCharm? It keeps throwing errors


I am defining a function that converts an image to grayscale (black and white). I then pass the result to:

text = pytesseract.image_to_string(Image.open(gray_scale_image))

and then I print the text I receive, but it throws errors:

Traceback (most recent call last):
  File "C:\Users\HP\PycharmProjects\nayaproject\venv\lib\site-packages\PIL\Image.py", line 2613, in open
fp.seek(0)
AttributeError: 'numpy.ndarray' object has no attribute 'seek'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/HP/PycharmProjects/nayaproject/new.py", line 17, in <module>
text = pytesseract.image_to_string(Image.open(g))
  File "C:\Users\HP\PycharmProjects\nayaproject\venv\lib\site-packages\PIL\Image.py", line 2615, in open
fp = io.BytesIO(fp.read())
AttributeError: 'numpy.ndarray' object has no attribute 'read'

And when I use Image.fromarray(grayscale) instead of Image.open(grayscale), I get these errors:

Traceback (most recent call last):
  File "C:\Users\HP\PycharmProjects\nayaproject\venv\lib\site-packages\pytesseract\pytesseract.py", line 170, in run_tesseract
proc = subprocess.Popen(cmd_args, **subprocess_args())
  File "C:\Users\HP\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 709, in __init__
restore_signals, start_new_session)
  File "C:\Users\HP\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 997, in _execute_child
startupinfo)

FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/HP/PycharmProjects/nayaproject/new.py", line 17, in <module>
text = pytesseract.image_to_string(Image.fromarray(g))
  File "C:\Users\HP\PycharmProjects\nayaproject\venv\lib\site-packages\pytesseract\pytesseract.py", line 294, in image_to_string
return run_and_get_output(*args)
  File "C:\Users\HP\PycharmProjects\nayaproject\venv\lib\site-packages\pytesseract\pytesseract.py", line 202, in run_and_get_output
run_tesseract(**kwargs)
  File "C:\Users\HP\PycharmProjects\nayaproject\venv\lib\site-packages\pytesseract\pytesseract.py", line 172, in run_tesseract
raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path

I am working in PyCharm and have already installed Pillow, numpy, opencv-python, pip and pytesseract for this project.


Solution

  • I assume gray_scale_image is the output of an OpenCV call and is therefore a numpy array, as the error suggests:

    AttributeError: 'numpy.ndarray' object has no attribute 'read'

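    You need to convert the array to a PIL image object. Image.open expects a file path or a file-like object, not an array, so Image.fromarray is the right call here. From my own experience, I suggest always casting the numpy array to np.uint8 first, because PIL most commonly works with 8-bit data and you usually do not know which dtype an OpenCV routine will return.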

    text = pytesseract.image_to_string(Image.fromarray(gray_scale_image.astype(np.uint8)))
    

    If that does not work, you are probably not passing an image array at all. Print the following to inspect the argument:

    print(type(gray_scale_image))
    print(gray_scale_image.shape)
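
A plain .astype(np.uint8) can silently give a wrong image: a float array with values in [0, 1] is truncated to all zeros, and values above 255 wrap around. The sketch below combines the type check with a more careful conversion. The helper name to_uint8_gray and the rule that floats in [0, 1] are normalized intensities are assumptions made for illustration, not part of numpy, OpenCV or PIL.

```python
import numpy as np


def to_uint8_gray(arr):
    """Return a 2-D uint8 array that Image.fromarray can read as grayscale.

    Hypothetical helper; the scaling rules below are assumptions.
    """
    if not isinstance(arr, np.ndarray):
        raise TypeError(f"expected numpy.ndarray, got {type(arr).__name__}")
    if arr.ndim != 2:
        raise ValueError(f"expected a 2-D grayscale array, got shape {arr.shape}")
    if arr.dtype == np.uint8:
        return arr  # already in the usual 8-bit format
    if arr.dtype == np.bool_:
        # Boolean masks: map True to white (255) and False to black (0).
        return arr.astype(np.uint8) * 255
    if (np.issubdtype(arr.dtype, np.floating) and arr.size
            and arr.min() >= 0.0 and arr.max() <= 1.0):
        # Assumption: floats in [0, 1] are normalized intensities.
        return np.round(arr * 255).astype(np.uint8)
    # Anything else: clamp to the 8-bit range instead of letting values wrap.
    return np.clip(arr, 0, 255).astype(np.uint8)
```

The result can then be passed on in the same way as before, for example Image.fromarray(to_uint8_gray(gray_scale_image)).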
    

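    Once the first problem is solved, a second one appears, as your second traceback already shows: pytesseract cannot find the Tesseract executable. pytesseract is only a wrapper, so the Tesseract OCR program must be installed separately and its location made known to pytesseract: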

    pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path
    

    The solution is to set the path at the beginning of your script:

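    import os
    pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
    # TESSDATA_PREFIX only takes effect as an environment variable, not as a Python variable
    os.environ['TESSDATA_PREFIX'] = 'C:/Program Files (x86)/Tesseract-OCR'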
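
If you would rather not hard-code one location, the lookup can be sketched roughly as follows: check the system PATH first, then fall back to a list of likely install folders. The function name find_tesseract and the candidate paths are hypothetical and only illustrate the idea; adjust them to wherever Tesseract is actually installed on your machine.

```python
import os
import shutil

# Hypothetical install locations; replace with the real ones on your system.
DEFAULT_CANDIDATES = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
)


def find_tesseract(name="tesseract", candidates=DEFAULT_CANDIDATES):
    """Return a path to the Tesseract executable, or None if none is found."""
    # First choice: whatever the system PATH already provides.
    on_path = shutil.which(name)
    if on_path:
        return on_path
    # Fallback: the first candidate that exists as a file.
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None
```

If the function returns a path, assign it to pytesseract.pytesseract.tesseract_cmd before calling image_to_string. If it returns None, Tesseract itself is most likely not installed yet.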