python python-3.x pytesser

Image to text python


I am using Python 3.x with the following code to convert an image into text:

from PIL import Image
from pytesseract import image_to_string

image = Image.open('image.png', mode='r')
print(image_to_string(image))

I am getting the following error:

Traceback (most recent call last):
  File "C:/Users/hp/Desktop/GII/Image_to_text.py", line 12, in <module>
    print(image_to_string(image))
  File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\site-packages\pytesseract\pytesseract.py", line 161, in image_to_string
    config=config)
  File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\site-packages\pytesseract\pytesseract.py", line 94, in run_tesseract
    stderr=subprocess.PIPE)
  File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\subprocess.py", line 950, in __init__
    restore_signals, start_new_session)
  File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\subprocess.py", line 1220, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

Please note that I have put the image in the same directory as my Python script. Also, the line image = Image.open('image.png', mode='r') does not raise an error; the error is raised on the line print(image_to_string(image)).

Any idea what might be wrong here? Thanks


Solution

  • You have to have Tesseract installed and accessible on your PATH.

    According to the source, pytesseract is merely a wrapper around subprocess.Popen that runs the tesseract binary. It does not perform any OCR itself.

    Relevant part of the source:

    def run_tesseract(input_filename, output_filename_base, lang=None, boxes=False, config=None):
        '''
        runs the command:
            `tesseract_cmd` `input_filename` `output_filename_base`
    
        returns the exit status of tesseract, as well as tesseract's stderr output
        '''
        command = [tesseract_cmd, input_filename, output_filename_base]
    
        if lang is not None:
            command += ['-l', lang]
    
        if boxes:
            command += ['batch.nochop', 'makebox']
    
        if config:
            command += shlex.split(config)
    
        proc = subprocess.Popen(command,
                stderr=subprocess.PIPE)
        return (proc.wait(), proc.stderr.read())
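
The error in the question can be reproduced without pytesseract: subprocess.Popen raises FileNotFoundError when the executable it is asked to start cannot be found. A minimal sketch, assuming only the standard library; try_run is a hypothetical helper name, not part of pytesseract:

```python
import subprocess


def try_run(command):
    """Run `command` and report its exit status, or why it could not start."""
    try:
        proc = subprocess.Popen(command,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
    except FileNotFoundError as exc:
        # Raised before any child process exists: the executable is missing.
        return "could not start {!r}: {}".format(command[0], exc)
    proc.communicate()  # wait for the child and drain its pipes
    return "{!r} exited with status {}".format(command[0], proc.returncode)


if __name__ == "__main__":
    # Reports "could not start" when tesseract is not on PATH.
    print(try_run(["tesseract", "--version"]))
```

If this reports that tesseract could not be started, the problem lies with the Tesseract installation or PATH, not with the image or with PIL.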
    

    Quoting another part of the source:

    # CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY
    tesseract_cmd = 'tesseract'
    

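    So a quick way of changing the tesseract path would be:

    from PIL import Image
    import pytesseract

    # tesseract_cmd is defined in the pytesseract.pytesseract submodule; set it once
    pytesseract.pytesseract.tesseract_cmd = "/absolute/path/to/tesseract"
    print(pytesseract.image_to_string(Image.open('image.png')))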
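
Before overriding tesseract_cmd, it can help to check whether the binary is visible on PATH at all. A minimal sketch using only the standard library; locate_tesseract and describe_tesseract are hypothetical helper names, not part of pytesseract:

```python
import shutil


def locate_tesseract(command="tesseract"):
    """Return the full path of `command` if it is on PATH, else None."""
    return shutil.which(command)


def describe_tesseract(command="tesseract"):
    """Build a short diagnostic message about whether `command` can be found."""
    path = locate_tesseract(command)
    if path is None:
        return ("'{}' was not found on PATH; install Tesseract or point "
                "tesseract_cmd at its absolute path".format(command))
    return "'{}' resolves to {}".format(command, path)


if __name__ == "__main__":
    print(describe_tesseract())
```

If the lookup returns None, Tesseract is either not installed or its install directory is missing from PATH, which is consistent with the FileNotFoundError in the question. When it returns a path, that path is a reasonable candidate for tesseract_cmd.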