
How to extract text from image using pytesseract in colab?


I get the following error when I try to use pytesseract in Colab.

I am not sure how to fix it. I also installed tesseract with pip install tesseract, but that didn't help.

Does anyone know how to solve this? Alternatively, can you recommend another Python OCR library?

FileNotFoundError: [Errno 2] No such file or directory: 'tesseract': 'tesseract'

During handling of the above exception, another exception occurred:

TesseractNotFoundError                    Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pytesseract/pytesseract.py in run_tesseract(input_filename, output_filename_base, extension, lang, config, nice, timeout)
    257         if e.errno != ENOENT:
    258             raise e
--> 259         raise TesseractNotFoundError()
    260 
    261     with timeout_manager(proc, timeout) as error_string:

TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.

Here is my code. I am trying to read a number from a cropped region of the image.

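import pytesseract

# assumes img is a NumPy image array (e.g. from cv2.imread) and the box coordinates come from earlier code
roi = img[ymin:ymax, xmin:xmax]  # crop the region that contains the number
text = pytesseract.image_to_string(roi, lang='eng')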


Solution

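  • pytesseract is only a Python wrapper around the Tesseract OCR engine: it calls the tesseract executable, which has to be installed separately (pip install tesseract does not install it). In Colab, install the engine with apt first, then install pytesseract: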

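    # install the Tesseract engine (system package); -y answers the confirmation prompt
    !apt-get install -y tesseract-ocr
    # install the Python wrapper
    !pip install pytesseract
    import pytesseract
    from PIL import Image
    text = pytesseract.image_to_string(Image.open('/path'))  # replace '/path' with your image file
    print(text)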
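For the number-reading case in the question, a minimal sketch along these lines may help once Tesseract is installed. It assumes the crop is a NumPy array as in the question's code. The helper names tesseract_available, digits_config and read_number are hypothetical. The --psm 7 (treat the image as a single text line) and tessedit_char_whitelist options are standard Tesseract settings, though whether the whitelist is honoured by the default LSTM engine depends on your Tesseract version.

```python
import shutil


def tesseract_available() -> bool:
    """Return True if the `tesseract` executable can be found on PATH (hypothetical helper)."""
    return shutil.which("tesseract") is not None


def digits_config(psm: int = 7) -> str:
    """Build a Tesseract config string for reading digits (hypothetical helper).

    --psm 7 treats the image as a single line of text; the whitelist asks
    Tesseract to output only 0-9 (support depends on the Tesseract version).
    """
    return f"--psm {psm} -c tessedit_char_whitelist=0123456789"


def read_number(roi) -> str:
    """OCR a cropped region (NumPy array or PIL image) and keep only the digits."""
    import pytesseract  # imported here so the helpers above work without it

    if not tesseract_available():
        raise RuntimeError(
            "tesseract binary not found; in Colab run: !apt-get install -y tesseract-ocr"
        )
    text = pytesseract.image_to_string(roi, lang="eng", config=digits_config())
    # Drop whitespace, newlines and any stray non-digit characters.
    return "".join(ch for ch in text if ch.isdigit())


if __name__ == "__main__":
    print("tesseract found:", tesseract_available())
```

Usage with the question's variables would be text = read_number(img[ymin:ymax, xmin:xmax]). If the binary is installed but not on PATH, pytesseract also lets you point at it directly by setting pytesseract.pytesseract.tesseract_cmd to the executable's full path.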