I'm trying to write an OCR program in Python. I use Pillow to convert an image to high-contrast black and white, but when I pass the result to Tesseract (via pytesseract) to extract the text, I get the following error in the terminal:
Traceback (most recent call last):
  File "OCR.py", line 41, in <module>
    print(pytesseract.image_to_string(img))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pytesseract/pytesseract.py", line 122, in image_to_string
    config=config)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pytesseract/pytesseract.py", line 46, in run_tesseract
    proc = subprocess.Popen(command, stderr=subprocess.PIPE)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 707, in __init__
    restore_signals, start_new_session)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 1333, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/bin/tesseract'
from PIL import Image
import numpy as np
import pytesseract
sens = int(input("Sensitivity (0-255): "))
im = Image.open("book.jpg")
pixels = np.asarray(im)
width, height = im.size
px = pixels.mean(axis=2)
ppx = px.flatten()
for i in range(ppx.size):
    if ppx[i] > sens:
        ppx[i] = 255
    else:
        ppx[i] = 0
pixels = ppx.reshape(height, width)
img = Image.fromarray(np.uint8(pixels))
img.show()
img.save("images2.jpg")
print(pytesseract.image_to_string(img))
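pytesseract is only a wrapper: it launches the tesseract executable as a subprocess, and the FileNotFoundError in your traceback means there is no executable at /usr/local/bin/tesseract. According to the pytesseract README, you must install Tesseract itself separately before pytesseract can work.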
On Ubuntu:
sudo apt install tesseract-ocr
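
After installing, it can help to confirm that Python can actually see the binary before calling pytesseract. The following is a minimal sketch using only the standard library; the helper name `find_tesseract` is made up for illustration. If the binary ends up somewhere other than the path pytesseract expects, the pytesseract README describes setting `pytesseract.pytesseract.tesseract_cmd` to the full path, which is shown here as a comment so the snippet runs even without pytesseract installed.

```python
import shutil


def find_tesseract(candidates=("tesseract",)):
    """Return the full path of the first candidate executable found, or None.

    `candidates` may contain bare command names (looked up on PATH)
    or explicit paths such as "/usr/local/bin/tesseract".
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


path = find_tesseract()
if path is None:
    print("tesseract executable not found; install Tesseract first")
else:
    print("tesseract found at", path)
    # With pytesseract installed, point it at the binary explicitly:
    # import pytesseract
    # pytesseract.pytesseract.tesseract_cmd = path
```

If this prints that the executable was not found even after installation, the install location is probably not on your PATH; pass the explicit location as a candidate, for example `find_tesseract(("tesseract", "/usr/local/bin/tesseract"))`.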