I am trying to use the run_tesseract function to get hOCR output so that I can extract text from bank receipt images. However, I am getting the above error message. I have installed Tesseract-OCR on my laptop and added its path to my system PATH variable. I am running 64-bit Windows 10.
I have also tried uninstalling and reinstalling it, but to no avail.
import glob
import pytesseract
from PIL import Image
img_files=glob.glob('./NACH/*.jpg')
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract OCR\\tesseract.exe'
#im=Image.open(img_files[0])
#im.load()
pytesseract.run_tesseract(img_files[0],'output',lang='eng',config='hocr')
I get the following complete error message:
AttributeError Traceback (most recent call last) in
4 im=Image.open(img_files[0])
5 im.load()
----> 6 pytesseract.run_tesseract(img_files[0],'output',lang='eng',config='hocr')
7 #text = pytesseract.image_to_string(im)
8 #if os.path.isfile('output.html'):
AttributeError: module 'pytesseract' has no attribute 'run_tesseract'
Replace pytesseract.run_tesseract() with pytesseract.pytesseract.run_tesseract().
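The extra .pytesseract is needed because the package contains a submodule of the same name, and run_tesseract lives in that submodule without being re-exported at the top level (at least in the version that raises this error). A minimal sketch of that layout follows; fake_ocr and its contents are hypothetical stand-ins built for illustration, not the real library:

```python
import types

# Stand-in modules (hypothetical, not the real library) that mirror the layout:
# a package whose same-named submodule holds the function.
inner = types.ModuleType("fake_ocr.fake_ocr")
inner.run_tesseract = lambda *args, **kwargs: "ran"

outer = types.ModuleType("fake_ocr")
outer.fake_ocr = inner  # the submodule is reachable as an attribute of the package

# Looking the function up directly on the package fails ...
try:
    outer.run_tesseract
    lookup_failed = False
except AttributeError:
    lookup_failed = True

# ... while going through the submodule succeeds.
result = outer.fake_ocr.run_tesseract("image.jpg", "output")
```

Here outer plays the role of the pytesseract package and inner the role of pytesseract.pytesseract.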
Credit to Nithin in the comments. Adding this as an answer to close it out.
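If the goal is only to get hOCR output, recent pytesseract releases also expose a public helper, image_to_pdf_or_hocr, which may be less fragile than calling the internal run_tesseract. The sketch below assumes a release that provides that helper and accepts a file path as the image argument; save_hocr and the file names in the usage note are hypothetical, chosen for illustration:

```python
from pathlib import Path


def save_hocr(image_path, output_path, lang="eng"):
    """Run Tesseract on one image and write the hOCR markup to output_path."""
    # Imported lazily so this module still loads where pytesseract is absent.
    import pytesseract

    # image_to_pdf_or_hocr returns the whole document as bytes.
    hocr_bytes = pytesseract.image_to_pdf_or_hocr(
        str(image_path), lang=lang, extension="hocr"
    )
    Path(output_path).write_bytes(hocr_bytes)
    return output_path
```

A call would look like save_hocr(img_files[0], 'output.hocr'), after setting pytesseract.pytesseract.tesseract_cmd as in the question.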