Search code examples
pythonocrtesseractpython-tesseract

What is the difference between Pytesseract and Tesserocr?


I'm using Python 3.6 in Windows 10 and have Pytesseract already installed but I found in a code Tesserocr which by the way I can't install. What is the difference?


Solution

  • pytesseract is only a binding for tesseract-ocr for Python. So, if you want to use tesseract-ocr in python code without using subprocess or os module for running command line tesseract-ocr commands, then you use pytesseract. But, in order to use it, you have to have a tesseract-ocr installed.

    You can think of it this way. You need a tesseract-ocr installed because it's the program that actually runs and does the OCR. But, if you want to run it from python code as a function, you install pytesseract package that enables you to do that. So when you run pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'), it calls the tesseract-ocr with the provided arguments. The results are the same as running tesseract test-european.jpg -l fra. So, you get the ability to call that from the code, but in the end, it still has to run the tesseract-ocr to do the actual OCR.