python ocr tesseract python-tesseract

Tesseract OCR - specify pattern


I'm trying to perform OCR using Tesseract (version 3.04.00). All my images follow the same pattern: digit, dot, digit, digit (i.e. a decimal number with two digits of precision). I tried using the --user-patterns option, but I can't get it to work.

What I did:

  • create a file patterns.txt with \d.\d\d on first line
  • use option --user-patterns patterns.txt
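For reference, the two steps above can be reproduced from Python roughly as follows. This is only a sketch of the setup described in the question: the helper names `write_patterns_file` and `build_config` are made up for illustration, and running it does not by itself make Tesseract accept the pattern.

```python
from pathlib import Path

# One pattern per line; in Tesseract's user-patterns syntax, \d stands for a digit.
PATTERN = r"\d.\d\d"


def write_patterns_file(path="patterns.txt"):
    """Write the single pattern to a UTF-8 text file and return its path.

    Hypothetical helper, not part of pytesseract or Tesseract.
    """
    patterns_path = Path(path)
    patterns_path.write_text(PATTERN + "\n", encoding="utf-8")
    return patterns_path


def build_config(patterns_path):
    """Build the option string used in the question.

    -psm 7 asks Tesseract 3.x to treat the image as a single text line.
    """
    return f"-psm 7 --user-patterns {patterns_path}"


patterns_path = write_patterns_file()
config = build_config(patterns_path)
print(config)
```

The resulting string is what gets passed as `config=` to `pytesseract.image_to_string`.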

But I get the following error:

pytesseract.pytesseract.TesseractError: (1, "Tesseract Open Source OCR Engine v3.04.00 with Leptonica read_params_file: Can't open 1 read_params_file: Can't open user-patterns read_params_file: parameter not found: \\d.\\d\\d")

How can I specify my pattern to Tesseract? Is this even the right approach? Thanks in advance for any help or advice; I can't find much documentation on Tesseract.

EDIT: added Python code

import cv2
import pytesseract

img = cv2.imread("path/to/image", cv2.IMREAD_GRAYSCALE)
text = pytesseract.image_to_string(img, config="-psm 7 --user-patterns patterns.txt")
print(text)

Solution

  • Never mind, I think Tesseract was overkill for my use case.

    I took a reference image of each digit from 0 to 9 and, for each image I want to predict, picked the digit whose reference image gives the minimum mean squared error. I got 100% accuracy on my test dataset.
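The minimum-MSE comparison described above might look roughly like this. It is a sketch, not the original code: it assumes every digit has already been cropped to the same size as the reference images, the names `mean_squared_error` and `classify_digit` are hypothetical, and the templates here are synthetic arrays standing in for real digit images.

```python
import numpy as np


def mean_squared_error(a, b):
    """Mean squared pixel difference between two equally sized images."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean((a - b) ** 2))


def classify_digit(glyph, templates):
    """Return the key of the template with the lowest MSE against glyph."""
    return min(templates, key=lambda d: mean_squared_error(glyph, templates[d]))


# Synthetic stand-ins for the ten reference images (assumption: 8x8 binary glyphs).
rng = np.random.default_rng(0)
templates = {d: rng.integers(0, 2, size=(8, 8)).astype(np.float64) for d in range(10)}

# A slightly noisy copy of the "7" template plays the role of an unseen image.
noisy_seven = templates[7] + rng.normal(0.0, 0.05, size=(8, 8))
predicted = classify_digit(noisy_seven, templates)
print(predicted)
```

With real data, `templates` would map each digit to its reference image loaded with `cv2.imread(..., cv2.IMREAD_GRAYSCALE)`. A value such as 1.23 would be read by cropping each character position and classifying the crops one by one. This only works well when font, size, and alignment are consistent across images, which seems to be the case here.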