python-3.x ocr tesseract python-tesseract pytesser

How can I make pytesseract read a slashed 0 correctly?


I am trying to read the phone number in an image. Since the image is very clear, I didn't apply any preprocessing, yet pytesseract sometimes fails to recognize 0 correctly. I tried training on a similar font, but it gives the same result. An example is this image.

My code is pretty straightforward:

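from PIL import Image
import pytesseract

image = Image.open('Fotolar/0.png')
custom_config = r'--oem 3 --psm 6'  # default engine mode, treat the image as one uniform block of text
pytesseract.image_to_string(image, config=custom_config)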

I get this result: '9543 684 9993'

I tried fine-tuning with my own images, but I couldn't get it to work because all the tutorials were Ubuntu-based and I am not familiar with it. Do you have any suggestions?


Solution

  • I followed this tutorial https://www.youtube.com/watch?v=JPDeiGc2an8&t=444s and used the files and instructions in this repo https://github.com/kevinbicycle/ocrd-train.

    The tutorial was pretty clear. If you want to fine-tune like I did, at the end of the tutorial, instead of typing just "make training", append some of the variables, such as "START_MODEL", to the command.

    You can also use my slashedzero.traineddata if your problem is identical to mine: https://github.com/yusufuyanik1/SlashedZeroOCR.
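The two steps above (fine-tuning from an existing model, then loading the resulting model) could be sketched roughly as follows. This is only an illustration, not the exact commands from the tutorial: START_MODEL comes from the answer, while MODEL_NAME and TESSDATA are assumed to be the matching Makefile variables of the training repo, and the helper names, the "eng" start model and all folder paths are hypothetical.

```python
from pathlib import Path
from typing import List


def build_training_command(model_name: str, start_model: str, tessdata_dir: str) -> List[str]:
    """Build a `make training` call that continues from an existing model."""
    return [
        "make",
        "training",
        f"MODEL_NAME={model_name}",    # assumed variable: name of the model to produce
        f"START_MODEL={start_model}",  # model to fine-tune from (mentioned in the answer)
        f"TESSDATA={tessdata_dir}",    # assumed variable: folder with <start_model>.traineddata
    ]


def build_config(tessdata_dir: str, psm: int = 6) -> str:
    """Build a Tesseract config string that points at a custom tessdata folder."""
    # Forward slashes keep the quoted path simple on Windows.
    folder = Path(tessdata_dir).as_posix()
    return f'--tessdata-dir "{folder}" --oem 3 --psm {psm}'


def read_phone_number(image_path: str, tessdata_dir: str, lang: str = "slashedzero") -> str:
    """OCR an image with a custom model; needs pytesseract, Pillow and Tesseract installed."""
    # Imported here so the two helpers above work without the OCR dependencies.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        # lang is the traineddata file name without its extension.
        text = pytesseract.image_to_string(image, lang=lang, config=build_config(tessdata_dir))
    return text.strip()
```

The list returned by build_training_command can be passed to subprocess.run from inside the training repo folder. For recognition, put slashedzero.traineddata in a folder of your choice and call, for example, read_phone_number('Fotolar/0.png', 'models'), where 'models' is a placeholder for that folder.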