
Image-to-Text Decoder


I would like to decode the following captcha image into text:

[image: the captcha to decode]

I already tried Tesseract OCR for this, but without success so far.

Here's my code:

import argparse
from subprocess import check_output

import pytesseract
from PIL import Image

# ImageMagick's convert tool (raw string so the backslashes are taken literally)
CONVERT = r'C:\Program Files\ImageMagick-7.0.9-Q16\convert.exe'


def resolve(path):
    # Resample the image to 600 DPI (this overwrites the original file),
    # then run Tesseract on the result
    check_output([CONVERT, path, '-resample', '600', path])
    return pytesseract.image_to_string(Image.open(path))


if __name__ == "__main__":
    argparser = argparse.ArgumentParser()
    argparser.add_argument('path', help='path of the image to OCR')
    args = argparser.parse_args()
    print('Resolving the image...')
    captcha_text = resolve(args.path)
    print('Result:', captcha_text)

Here's the output of my program:

C:\Users\Foussy\PycharmProjects\03_Imagedecoder>python main.py C:\Users\Foussy\Pictures\4570502--437826.jpeg
Resolving the image...
Result: 

My program cannot decode this picture: Tesseract returns an empty result. It works well on images with more "obvious" text, and I also tried several other captchas of this type, all without success. What would you recommend?

Ultimately, I want a program that decodes images like this automatically. Unless there is a reliable way to preprocess the images automatically so that Tesseract can read them, I don't see another way to solve this problem. If anyone knows of a library that could help, I'd appreciate it.
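One common first attempt at the automatic preprocessing asked about above is to clean the image up before handing it to Tesseract. The sketch below is a generic heuristic, not a method tested on this particular captcha. It uses Pillow only, in place of the ImageMagick call, and does not overwrite the input file. The steps are grayscale, upscale, median filter to remove specks, and binarize to black text on white. The scale factor, the threshold, the `--psm 8` ("single word") page segmentation mode, and the character whitelist are all assumptions to tune. Whitelisting is honoured by recent Tesseract versions. Heavily distorted captchas may still not be readable this way.

```python
from PIL import Image, ImageFilter, ImageOps


def preprocess(img, scale=3, threshold=128):
    """Clean up a captcha image before OCR (heuristic, not tuned for any specific captcha)."""
    # 1. Drop colour information: Tesseract only needs intensity
    gray = ImageOps.grayscale(img)
    # 2. Upscale so characters are tall enough for Tesseract (stands in for ImageMagick -resample)
    big = gray.resize((gray.width * scale, gray.height * scale), Image.LANCZOS)
    # 3. Median filter removes isolated specks of noise
    smooth = big.filter(ImageFilter.MedianFilter(size=3))
    # 4. Binarize: darker than the threshold -> black (0), everything else -> white (255)
    bw = smooth.point(lambda p: 0 if p < threshold else 255)
    # 5. Tesseract expects dark text on a light background; invert if black dominates
    hist = bw.histogram()
    if hist[0] > hist[255]:
        bw = ImageOps.invert(bw)
    return bw


def ocr_captcha(path, whitelist="ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"):
    """Preprocess an image file and OCR it as a single word.

    The default whitelist is an assumed alphabet; adjust it to the captcha.
    """
    import pytesseract  # imported here so preprocess() can be used without Tesseract

    img = preprocess(Image.open(path))
    # --psm 8: treat the image as one word; the whitelist limits which characters are returned
    config = f"--psm 8 -c tessedit_char_whitelist={whitelist}"
    return pytesseract.image_to_string(img, config=config).strip()
```

Usage would be `print(ocr_captcha('captcha.jpeg'))`. Saving the output of `preprocess()` with `.save('debug.png')` is a quick way to see what Tesseract actually receives before adjusting the threshold or scale.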


Solution

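  • This Python library might help: https://pypi.org/project/captcha-solver/ (with the 'twocaptcha' backend used below, the image is sent to the 2captcha.com service, which requires an API key).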

    Example:

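    from captcha_solver import CaptchaSolver

    # 'twocaptcha' backend: needs your 2captcha.com API key
    solver = CaptchaSolver('twocaptcha', api_key='2captcha.com API HERE')
    # Read the image as raw bytes; the with-block closes the file afterwards
    with open('captcha.png', 'rb') as f:
        raw_data = f.read()
    print(solver.solve_captcha(raw_data))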