Tags: keras, deep-learning, ocr, tesseract, python-tesseract

Pytesseract or Keras OCR to extract text from image


I'm trying to extract text from images, but currently I'm getting an empty string as output. Below is my code for pytesseract, although I'm open to Keras OCR as well:

from PIL import Image
import pytesseract

path = 'captcha.svg.png'
img = Image.open(path)
captchaText = pytesseract.image_to_string(img, lang='eng', config='--psm 6')
print(captchaText)  # currently prints an empty string

I wasn't sure how to work with SVG images, so I converted them to PNG. Below are a few sample images:

SVG image converted to PNG

(four sample CAPTCHA images)

Edit 1 (2021-05-19): I'm able to convert SVG to PNG using cairosvg, but I'm still not able to read the captcha text.
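
For reference, a minimal conversion sketch with cairosvg (the file names are illustrative):

import cairosvg

# Rasterize the SVG captcha to a PNG file
cairosvg.svg2png(url='captcha.svg', write_to='captcha.svg.png')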

Edit 2 (2021-05-20): Keras OCR is also not returning anything for these images.


Solution

  • The reason keras-ocr is not working, or returning nothing, is the grayscale input image; it works once the image is converted (as I found). See below:

    from PIL import Image

    a = Image.open('/content/gD7vA.png')  # keras-ocr returns nothing for this file
    a.mode, a.split()  # mode L (1 channel) + transparency / alpha layer -> LA

    b = Image.open('/content/CYegU.png')  # keras-ocr returns a result for this file
    b.mode, b.split()  # mode RGB + transparency / alpha layer -> RGBA
    

    In the above, a is the file you mention in your question; as shown, it has two channels, i.e. a grayscale channel plus a transparency / alpha layer (mode LA). And b is the file I converted to RGB (or RGBA). The transparency layer was already included in your original file and I didn't remove it, though there seems to be no reason to keep it unless you need it. In short, to make your input work with keras-ocr, first convert your files to RGB (or RGBA) and save them to disk, then pass them to the OCR pipeline.

    # Using PIL to convert one mode to another
    # and save to disk
    c = Image.open('/content/gD7vA.png').convert('RGBA')
    c.save('/content/gD7vA_rgba.png')  # output path is illustrative
    c.mode, c.split()
    
    ('RGBA',
     (<PIL.Image.Image image mode=L size=150x50 at 0x7F03E8E7A410>,
      <PIL.Image.Image image mode=L size=150x50 at 0x7F03E8E7A590>,
      <PIL.Image.Image image mode=L size=150x50 at 0x7F03E8E7A810>,
      <PIL.Image.Image image mode=L size=150x50 at 0x7F03E8E7A110>))
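
    If you'd rather avoid the intermediate file, the pipeline also accepts in-memory arrays, so you can convert and pass the image directly. A minimal sketch, assuming the same file as above (an untested variant of the disk-based approach):

    import numpy as np
    from PIL import Image
    import keras_ocr

    pipeline = keras_ocr.pipeline.Pipeline()

    # Convert to 3-channel RGB in memory and hand the array straight to the pipeline
    img = np.array(Image.open('/content/gD7vA.png').convert('RGB'))
    predictions = pipeline.recognize([img])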
    

    Full code

    import matplotlib.pyplot as plt
    import keras_ocr

    # keras-ocr will automatically download pretrained
    # weights for the detector and recognizer.
    pipeline = keras_ocr.pipeline.Pipeline()
    
    # Get a set of four example images
    images = [
        keras_ocr.tools.read(url) for url in [
            '/content/CYegU.png', # mode: RGBA; plain RGB should work too!
            '/content/bw6Eq.png', # mode: RGBA
            '/content/jH2QS.png', # mode: RGBA
            '/content/xbADG.png'  # mode: RGBA
        ]
    ]
    
    # Each list of predictions in prediction_groups is a list of
    # (word, box) tuples.
    prediction_groups = pipeline.recognize(images)
    # The first run logs the pretrained weight files it loads:
    # Looking for /root/.keras-ocr/craft_mlt_25k.h5
    # Looking for /root/.keras-ocr/crnn_kurapan.h5

    prediction_groups
    [[('zum', array([[ 10.658852,  15.11916 ],
              [148.90204 ,  13.144257],
              [149.39563 ,  47.694347],
              [ 11.152428,  49.66925 ]], dtype=float32))],
     [('sresa', array([[  5.,  15.],
              [143.,  15.],
              [143.,  48.],
              [  5.,  48.]], dtype=float32))],
     [('sycw', array([[ 10.,  15.],
              [149.,  15.],
              [149.,  49.],
              [ 10.,  49.]], dtype=float32))],
     [('vdivize', array([[ 10.407883,  13.685192],
              [140.62648 ,  16.940662],
              [139.82323 ,  49.070583],
              [  9.604624,  45.815113]], dtype=float32))]]
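
    Since each prediction group is a list of (word, box) tuples, the captcha string itself can be recovered by joining the recognized words; a small sketch (the variable name is illustrative):

    captcha_texts = [''.join(word for word, box in preds) for preds in prediction_groups]
    # ['zum', 'sresa', 'sycw', 'vdivize']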
    

    Display

    # Plot the predictions
    fig, axs = plt.subplots(nrows=len(images), figsize=(20, 20))
    for ax, image, predictions in zip(axs, images, prediction_groups):
        keras_ocr.tools.drawAnnotations(image=image, predictions=predictions, ax=ax)
    

    (the four captchas displayed with the recognized text and bounding boxes drawn on top)
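
    The same mode fix may be worth retrying with the pytesseract snippet from the question; a minimal sketch (untested, and captcha fonts may still defeat tesseract):

    from PIL import Image
    import pytesseract

    img = Image.open('captcha.svg.png').convert('RGB')
    print(pytesseract.image_to_string(img, lang='eng', config='--psm 6'))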