I'm trying to extract text from images, but I'm currently getting an empty string as output. Below is my pytesseract code, although I'm open to using Keras OCR as well:
from PIL import Image
import pytesseract
path = 'captcha.svg.png'
img = Image.open(path)
captchaText = pytesseract.image_to_string(img, lang='eng', config='--psm 6')
I wasn't sure how to work with SVG images, so I converted them to PNG. Below are a few sample images:
Edit 1 (2021-05-19): I can now convert SVG to PNG using cairosvg, but I still can't read the captcha text.
Edit 2 (2021-05-20): Keras OCR also returns nothing for these images.
keras-ocr returns nothing here because the image is grayscale (I found that it works otherwise). See below:
from PIL import Image
a = Image.open('/content/gD7vA.png')  # keras-ocr returns nothing for this file
a.mode, a.split()  # mode LA: one grayscale channel plus an alpha (transparency) channel
b = Image.open('/content/CYegU.png')  # keras-ocr returns a result for this file
b.mode, b.split()  # mode RGBA: RGB plus an alpha (transparency) channel
In the above, a is the file from your question; as shown, it has two channels: grayscale and a transparency (alpha) layer. b is the file I converted to RGB. The transparency layer was already in your original file and I didn't remove it, although there seems to be no reason to keep it unless you need it. In short, to make your input work with keras-ocr, convert your files to RGB (or RGBA) first, save them to disk, and then pass them to the OCR pipeline.
# Using PIL to convert one mode to another
# and save on disk
c = Image.open('/content/gD7vA.png').convert('RGBA')
c.mode, c.split()
('RGBA',
 (<PIL.Image.Image image mode=L size=150x50 at 0x7F03E8E7A410>,
<PIL.Image.Image image mode=L size=150x50 at 0x7F03E8E7A590>,
<PIL.Image.Image image mode=L size=150x50 at 0x7F03E8E7A810>,
<PIL.Image.Image image mode=L size=150x50 at 0x7F03E8E7A110>))
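If you would rather drop the transparency layer entirely, one option is to composite the image onto a solid background before converting to RGB. The following is only a minimal sketch: the helper name flatten_to_rgb and the white background are my own assumptions, not something keras-ocr requires, and I have only reasoned about it on small in-memory images.

```python
from PIL import Image


def flatten_to_rgb(img, background=(255, 255, 255)):
    """Return an RGB copy of img with any alpha channel composited
    onto a solid background colour (white by default; an assumption)."""
    rgba = img.convert("RGBA")  # works for LA, L, RGB and RGBA inputs
    canvas = Image.new("RGBA", rgba.size, background + (255,))
    # Image.alpha_composite needs two RGBA images of the same size
    return Image.alpha_composite(canvas, rgba).convert("RGB")


# Usage with an in-memory grayscale + alpha image, same size as the samples
sample = Image.new("LA", (150, 50), (0, 0))  # black, fully transparent
flat = flatten_to_rgb(sample)
print(flat.mode, flat.size)
```

Compositing rather than calling convert('RGB') directly avoids a possible surprise: a plain conversion discards the alpha values, so pixels that were transparent keep whatever colour was stored underneath them.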
Full code
import matplotlib.pyplot as plt
import keras_ocr
# keras-ocr will automatically download pretrained
# weights for the detector and recognizer.
pipeline = keras_ocr.pipeline.Pipeline()
# Read the four example images
images = [
    keras_ocr.tools.read(url) for url in [
        '/content/CYegU.png',  # mode: RGBA; plain RGB should work too
        '/content/bw6Eq.png',  # mode: RGBA
        '/content/jH2QS.png',  # mode: RGBA
        '/content/xbADG.png',  # mode: RGBA
    ]
]
# Each list of predictions in prediction_groups is a list of
# (word, box) tuples.
prediction_groups = pipeline.recognize(images)
Looking for /root/.keras-ocr/craft_mlt_25k.h5
Looking for /root/.keras-ocr/crnn_kurapan.h5
[[('zum', array([[ 10.658852, 15.11916 ],
[148.90204 , 13.144257],
[149.39563 , 47.694347],
[ 11.152428, 49.66925 ]], dtype=float32))],
[('sresa', array([[ 5., 15.],
[143., 15.],
[143., 48.],
[ 5., 48.]], dtype=float32))],
[('sycw', array([[ 10., 15.],
[149., 15.],
[149., 49.],
[ 10., 49.]], dtype=float32))],
[('vdivize', array([[ 10.407883, 13.685192],
[140.62648 , 16.940662],
[139.82323 , 49.070583],
[ 9.604624, 45.815113]], dtype=float32))]]
# Plot the predictions
fig, axs = plt.subplots(nrows=len(images), figsize=(20, 20))
for ax, image, predictions in zip(axs, images, prediction_groups):
    keras_ocr.tools.drawAnnotations(image=image, predictions=predictions, ax=ax)
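Since you want the captcha text as a string rather than a plot, you can pull the words out of the (word, box) tuples. This is a rough sketch under a few assumptions of mine: the helper name words_per_image is hypothetical, words are ordered left to right by the smallest x coordinate of their box, and they are joined without spaces because a captcha is normally a single token. The sample data uses plain lists in place of the NumPy arrays that keras-ocr returns.

```python
def words_per_image(prediction_groups):
    """Collapse each image's (word, box) predictions into one string.

    Words are sorted left to right by the smallest x coordinate of
    their bounding box, then joined without separators (an assumption
    that suits single-token captchas).
    """
    texts = []
    for predictions in prediction_groups:
        ordered = sorted(predictions, key=lambda p: min(pt[0] for pt in p[1]))
        texts.append("".join(word for word, _box in ordered))
    return texts


# Illustrative data shaped like the pipeline output; the second group
# is made up to show the left-to-right ordering.
sample_groups = [
    [("zum", [[10.6, 15.1], [148.9, 13.1], [149.4, 47.7], [11.2, 49.7]])],
    [("b", [[50, 0], [60, 0], [60, 10], [50, 10]]),
     ("a", [[0, 0], [10, 0], [10, 10], [0, 10]])],
]
captcha_texts = words_per_image(sample_groups)
print(captcha_texts)
```

With the real output you would call words_per_image(prediction_groups) instead of passing the sample data.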