Tags: python, opencv, computer-vision, python-imaging-library, captcha

How to extract numbers from a captcha image in Python?


I want to extract the numbers from a captcha image, so I tried the code from this answer:

try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract
import cv2

file = 'sample.jpg'

img = cv2.imread(file, cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, None, fx=10, fy=10, interpolation=cv2.INTER_LINEAR)
img = cv2.medianBlur(img, 9)
th, img = cv2.threshold(img, 185, 255, cv2.THRESH_BINARY)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (4,8))
img = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
cv2.imwrite("sample2.jpg", img)


file = 'sample2.jpg'
text = pytesseract.image_to_string(file)
print(''.join(x for x in text if x.isdigit()))

It worked fine for this image:
[image: first captcha]
Output: 436359
But when I tried it on this image:
[image: second captcha]
it gave me nothing (the output was empty).
How can I modify my code to get the numbers as a string from the second image?

EDIT:
I tried Matt's answer and it worked just fine for the image above, but it doesn't recognise some digits, such as 8 and 1 in image A and 7 in image B.
[image A]

[image B]
How can I fix that?


Solution

  • Often, getting OCR right on an image like this comes down to the order and parameters of the transformations. In the following snippet, I first convert to grayscale, then erode, dilate, and erode again. I then threshold to convert to binary (pure black and white), and dilate and erode one more time. For me this produces the correct value of 859917, and it should be reproducible.

    import cv2
    import numpy as np
    import pytesseract
    
    file = 'sample2.jpg'
    img = cv2.imread(file)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Erode, dilate, then erode again on the grayscale image
    ekernel = np.ones((1, 2), np.uint8)
    eroded = cv2.erode(gray, ekernel, iterations=1)
    dkernel = np.ones((2, 3), np.uint8)
    dilated = cv2.dilate(eroded, dkernel, iterations=1)
    ekernel = np.ones((2, 2), np.uint8)
    eroded_again = cv2.erode(dilated, ekernel, iterations=1)

    # Threshold to pure black and white
    th, threshed = cv2.threshold(eroded_again, 200, 255, cv2.THRESH_BINARY)

    # Dilate and erode once more on the binary image
    dkernel = np.ones((2, 2), np.uint8)
    threshed_dilated = cv2.dilate(threshed, dkernel, iterations=1)
    ekernel = np.ones((2, 2), np.uint8)
    threshed_eroded = cv2.erode(threshed_dilated, ekernel, iterations=1)

    # Run OCR and keep only the digits
    text = pytesseract.image_to_string(threshed_eroded)
    print(''.join(x for x in text if x.isdigit()))