Tags: python, opencv, scikit-image, python-tesseract

I'm having problems recognizing text from a picture in Python


I was given a school project for recognizing various kinds of CAPTCHA, and I had some difficulties with its implementation.

Images of this type will be fed as input: (three sample CAPTCHA images).

I handle them with the following code:

import cv2
import pytesseract

# load image in grayscale
# (note: cv2.COLOR_RGB2GRAY is a cvtColor code, not an imread flag;
#  cv2.IMREAD_GRAYSCALE is the correct flag here)
fname = 'picture.png'
im = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)

pytesseract.pytesseract.tesseract_cmd = r'C:\Tesseract-OCR\tesseract.exe'

# crop to the text region
im = im[0:90, 35:150]

# denoise
im = cv2.blur(im, (3, 3))

# binarize; threshold() returns (retval, image)
_, im = cv2.threshold(im, 223, 255, cv2.THRESH_BINARY)

cv2.imshow('', im)
cv2.waitKey(0)

After all processing, the image looks like this: (processed image). At this point I have a problem: how can I modify the image so that it is readable by the computer, so that instead of the wrong "TAREQ" it would output "7TXB6Q"?

I am trying to extract text from the image with the pytesseract library as follows:

# note: there must be no space after "tessedit_char_whitelist=",
# otherwise the whitelist value is not applied
data = pytesseract.image_to_string(im, lang='eng', config='--psm 6 --oem 3 -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')
print(data)

I am writing here hoping to get valuable advice; perhaps you know a more suitable way to extract text from such a picture, or a better way to process the images pinned above.


More images

(four more sample CAPTCHA images)


Solution

  • You can try finding contours and eliminating those with small areas. This preprocessing step should increase the success rate of the OCR.

    Before: (thresholded input image)

    import cv2 as cv
    import numpy as np
    
    # your thresholded image im
    bw = cv.imread('bw.png', cv.IMREAD_GRAYSCALE)
    
    # OpenCV 4.x returns (contours, hierarchy);
    # in OpenCV 3.x use: _, cnts, _ = cv.findContours(...)
    cnts, _ = cv.findContours(bw, cv.RETR_TREE, cv.CHAIN_APPROX_SIMPLE)
    # remove the first contour, which is the background here
    cnts = np.array(cnts[1:], dtype=object)
    
    areas = np.array(list(map(cv.contourArea, cnts)))
    
    # keep only contours whose area exceeds the threshold
    thr = 35
    thr_cnts = cnts[areas > thr]
    
    # draw the surviving contours filled in black on a white canvas
    disp_img = 255 * np.ones(bw.shape, dtype=np.uint8)
    disp_img = cv.drawContours(disp_img, thr_cnts, -1, (0, 0, 0), cv.FILLED)
    disp_img = cv.bitwise_or(disp_img, bw)
    
    cv.imshow('result', disp_img)
    cv.waitKey()
    cv.destroyAllWindows()
    
    cv.imwrite('result.png', disp_img)
    

    Result: (cleaned image)


    Edit: It seems that merging the two pieces of code did not give the same result. Here is the full code from beginning to end.

    Input: (original CAPTCHA image)

    import cv2 as cv
    import numpy as np
    
    # load image in grayscale
    fname = 'im.png'
    im = cv.imread(fname, cv.IMREAD_GRAYSCALE)
    
    # crop to the text region
    im = im[0:90, 35:150]
    
    # blurring is essential for denoising
    im = cv.blur(im, (3, 3))
    
    thr = 219
    # the binary threshold value is very important:
    # using 220 instead of 219 causes the loss of a letter,
    # because it touches the bottom edge and gets merged into the background
    _, im = cv.threshold(im, thr, 255, cv.THRESH_BINARY)
    
    cv.imshow('', im)
    cv.waitKey(0)
    

    Thresholded: (binarized image)

    # binary image
    bw = np.copy(im)
    
    # find contours and their areas
    # (OpenCV 4.x returns 2 values; in OpenCV 3.x use: _, cnts, _ = ...)
    cnts, _ = cv.findContours(bw, cv.RETR_LIST, cv.CHAIN_APPROX_NONE)
    cnts = np.array(cnts, dtype=object)
    areas = np.array(list(map(cv.contourArea, cnts)))
    
    thr = 35
    # eliminate contours that are smaller than the area threshold,
    # and also remove the largest contour, which is the background
    thr_cnts = cnts[np.logical_and(areas > thr, areas != np.max(areas))]
    
    # draw the remaining contours filled in black on a white canvas
    disp_img = 255 * np.ones(bw.shape, dtype=np.uint8)
    disp_img = cv.drawContours(disp_img, thr_cnts, -1, (0, 0, 0), cv.FILLED)
    disp_img = cv.bitwise_or(disp_img, bw)
    
    cv.imshow('', disp_img)
    cv.waitKey()
    cv.destroyAllWindows()
    

    Result: (final cleaned image)