python opencv ocr tesseract captcha

Removing black background/black stray straight lines from a captcha in python


I am trying to read the text from this image using Python with OpenCV.


However, the black background in the corners of this picture interferes with the OCR and produces the wrong text.

I tried Adaptive Gaussian Thresholding in OpenCV using this code:

import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt
img=cv.imread(file_path,0)

img = cv.medianBlur(img,5)
ret,th1 = cv.threshold(img,127,255,cv.THRESH_BINARY)

th2 =cv.adaptiveThreshold(img,255,cv.ADAPTIVE_THRESH_MEAN_C,\
        cv.THRESH_BINARY,11,2)

th3 = cv.adaptiveThreshold(img,255,cv.ADAPTIVE_THRESH_GAUSSIAN_C,\
        cv.THRESH_BINARY,11,2)

titles = ['Original Image', 'Global Thresholding (v = 127)',
        'Adaptive Mean Thresholding', 'Adaptive Gaussian Thresholding']

images = [img, th1, th2, th3]

for i in range(4):
    plt.subplot(2, 2, i + 1)
    plt.imshow(images[i], 'gray')
    plt.title(titles[i])
    plt.xticks([])
    plt.yticks([])

plt.show()

The output of this code is shown in AGT_result.

How can I extract only the words?


Solution

  • As an ad-hoc solution, we can call cv.floodFill four times, once at each corner:

    img = cv.imread(file_path, 0)
    
    rows, cols = img.shape
    
    cv.floodFill(img, None, seedPoint=(0, 0), newVal=255, loDiff=1, upDiff=1)  # Fill the top left corner.
    cv.floodFill(img, None, seedPoint=(cols-1, 0), newVal=255, loDiff=1, upDiff=1)  # Fill the top right corner.
    cv.floodFill(img, None, seedPoint=(0, rows-1), newVal=255, loDiff=1, upDiff=1)  # Fill the bottom left corner.
    cv.floodFill(img, None, seedPoint=(cols-1, rows-1), newVal=255, loDiff=1, upDiff=1)  # Fill the bottom right corner.
    

    Result after cv.medianBlur: [image]