Tags: python, opencv, ocr, python-tesseract

Remove a stamp from a bill in Python


Any ideas on how to remove the stamp from this bill prior to OCR processing?

[image]


Solution

  • Here is one way to do that in Python/OpenCV.

    • Read the input image
    • Threshold on yellow
    • Dilate to fill out the rectangle
    • Get the largest contour
    • Draw a white filled contour on the input image
    • Save the results

    Input (form_with_label.jpg):

    [image]

    import cv2
    import numpy as np
    
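    # read image (cv2.imread returns None if the file cannot be read)
    img = cv2.imread('form_with_label.jpg')
    if img is None:
        raise FileNotFoundError('form_with_label.jpg')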
    
    # threshold on yellow (OpenCV loads images as BGR, so the bounds are in B,G,R order)
    lower = (0, 200, 200)
    upper = (100, 255, 255)
    thresh = cv2.inRange(img, lower, upper)
    
    # apply dilate morphology
    kernel = np.ones((9,9), np.uint8)
    mask = cv2.morphologyEx(thresh, cv2.MORPH_DILATE, kernel)
    
    # get largest contour (findContours returns 2 values in OpenCV 2.x/4.x, 3 values in 3.x)
    contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = contours[0] if len(contours) == 2 else contours[1]
    if not contours:
        raise RuntimeError('no yellow region found; adjust the threshold bounds')
    big_contour = max(contours, key=cv2.contourArea)
    
    # draw filled white contour on input 
    result = img.copy()
    cv2.drawContours(result,[big_contour],0,(255,255,255),-1)
    
    # save the threshold, mask, and result images
    cv2.imwrite('form_with_label_thresh.png',thresh)
    cv2.imwrite('form_with_label_mask.png',mask)
    cv2.imwrite('form_with_label_removed.png',result)
    
    # show the images
    cv2.imshow("THRESH", thresh)
    cv2.imshow("MASK", mask)
    cv2.imshow("RESULT", result)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    

    Thresholded Image (form_with_label_thresh.png):

    [image]

    Morphology Dilated Image (form_with_label_mask.png):

    [image]

    Result (form_with_label_removed.png):

    [image]