Tags: python, opencv, image-processing, ocr, tesseract

How to process a binary image to align sparse letters in a row?


I am trying to use Tesseract OCR to convert an image to text. The image always has three letters, without rotation or skew, but they are randomly distributed within a 90x50 PNG file.

By just cleaning the image and converting it to black and white, Tesseract could not read the text. After aligning the letters by hand in Paint, the OCR gives an exact match, and they don't even need to be exactly aligned. What I want are some tips on how to automate this alignment of the characters before sending the image to Tesseract.

I am using python with tesseract and opencv.

Original image: [image]

What I have done - turn black and white: [image]

What I want to do - aligned by code: [image]


Solution

  • [image: letters]

    You can use the following code to achieve this output. Some of the constants (the 5x resize factor, the threshold value of 123 and the 5 px spacing) may need to be changed to fit your needs:

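    import cv2
    import numpy as np
    
    # Read the image as grayscale (resize 5x so it is easier to see)
    img = cv2.imread("/home/stephen/Desktop/letters.png", cv2.IMREAD_GRAYSCALE)
    h, w = img.shape
    img = cv2.resize(img, (w*5, h*5))
    # Threshold the image and find the contours. RETR_EXTERNAL returns only
    # the outer contours, so inside parts (circle in a 'p' or 'b') are ignored.
    # Indexing with [-2] works with both OpenCV 3 (3 return values) and 4 (2).
    _, thresh = cv2.threshold(img, 123, 255, cv2.THRESH_BINARY_INV)
    contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    
    # findContours does not return contours in reading order, so sort the
    # bounding rectangles (x, y, w, h) from left to right
    boxes = sorted((cv2.boundingRect(c) for c in contours), key=lambda b: b[0])
    
    # Create a white background image big enough for all letters plus 5 px gaps
    bg_h = max(bh for _, _, _, bh in boxes) + 10
    bg_w = sum(bw for _, _, bw, _ in boxes) + 5 * (len(boxes) + 1)
    bg = np.full((bg_h, bg_w), 255, np.uint8)
    left = 5
    
    # Paste each letter onto the background, next to the previous one
    for x, y, bw, bh in boxes:
        bg[5:5+bh, left:left+bw] = img[y:y+bh, x:x+bw]
        left += bw + 5
    cv2.imshow('aligned', bg)
    cv2.waitKey()
    cv2.destroyAllWindows()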