I've written this algorithm in Python for reading CAPTCHAs using scikit-image:
    from skimage import io
    from skimage.color import rgb2gray

    def process(self, image):
        """
        Processes a CAPTCHA by removing noise.

        Args:
            image (str): The file path of the image to process
        """
        img = io.imread(image)

        # Count how many pixels use each color, keyed by hex string.
        histogram = {}
        for x in range(img.shape[0]):
            for y in range(img.shape[1]):
                pixel = img[x, y]
                color = '%02x%02x%02x' % (pixel[0], pixel[1], pixel[2])
                histogram[color] = histogram.get(color, 0) + 1

        # Rank colors from most to least frequent.
        ranked = sorted(histogram, key=histogram.get, reverse=True)
        threshold = len(ranked) * 0.015

        # Whiten the background (the three most common colors) and the
        # noise (any color rarer than the threshold), keeping the letters.
        for x in range(img.shape[0]):
            for y in range(img.shape[1]):
                pixel = img[x, y]
                color = '%02x%02x%02x' % (pixel[0], pixel[1], pixel[2])
                index = ranked.index(color)
                if index < 3 or index > threshold:
                    img[x, y] = [255, 255, 255, 255]  # the image is RGBA

        # Invert, convert to grayscale, and overwrite the original file.
        img = rgb2gray(~img)
        io.imsave(image, img)
Before:
After:
It works fairly well and I get decent results after running it through Google's Tesseract OCR, but I want to make it better. I think that straightening the letters would yield a much better result. My question is how do I do that?
I understand I need to box the letters somehow, like so:
Then, for each character, rotate it some number of degrees based on a vertical or horizontal line.
My initial thought was to identify the center of a character (possibly by finding clusters of the most-used colors in the histogram) and then expand a box outward until it hit black, but I'm not sure how to go about doing that either.
What are some common practices used in image segmentation to achieve this result?
Edit:
In the end, further refining the color filters and restricting Tesseract to a whitelist of expected characters yielded a nearly 100% accurate result without any deskewing.
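For anyone curious, restricting Tesseract amounted to passing a config string along these lines (the whitelist shown is an example alphabet, not necessarily the exact set my CAPTCHAs use):

```python
def tesseract_config(whitelist):
    # --psm 8 tells Tesseract to treat the image as a single word;
    # tessedit_char_whitelist restricts recognition to the given characters.
    return "--psm 8 -c tessedit_char_whitelist=%s" % whitelist

# Usage (requires pytesseract and a Tesseract install):
#   import pytesseract
#   text = pytesseract.image_to_string(
#       cleaned, config=tesseract_config("ABCDEFGHJKMNPQRSTUVWXYZ23456789"))
```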
The operation you want is known in computer vision as deskewing. To deskew an object you apply a geometric (affine) transformation to it. Here is a snippet that deskews a binary object image using the OpenCV library:
    import cv2
    import numpy as np

    def deskew(image):
        (h, w) = image.shape[:2]

        # Estimate the skew from central image moments: mu11/mu02 is the
        # shear of the white pixels' distribution along the x-axis.
        moments = cv2.moments(image)
        skew = moments["mu11"] / moments["mu02"]

        # Shear matrix that undoes the skew; the translation term keeps
        # the object centered.
        M = np.float32([[1, skew, -0.5 * w * skew],
                        [0, 1,    0]])
        return cv2.warpAffine(image, M, (w, h),
                              flags=cv2.WARP_INVERSE_MAP | cv2.INTER_LINEAR)
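For intuition, the mu11/mu02 skew estimate can be computed directly with NumPy; this is only an illustrative re-derivation of the central moments that cv2.moments returns, not part of the deskewing code itself:

```python
import numpy as np

def skew_ratio(image):
    """Compute mu11/mu02, the shear estimate used by moment-based deskewing.

    Treats pixel intensities as mass; mu11 and mu02 are central moments
    of that mass distribution.
    """
    ys, xs = np.mgrid[0:image.shape[0], 0:image.shape[1]]
    m = image.astype(float)
    m00 = m.sum()                      # total mass
    xbar = (xs * m).sum() / m00        # centroid x
    ybar = (ys * m).sum() / m00        # centroid y
    mu11 = ((xs - xbar) * (ys - ybar) * m).sum()
    mu02 = ((ys - ybar) ** 2 * m).sum()
    return mu11 / mu02
```

An upright glyph gives a ratio near zero (nothing to correct), while a slanted one gives a nonzero ratio that the shear matrix above cancels out.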