java tesseract image-recognition tess4j seven-segment-display

Seven Segment Digital Data Recognition using Tesseract / Java

I am trying to recognize seven-segment digital text from an image using Tess4J.

My input is here

enter image description here

I have applied some normalization, as follows:

1 ] Cropped the image.

enter image description here

2 ] Converted it to a binary image.

enter image description here

I want to remove the jagged edges of the text in the image. How can I accomplish that?

I have tried different traineddata files from GitHub, but none of them works as I want.

How can I create traineddata manually?

Thanks in advance for your suggestions.

Solution

You can try a combination of Sobel filters (to thin the edges) and Gaussian filters (to blur the image).

You didn't specify which API you are using for image manipulation in Java, and I'm not familiar with Tess4J, so I will show what can be accomplished in Python. You can use your preferred image manipulation library in Java; the process will be the same:

import numpy as np
from PIL import Image
from scipy import ndimage


def save_image(img_data, counter):
    # Clip to the 8-bit range and save as a grayscale JPEG
    img_fn = "img_{}.jpg".format(counter)
    img_8bit = np.clip(img_data, 0, 255).astype(np.uint8)
    Image.fromarray(img_8bit).save(img_fn)


if __name__ == "__main__":
    # This loads the second image of your post
    img_0 = np.asarray(Image.open("TqO53.jpg").convert("RGB"), dtype=float)
    # Average the color channels to get a grayscale image
    img_0 = np.average(img_0, -1)
    #save_image(img_0, 0)

    # Obtain edges (gradient magnitude from the two Sobel directions)
    img_x = ndimage.sobel(img_0, 0)
    img_y = ndimage.sobel(img_0, 1)
    img_1 = np.hypot(img_x, img_y)
    #save_image(img_1, 1)

    # Remove edges from original image (i.e. thinning edges)
    img_2 = img_0 - img_1
    # Set anything that ended up very dark or negative to pure black
    img_2[img_2 < 10] = 0
    save_image(img_2, 2)

    # Blur image if you want to get rid of the sketchy borders
    img_3 = ndimage.gaussian_filter(img_2, sigma=1)
    save_image(img_3, 3)

This will generate the following images:

img_2.jpg

With edges thinned

img_3.jpg

Blurred

You can try both images to determine which gives better results with Tess4J. You may not need to blur the image after thinning the edges, since the numbers may already be easier to recognize.
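If the blurred image ends up with gray halos, one option (not part of the steps above, so treat it as an assumption) is to threshold it back to pure black and white; blurring followed by thresholding tends to round off jagged pixels. Here is a minimal NumPy-only sketch, where a 3x3 mean blur stands in for the Gaussian filter, and the names box_blur3 and smooth_binary and the threshold of 128 are only illustrative:

```python
import numpy as np


def box_blur3(img):
    """Average each pixel with its 8 neighbours (borders are edge-padded)."""
    padded = np.pad(np.asarray(img, dtype=float), 1, mode="edge")
    h, w = padded.shape[0] - 2, padded.shape[1] - 2
    total = np.zeros((h, w), dtype=float)
    # Sum the nine shifted copies of the image, then divide by 9
    for dy in range(3):
        for dx in range(3):
            total += padded[dy:dy + h, dx:dx + w]
    return total / 9.0


def smooth_binary(img, threshold=128):
    """Blur a 0/255 image, then threshold it back to pure black and white."""
    blurred = box_blur3(img)
    return np.where(blurred >= threshold, 255, 0).astype(np.uint8)


if __name__ == "__main__":
    # Synthetic example: a white block with a single stray pixel next to it
    demo = np.zeros((7, 9), dtype=np.uint8)
    demo[2:5, 1:6] = 255
    demo[1, 7] = 255  # isolated speck that the smoothing should drop
    print(smooth_binary(demo))
```

The threshold controls how aggressively small bumps and specks are dropped, so it is worth trying a few values on the real image before passing the result to Tess4J.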

If you want, after that you can try thinning the digits until they are 1 pixel thick. That may work well with Tess4J.
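I haven't named a specific thinning algorithm above; one common choice is Zhang-Suen thinning, so the sketch below assumes that. It is a plain NumPy version (the function name zhang_suen_thin is just illustrative) and it expects a binary image where nonzero pixels are the digits, so you may need to invert your image first:

```python
import numpy as np


def zhang_suen_thin(binary):
    """Thin a binary image (nonzero = foreground) to roughly 1-pixel-wide strokes."""
    # Pad with background so every foreground pixel has eight neighbours
    img = np.pad((np.asarray(binary) > 0).astype(np.uint8), 1)
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r, c in zip(*np.nonzero(img)):
                # Neighbours clockwise, starting from the pixel directly above
                ring = [int(v) for v in (
                    img[r - 1, c], img[r - 1, c + 1], img[r, c + 1],
                    img[r + 1, c + 1], img[r + 1, c], img[r + 1, c - 1],
                    img[r, c - 1], img[r - 1, c - 1])]
                p2, _, p4, _, p6, _, p8, _ = ring
                count = sum(ring)
                # Number of 0 -> 1 changes when walking once around the ring
                transitions = sum(
                    a == 0 and b == 1
                    for a, b in zip(ring, ring[1:] + ring[:1]))
                if not (2 <= count <= 6 and transitions == 1):
                    continue
                if step == 0:
                    removable = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                else:
                    removable = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                if removable:
                    to_delete.append((r, c))
            # Delete only after the whole pass, so all pixels see the same image
            for r, c in to_delete:
                img[r, c] = 0
            changed = changed or bool(to_delete)
    # Drop the padding; the result contains 0 (background) and 1 (skeleton)
    return img[1:-1, 1:-1]


if __name__ == "__main__":
    # Synthetic example: a thick horizontal bar, like one display segment
    bar = np.zeros((9, 19), dtype=np.uint8)
    bar[2:7, 2:17] = 1
    print(zhang_suen_thin(bar))
```

Multiply the result by 255 before saving it as an image. The Python loops are slow on large images, so for real use a library routine such as skimage.morphology.skeletonize from scikit-image is probably a better fit.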