Tags: java, tesseract, image-recognition, tess4j, seven-segment-display

Seven Segment Digital Data Recognition using Tesseract / Java


I am trying to recognize seven-segment digits in an image using Tess4J.

My input image:

[Image: original input]

I have applied the following normalization steps:

1. Cropped the image.

[Image: cropped input]

2. Converted it to a binary image.

[Image: binarized input]
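The post does not say how the crop or the binarization were done. As a rough sketch, assuming an 8-bit grayscale image held in a NumPy array, the two steps could look like this in Python (the language used in the answer). The crop bounds are hypothetical parameters, and Otsu's method is an assumed choice of threshold, not necessarily the one used here.

```python
import numpy as np


def crop(gray, top, bottom, left, right):
    """Keep only the region of interest (bounds are hypothetical, pick them per image)."""
    return gray[top:bottom, left:right]


def otsu_threshold(gray):
    """Return the Otsu threshold of an 8-bit (uint8) grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0 = hist[:t].sum()          # pixels below t (background class)
        w1 = total - w0              # pixels at or above t (foreground class)
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (hist[:t] * levels[:t]).sum() / w0
        mu1 = (hist[t:] * levels[t:]).sum() / w1
        # Between-class variance, up to a constant factor of 1 / total**2
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarize(gray):
    """Map pixels >= the Otsu threshold to 255 and all others to 0."""
    t = otsu_threshold(gray)
    return np.where(gray >= t, 255, 0).astype(np.uint8)
```

If the digits come out dark on a light background and your later steps expect light digits, invert the result with `255 - binary`.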

I want to remove the jagged edges of the digits in the image. How can I accomplish that?
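One common way to smooth jagged edges on an image that is already binary, different from the Sobel-based approach in the answer, is a morphological opening (removes small spurs) followed by a closing (fills small notches). The sketch below uses SciPy's `scipy.ndimage.binary_opening` and `scipy.ndimage.binary_closing`; the function name `smooth_binary` and the 3x3 structuring element are illustrative assumptions to tune for your digit size.

```python
import numpy as np
from scipy import ndimage


def smooth_binary(binary, size=3):
    """Smooth jagged edges of a binary image (nonzero = foreground).

    Opening (erosion then dilation) removes protrusions smaller than the
    structuring element; closing (dilation then erosion) fills small gaps.
    """
    fg = np.asarray(binary) > 0
    structure = np.ones((size, size), dtype=bool)  # assumed square element
    opened = ndimage.binary_opening(fg, structure=structure)
    closed = ndimage.binary_closing(opened, structure=structure)
    return (closed * 255).astype(np.uint8)
```

By default SciPy treats pixels outside the image as background, so the closing can eat into digits that touch the border; padding the image by a few pixels first avoids that.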

I have tried several traineddata files from GitHub, but none of them gives the results I want.

How can I create a traineddata file myself?

I look forward to your suggestions. Thanks a lot in advance.


Solution

  • You can try a combination of Sobel filters (to find the edges, which you then subtract from the image to thin the strokes) and a Gaussian filter (to blur the image).

    You didn't specify which API you use for image manipulation in Java, and I'm not familiar with Tess4J, so I will show what can be done in Python. You can use your preferred Java image library; the process is the same:

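    # scipy.misc.imread/imsave no longer exist in recent SciPy versions,
    # so imageio is used for reading and writing images instead
    import imageio.v2 as imageio
    import numpy
    import scipy.ndimage
    
    def save_image(img_data, counter):
        # Clip to the valid 8-bit range before writing a JPEG
        img_fn = "img_{}.jpg".format(counter)
        imageio.imwrite(img_fn, numpy.clip(img_data, 0, 255).astype(numpy.uint8))
    
    
    if __name__ == "__main__":
        # This loads the second image of your post (as floats, to avoid uint8 overflow)
        img_0 = numpy.asarray(imageio.imread("TqO53.jpg"), dtype=float)
        if img_0.ndim == 3:
            # Convert RGB(A) to grayscale by averaging the color channels
            img_0 = img_0[..., :3].mean(axis=-1)
        #save_image(img_0, 0)
    
        # Obtain edges: gradient magnitude of the vertical and horizontal Sobel responses
        img_x = scipy.ndimage.sobel(img_0, axis=0)
        img_y = scipy.ndimage.sobel(img_0, axis=1)
        img_1 = numpy.hypot(img_x, img_y)
        #save_image(img_1, 1)
    
        # Remove edges from original image (i.e. thinning edges)
        img_2 = img_0 - img_1
        img_2[img_2 < 10] = 0
        save_image(img_2, 2)
    
        # Blur image if you want to get rid of the sketchy borders
        img_3 = scipy.ndimage.gaussian_filter(img_2, sigma=1)
        save_image(img_3, 3)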
    

    This will generate the following images:

    [Image: img_2.jpg, with edges thinned]

    [Image: img_3.jpg, blurred]

    Try both images to see which one gives better results with Tess4J. You may not need to blur the image after thinning the edges, since the thinned digits may already be easier to recognize.

    If you want, you can then try thinning the digits until the strokes are 1 pixel thick; that might also work well with Tess4J.
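The answer does not name a method for thinning the digits down to 1-pixel strokes. One standard choice is the Zhang-Suen thinning algorithm; the NumPy-only sketch below is an illustration of that algorithm, not the answerer's code. It assumes a binary image where nonzero pixels are the digits (invert first if your digits are dark on a light background), and the helper names are hypothetical.

```python
import numpy as np


def _neighbours(img):
    """Return neighbours P2..P9 (clockwise from north) of every pixel, zero-padded."""
    p = np.pad(img, 1)  # constant zero padding; slices are views of this copy
    h, w = img.shape
    return [
        p[0:h, 1:w + 1],      # P2 north
        p[0:h, 2:w + 2],      # P3 north-east
        p[1:h + 1, 2:w + 2],  # P4 east
        p[2:h + 2, 2:w + 2],  # P5 south-east
        p[2:h + 2, 1:w + 1],  # P6 south
        p[2:h + 2, 0:w],      # P7 south-west
        p[1:h + 1, 0:w],      # P8 west
        p[0:h, 0:w],          # P9 north-west
    ]


def zhang_suen_thin(binary):
    """Thin a binary image (nonzero = foreground) to 1-pixel-wide strokes (0/1 output)."""
    img = (np.asarray(binary) > 0).astype(np.uint8)
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            p2, p3, p4, p5, p6, p7, p8, p9 = _neighbours(img)
            seq = [p2, p3, p4, p5, p6, p7, p8, p9, p2]
            b = sum(seq[:8])  # B(P1): number of foreground neighbours
            # A(P1): number of 0 -> 1 transitions around the neighbourhood
            a = sum(((seq[i] == 0) & (seq[i + 1] == 1)).astype(np.uint8) for i in range(8))
            if step == 0:  # first sub-iteration: south-east boundary and north-west corners
                c1 = p2 * p4 * p6 == 0
                c2 = p4 * p6 * p8 == 0
            else:          # second sub-iteration: north-west boundary and south-east corners
                c1 = p2 * p4 * p8 == 0
                c2 = p2 * p6 * p8 == 0
            remove = (img == 1) & (b >= 2) & (b <= 6) & (a == 1) & c1 & c2
            if remove.any():
                img[remove] = 0  # delete all flagged pixels at once
                changed = True
    return img
```

The function returns 0/1 values, so multiply by 255 before saving the result as an image for Tess4J.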