I am trying to recognize seven-segment digital text from an image using Tess4J.
My input is here:
I have applied some normalization as follows:
1] Cropped the image.
2] Converted it to binary.
I want to remove the jagged edges of the text in the image. How can I accomplish that?
I have tried different traineddata files from GitHub, but nothing works as I wish.
How can I create traineddata manually?
Thanks in advance for any suggestions.
You can try a combination of Sobel filters (to thin the edges) and Gaussian filters (to blur the image).
You didn't specify which API you are using for image manipulation in Java, and since I'm not familiar with Tess4J, I will show what can be accomplished with Python (you can use your preferred image-manipulation library in Java; the process will be the same):
import scipy
import scipy.misc
import scipy.ndimage.filters
import numpy

def save_image(img_data, counter):
    img_fn = "img_{}.jpg".format(counter)
    scipy.misc.imsave(img_fn, img_data)

if __name__ == "__main__":
    # This loads the second image of your post
    img_0 = scipy.misc.imread("TqO53.jpg")
    # Convert to grayscale by averaging the color channels
    img_0 = scipy.average(img_0, -1)
    #save_image(img_0, 0)

    # Obtain edges
    img_x = scipy.ndimage.filters.sobel(img_0, 0)
    img_y = scipy.ndimage.filters.sobel(img_0, 1)
    img_1 = numpy.hypot(img_x, img_y)
    #save_image(img_1, 1)

    # Remove edges from original image (i.e. thinning edges)
    img_2 = img_0 - img_1
    # Clip small values to pure black
    img_2[img_2 < 10] = 0
    save_image(img_2, 2)

    # Blur image if you want to get rid of the sketchy borders
    img_3 = scipy.ndimage.gaussian_filter(img_2, sigma=1)
    save_image(img_3, 3)
This will generate the following images:
You can try both types of images to determine which gives better results with Tess4J (a quick way to compare them is sketched below); it is possible that you won't need to blur the image after thinning the edges, since the numbers can already be recognized more easily.
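Tess4J is just a wrapper around the Tesseract engine, so as a rough sanity check you could run the saved images through pytesseract from Python before wiring everything into your Java pipeline. This is only a stand-in for Tess4J, and it assumes pytesseract and the Tesseract binary are installed; the file names are the ones written by the script above:

import pytesseract
from PIL import Image

# '--psm 7' treats the input as a single line of text; the whitelist keeps only digits.
config = "--psm 7 -c tessedit_char_whitelist=0123456789"
for fn in ("img_2.jpg", "img_3.jpg"):
    text = pytesseract.image_to_string(Image.open(fn), config=config)
    print(fn, "->", text.strip())

Whichever file produces the cleaner digits is the one worth feeding to Tess4J.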
If you want, after that you can try thinning the whole numbers until they are one pixel thick; maybe that works well with Tess4J.
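A minimal sketch of that thinning step, using scikit-image's skeletonize (my assumption; any skeletonization routine would do), could look like this. It reuses the scipy.misc helpers from the script above, and the input file name and the 128 threshold are assumptions you may need to adjust:

import numpy
import scipy.misc
from skimage.morphology import skeletonize

# Load the thinned image produced above and collapse it to grayscale if needed
img = scipy.misc.imread("img_2.jpg")
if img.ndim == 3:
    img = numpy.average(img, -1)

binary = img > 128                  # threshold to a boolean image
skeleton = skeletonize(binary)      # reduce each stroke to a 1-pixel-wide line
scipy.misc.imsave("img_4.jpg", skeleton.astype(numpy.uint8) * 255)

Compare the OCR output on this skeletonized version against the earlier images to see which one Tess4J handles best.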