I am opening an image, applying a morphological transformation, and saving it. However, there is no visible difference between the two images (even if you zoom in to the pixels). Image links are below. One of them parses correctly and the other parses incorrectly.
Here's the kicker. If I open the image that isn't parsing correctly in MS Paint, do absolutely nothing, and then click save, it will magically start parsing correctly.
Can anyone provide an explanation for this?
Here is my code:
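import cv2
import numpy as np

# IMAGE holds the path to the input file
img = cv2.imread(IMAGE, cv2.IMREAD_COLOR)
imgray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Morphological closing with a 1x40 horizontal kernel
kernel = np.ones((1, 40), np.uint8)
morphed = cv2.morphologyEx(imgray, cv2.MORPH_CLOSE, kernel)
# Add the inverted closing onto the grayscale image (saturating add)
dst = cv2.add(imgray, 255 - morphed)
cv2.imwrite("out.png", dst)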
Image parsed as "52.983.842.":
Image incorrectly parsed as "522.983.8422.":
The two images do differ.
If you load them as layers in GIMP and set the layer mode to Subtract, you get this:
After the last 2, the difference seems to contain an artifact, which Tesseract thinks is another digit.
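If you'd rather check this in code than in GIMP, here is a minimal sketch. It uses a per-pixel absolute difference, which is close to (but not the same as) GIMP's Subtract mode, and it assumes both images are grayscale arrays of the same size. The helper names `abs_difference` and `changed_region` are made up for this example.

```python
import numpy as np


def abs_difference(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Per-pixel absolute difference of two same-shaped uint8 images."""
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    # Widen to a signed type so the subtraction cannot wrap around.
    diff = np.abs(a.astype(np.int16) - b.astype(np.int16))
    return diff.astype(np.uint8)


def changed_region(diff: np.ndarray, threshold: int = 0):
    """Bounding box (top, left, bottom, right) of pixels whose difference
    exceeds `threshold`, or None if no pixel does. Expects a 2-D array."""
    ys, xs = np.nonzero(diff > threshold)
    if ys.size == 0:
        return None
    return int(ys.min()), int(xs.min()), int(ys.max()) + 1, int(xs.max()) + 1


if __name__ == "__main__":
    # Synthetic stand-ins for the two images: identical except for a
    # faint patch near the right edge.
    good = np.full((10, 20), 255, np.uint8)
    bad = good.copy()
    bad[3:5, 15:18] = 250
    diff = abs_difference(good, bad)
    print("max difference:", int(diff.max()))
    print("changed region:", changed_region(diff))
```

With real files you would load both with `cv2.imread(path, cv2.IMREAD_GRAYSCALE)` and pass the arrays in. A difference that is invisible on screen (a few gray levels) still shows up here, and the bounding box tells you where to look.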
Opening the file in Paint and saving it again probably re-encodes the image, which would change the artifacts in the output.
Consider that your pictures are JPG, which uses lossy compression. Encoders can build their quantization tables in different ways, and you'll get different artifacts depending on which tables are used. It seems that in this case Tesseract picked up the noise.
Also note that JPG and text don't go well together. You should consider using a lossless format, such as PNG.