Tags: python, python-tesseract

Image Preprocessing to extract 2D number list


I've been trying to make a puzzle-solving program. The game is 'fruit box', and you can play it through the link below.

https://en.gamesaien.com/game/fruit_box/

To do that, I have to extract the numbers from the game screen.

fruit box game screen shot

I found 'pytesseract', which can recognize characters in an image, and I have almost finished the extraction with it, but I am not satisfied with the result.

  1. threshold

At first, I used the threshold function. I had to erase most of the image because the background is the same white color as the numbers I am aiming for. The code and image are like this.

import os

import cv2
import pytesseract

# Path to the screenshot in the current working directory
image = os.path.join(os.getcwd(), 'appletest.png')
img = cv2.imread(image)
grayImage = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Pixels brighter than 246 become white (255); everything else becomes black
ret, img_binary = cv2.threshold(grayImage, 246, 255, cv2.THRESH_BINARY)
text = pytesseract.image_to_string(img_binary, config='--psm 6')
# text = pytesseract.image_to_string(img_binary, config='--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789 ')
print(text)
# Show the binarized image until a key is pressed
cv2.imshow('Image', img_binary)
cv2.waitKey(0)
cv2.destroyAllWindows()

threshold result

The 'image_to_string' function returns numbers like this:

41233366429816415
412567594457956471
3572263437133946
68241491629765459
73278354155567666
7796565142328726
15349752855757571
31221174825264255
83517514412317216
1957899195693134

It is almost the same, but some numbers are wrong. (For example, the second line, 412567594457956471, should be just 41256759445796471.)
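A quick way to spot such rows is to parse the OCR text into a 2D list and compare each row's length against the expected column count. The following is only a minimal sketch: `parse_grid` and `expected_cols` are hypothetical names, and the value 17 is an assumption taken from the length of the corrected second row above.

```python
def parse_grid(ocr_text, expected_cols=17):
    """Turn OCR output into a 2D list of ints and report rows of the wrong length.

    expected_cols=17 is an assumption based on the corrected row in the question.
    """
    grid, bad_rows = [], []
    for line in ocr_text.splitlines():
        # Keep only ASCII digits; skip spaces and any stray characters
        digits = [int(ch) for ch in line if ch in "0123456789"]
        if not digits:
            continue  # ignore blank lines
        if len(digits) != expected_cols:
            # Record (row index, actual length) for later inspection
            bad_rows.append((len(grid), len(digits)))
        grid.append(digits)
    return grid, bad_rows


# First three rows of the threshold result shown above
sample = "41233366429816415\n412567594457956471\n3572263437133946\n"
grid, bad_rows = parse_grid(sample)
print(bad_rows)
```

This only detects rows with inserted or dropped digits; a digit that was misread as another digit leaves the row length unchanged and would not be flagged.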

So I had to find another way.

  2. inrange, floodFill

This attempt is simple: recognize the apples first, then flood-fill the background. The code and result are below.

import os

import cv2
import numpy as np
import pytesseract

# Path to the screenshot in the current working directory
image = os.path.join(os.getcwd(), 'appletest.png')

img = cv2.imread(image)
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Find the apple color: pixels inside this HSV range become 255, all others 0
dst1 = cv2.inRange(hsv, (0, 100, 20), (10, 255, 255))
rows, cols = dst1.shape[:2]
# floodFill requires a mask 2 pixels wider and taller than the image
mask = np.zeros((rows+2, cols+2), np.uint8)
loDiff, upDiff = (10,10,10), (10,10,10)
# Fill the background with white, starting near the top-left corner
retval = cv2.floodFill(dst1, mask, (1,1), (255,255,255), loDiff, upDiff)
text = pytesseract.image_to_string(dst1, config='--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789 ')
# text = pytesseract.image_to_string(img_gradient, config='--psm 6')
print(text)
cv2.imshow('Image', dst1)
cv2.waitKey(0)
cv2.destroyAllWindows()

floodFill result

The result is this:

412333664298164215
412567594457964721
3957722634237619946
68241491629765458
732783542195567666
779685651412328726
15349752855757571
3912214174825264255
835175144121313217281216
15179191956988322134

But wrong extra digits were still added.

I guess this comes from the quality of the digits (or of the image), so I tried many preprocessing functions (sharpening, erosion, dilation, blur), but I still couldn't get a perfectly correct number list.

I don't know what else to try from here. Can you advise me on how to solve this?


Solution

  • My guess is that page segmentation mode 6 expects a true "block" of text and gets a bit nervous when seeing so much whitespace, so it decides to hallucinate a bit.

    Let's give it a hand by removing the whitespace, leaving no more room for hallucinations:


    # [your code up to flood fill]
    
    # let the letters bleed out a bit to extract
    # the whole character with some padding
    blurred = cv2.blur(dst1,(5,5))
    # crop out the white space
    text_space = blurred.mean(axis=0) != 255
    dst1 = dst1[:,text_space]
    
    cfg = '--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789 '
    text = pytesseract.image_to_string(dst1, config=cfg)
    print(text)
    
    # 41233366429816415
    # 41256759445796471
    # 35772263437619946
    # 68241491629765459
    # 73278354195567666
    # 77968565141328726
    # 15349752855757571
    # 31221174825264255
    # 83517514411317116
    # 15179191956983134
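    To see what the column cropping does without needing the screenshot, here is a small NumPy-only sketch on a synthetic image. It is an illustration under assumptions, not the exact code above: `crop_blank_columns` and `pad` are hypothetical names, and the padding is done by widening a per-column "has ink" mask with `np.convolve` instead of `cv2.blur`.

```python
import numpy as np


def crop_blank_columns(binary, pad=2):
    """Drop all-white (255) columns, keeping `pad` columns around any ink."""
    # True for every column that contains at least one non-white pixel
    has_ink = (binary != 255).any(axis=0)
    # Widen the mask by `pad` columns on each side so glyphs keep a margin
    kernel = np.ones(2 * pad + 1, dtype=int)
    keep = np.convolve(has_ink.astype(int), kernel, mode="same") > 0
    return binary[:, keep]


# Synthetic 4x12 white image with two dark strokes in columns 2 and 9
img = np.full((4, 12), 255, dtype=np.uint8)
img[1:3, 2] = 0
img[1:3, 9] = 0

cropped = crop_blank_columns(img, pad=1)
print(img.shape, "->", cropped.shape)
```

    With `pad=1`, each stroke keeps one white column on either side, and the blank columns between them are removed, which is the same effect the blur-and-mask step has on the gaps between the apple columns.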