Tags: python, opencv, computer-vision, tesseract, python-tesseract

Recognizing matrix from image


I have written an algorithm that solves the Pluszle game matrix. Its input is a NumPy array.

Now I want to recognize the digits of the matrix from a screenshot.

There are different levels. This is a hard one:


And this is an easy one:


The output of the recognition should be a NumPy array:

array([[6, 2, 4, 2],
       [7, 8, 9, 7],
       [1, 2, 4, 4],
       [7, 2, 4, 0]])

I have tried feeding the last image to Tesseract:

from PIL import Image
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

print(pytesseract.image_to_string(Image.open('C:/Users/79017/screen_plus.jpg')))

The output is unacceptable:

LEVEL 4

(}00:03 M0
J] —.°—@—@©

I think I should use contours from OpenCV, because the font is always the same. Maybe I should save a contour for every digit, then find every contour that exists on the screenshot, then somehow build the matrix from the coordinates of each digit contour. But I have no idea how to do it.


Solution

  • 1- Binarize

    Tesseract needs you to binarize the image first. There is no need for contours or any convolution here; a simple threshold should do, especially since you are trying to che... I mean, win intelligently at one specific game. So I guess you are open to some ad-hoc adjustments.

    For example, (hard<240).any(axis=2) turns everything that is not white in the original image white (True), and turns the white parts black (False).


    Note that you don't get the sums here (or whatever they are; I don't know this game), because those sit in areas that are, on the contrary, almost black.

    But you can get them with another filter:

    (hard>120).any(axis=2)
    


    You could merge those filters, obviously:

    (hard<240).any(axis=2) & (hard>120).any(axis=2)
    


    But that may not be a good idea: keeping the filters separate lets you distinguish two different kinds of data, which you may well want to do.
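    As a minimal sketch of the two filters kept separate, the snippet below applies them to a tiny synthetic strip of pixels rather than to a real screenshot. The helper names not_white and not_dark and the pixel values are made up for illustration; only the thresholds 240 and 120 come from the text above.

```python
import numpy as np

def not_white(img, level=240):
    """True where at least one channel is below `level` (pixel is not near-white)."""
    return (img < level).any(axis=2)

def not_dark(img, level=120):
    """True where at least one channel is above `level` (pixel is not near-black)."""
    return (img > level).any(axis=2)

# Synthetic 1x4 BGR strip standing in for a screenshot (hypothetical values):
# white background, coloured stroke, dark box, light stroke on the dark box.
strip = np.array([[[255, 255, 255],
                   [200, 60, 30],
                   [20, 20, 20],
                   [250, 250, 250]]], dtype=np.uint8)

grid_mask = not_white(strip)  # candidate mask for the main matrix
sum_mask = not_dark(strip)    # candidate mask for the dark areas holding the sums

print(grid_mask)
print(sum_mask)
```

    Each mask can then be handed to pytesseract on its own, so the two kinds of data never get mixed in one OCR pass.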

    2- Restrict

    Second, you know you are looking for digits, so restrict the recognition to digits by adding config='digits' to your pytesseract arguments.

    pytesseract.image_to_string((hard>240).all(axis=2))
    # 'LEVEL10\nNOVEMBER 2022\n\n™\noe\nOs\nfoo)\nso\n‘|\noO\n\n9949 6 2 2 8\n\nN W\nN ©\nOo w\nVon\n+? ah ®)\nas\noOo\n©\n\n \n\x0c'
    
    pytesseract.image_to_string((hard>240).all(axis=2), config='digits')
    # '10\n2022\n\n99496228\n\n17\n-\n\n \n\x0c'
    

    3- Don't use image_to_string

    Use image_to_data instead, preferably. It gives you the bounding boxes of the text.

    Or even image_to_boxes, which gives you the digits one by one, with their coordinates.

    image_to_string is meant for good old linear text. image_to_data and image_to_boxes assume that text is scattered around the image and give you each piece of text with its position. On an image like this one, image_to_string may return the digits in an order different from the one you would consider logical.
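    To make the output of image_to_boxes easier to work with, here is a small sketch that parses it. Tesseract's box format is one line per character, "char left bottom right top page", with the origin at the bottom-left of the image. The CharBox type, the parse_boxes helper and the sample string are hypothetical; the sample is hand-written, not real OCR output.

```python
from typing import List, NamedTuple

class CharBox(NamedTuple):
    char: str
    left: int
    top: int      # measured from the top of the image, like NumPy/OpenCV rows
    right: int
    bottom: int   # measured from the top of the image

def parse_boxes(raw: str, image_height: int) -> List[CharBox]:
    """Parse box-format text into CharBox tuples with a top-left origin."""
    boxes = []
    for line in raw.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        left, bottom, right, top = map(int, parts[1:5])
        # Flip the y axis: box coordinates count up from the bottom edge.
        boxes.append(CharBox(parts[0], left, image_height - top,
                             right, image_height - bottom))
    return boxes

# Hand-written sample in box format (not real OCR output).
sample = "6 10 80 30 95 0\n2 50 80 70 95 0\n"
parsed = parse_boxes(sample, image_height=100)
print(parsed)
```

    With a real image, raw would be the string returned by pytesseract.image_to_boxes(...) and image_height the number of rows of the array passed in.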

    4- Select areas yourself

    Since this is an ad-hoc use for a specific application, you know where the data is.

    For example, your main matrix seems to be in the area

    hard[740:1512, 132:910]
    


    See

    print(pytesseract.image_to_boxes((hard[740:1512, 132:910]<240).any(axis=2), config='digits'))
    

    Not only does this avoid flooding you with irrelevant data, but Tesseract also performs better when the image contains nothing other than what you want to read.

    This seems to find almost all your digits.

    5- Don't expect miracles

    Tesseract is one of the best OCR engines, but OCR is never a sure thing.

    See what I get with this code (summarizing what I've said so far), which prints each digit detected by Tesseract in red, right next to where it was found in the real image.

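    import cv2
    import matplotlib.pyplot as plt
    import pytesseract

    hard = cv2.imread("hard.jpg")
    hard = hard[740:1512, 132:910]  # crop to the main matrix
    # True wherever the pixel is not near-white; named `binary` to avoid shadowing the builtin bin()
    binary = (hard < 240).any(axis=2)
    # Each line of image_to_boxes is "char left bottom right top page", origin at the bottom-left
    boxes = [s.split(' ') for s in pytesseract.image_to_boxes(binary, config='digits').splitlines()]
    out = hard.copy()  # avoid altering the original image, in case we want to retry with other parameters
    H = len(hard)
    for b in boxes:
        # Flip y (OpenCV's origin is top-left) and shift 30 px right so the label sits next to the digit
        cv2.putText(out, b[0], (30 + int(b[1]), H - int(b[2])), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)

    plt.imshow(cv2.cvtColor(out, cv2.COLOR_BGR2RGB))
    plt.show()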
    


    The results are fairly good, but 5 numbers are missing, and one 3 was read as "3.".
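    To get from detected boxes to the NumPy array asked for in the question, one option is to bin the centre of each box into a grid cell and mark the cells where nothing was detected. The sketch below assumes boxes given as (char, left, top, right, bottom) tuples with a top-left origin and a board whose cells are evenly spaced; boxes_to_matrix and the sample detections are hypothetical.

```python
import numpy as np

def boxes_to_matrix(boxes, width, height, rows, cols, missing=-1):
    """Place each detected digit in the grid cell that contains its box centre.

    Cells with no detection keep the `missing` marker, so gaps stay visible.
    """
    matrix = np.full((rows, cols), missing, dtype=int)
    for char, left, top, right, bottom in boxes:
        if len(char) != 1 or char not in "0123456789":
            continue  # drop stray characters such as "."
        col = int((left + right) / 2 * cols / width)
        row = int((top + bottom) / 2 * rows / height)
        if 0 <= row < rows and 0 <= col < cols:
            matrix[row, col] = int(char)
    return matrix

# Hand-written detections on an imaginary 100x100 board with 2x2 cells.
detections = [("6", 10, 5, 30, 20), ("2", 60, 5, 80, 20),
              ("7", 10, 55, 30, 70), (".", 35, 60, 38, 63)]
grid = boxes_to_matrix(detections, width=100, height=100, rows=2, cols=2)
print(grid)
```

    Any cell still holding the missing marker afterwards is one that the OCR skipped and that needs another method.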

    For this kind of ad-hoc reading of an app, I wouldn't even use Tesseract. I am pretty sure that, with some trial and error, you can easily learn to extract each digit's box yourself (they are linearly spaced in both dimensions).
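    Cutting the cropped board into linearly spaced boxes could look like the sketch below. It assumes the crop contains exactly the grid, with rows by cols cells of equal size; split_cells is a hypothetical helper, and the demo runs on a small array of numbers instead of a screenshot.

```python
import numpy as np

def split_cells(img, rows, cols):
    """Cut an image into rows x cols equally spaced cells (a list of lists of views)."""
    height, width = img.shape[:2]
    # Cell boundaries, rounded to whole pixels.
    row_edges = np.round(np.linspace(0, height, rows + 1)).astype(int)
    col_edges = np.round(np.linspace(0, width, cols + 1)).astype(int)
    return [[img[row_edges[r]:row_edges[r + 1], col_edges[c]:col_edges[c + 1]]
             for c in range(cols)]
            for r in range(rows)]

# Stand-in for a cropped board: 8 rows x 12 columns of distinct values.
board = np.arange(8 * 12).reshape(8, 12)
cells = split_cells(board, rows=4, cols=4)
print(cells[1][2])
```

    On the real screenshot, img would be the cropped hard[740:1512, 132:910] array, and the number of rows and columns depends on the level.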

    And then, inside each box, there are only 9 possible values. On a generated image, it should be quite easy to find some simple criteria, such as the number of white pixels, the number of white pixels in the top area, and so on, that allow a very simple classification.
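    A very simple classifier along those lines might compare a few pixel-count features of each binarized box against reference values measured once per digit. The features chosen here (overall fill, fill of the top half, fill of the left half), the nearest-reference rule and the toy 4x4 glyphs are all assumptions for illustration; real reference values would have to come from actual boxes of the game.

```python
import numpy as np

def features(mask):
    """Fraction of white pixels overall, in the top half, and in the left half."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    return np.array([mask.mean(), mask[: h // 2].mean(), mask[:, : w // 2].mean()])

def classify(mask, references):
    """Return the label whose reference features are closest to those of `mask`."""
    f = features(mask)
    return min(references, key=lambda label: np.linalg.norm(f - references[label]))

# Toy 4x4 glyphs standing in for binarized digit boxes (hypothetical shapes).
one = np.array([[0, 1, 0, 0],
                [0, 1, 0, 0],
                [0, 1, 0, 0],
                [0, 1, 0, 0]])
seven = np.array([[1, 1, 1, 1],
                  [0, 0, 0, 1],
                  [0, 0, 1, 0],
                  [0, 0, 1, 0]])
references = {1: features(one), 7: features(seven)}

noisy_seven = seven.copy()
noisy_seven[3, 2] = 0  # simulate one lost pixel
label = classify(noisy_seven, references)
print(label)
```

    Since the game renders every digit with the same font, the references only need to be collected once, for example from one screenshot where the digits are known.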