opencv image-processing ocr python-tesseract

How can I improve digit OCR accuracy with OpenCV and pytesseract?


Thanks in advance for your time. I tried the following code to extract the digits from the attached image, but the results were very poor. I would really appreciate suggestions on how to preprocess the image to get better results. Does the red background in the image make recognition harder?

Image with digits to OCR:

[image]

#import needed modules

import cv2
import pytesseract
from PIL import Image
import numpy as np

def thin_font(pic):
    pic = cv2.bitwise_not(pic)
    kernel = np.ones((1,1),np.uint8)
    pic = cv2.erode(pic, kernel, iterations=1)
    pic = cv2.bitwise_not(pic)
    return (pic)

imgFile = "c:/test1.jpg"

img = cv2.imread(imgFile)

#img upscaling----------------------

width = int(img.shape[1]*1.4)
height = int(img.shape[0]*1.4)
dim = (width, height)

resized = cv2.resize(img, dim, interpolation = cv2.INTER_AREA)

thinimg = thin_font(resized)
imggray = cv2.cvtColor(thinimg, cv2.COLOR_BGR2GRAY)

imginv = cv2.bitwise_not(imggray)

thresh, inputimg = cv2.threshold(imginv, 150, 230,cv2.THRESH_BINARY)



#-----------------------------------------------
text = pytesseract.image_to_string(inputimg, config="outputbase digits")

print(text)

Solution

  • With some trial and error I managed to get decent, though not perfect, results.

    The main idea is to pass pytesseract one table cell at a time.

    The answer doesn't include automatic table separation using image processing.
    The solution assumes that the width and height of the cells are fixed and known in advance (some cropping and padding were needed).
    (In case you want to do it automatically, here is a nice code sample).


    Preprocessing that gave the best OCR results:

    • Convert the image (or each cell) to grayscale.
    • Invert polarity - make the text black on white (instead of white on black).
    • Resize the cell by a factor of 2 along each axis.

    Tesseract configuration that gave the best results:

    text = pytesseract.image_to_string(cell, config="-c tessedit"
                                       "_char_whitelist=' '0123456789-."
                                       " --psm 6")
    

    Code sample:

    import cv2
    import pytesseract
    import numpy as np
    
    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'  # May be needed on Windows
    
    imgFile = "test1.png"
    
    img = cv2.imread(imgFile)
    
    img = img[0:-10, 1:-4, :]  # Crop the relevant part
    img = np.pad(img, ((3, 0), (0, 0), (0, 0)), 'edge')  # Add some padding to the top (making constant cell height).
    
    for row in range(8):
        print()  # New line
        for col in range(7):
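            x0 = col*80  # Assume cell width is 80 pixels
            y0 = row*19  # Assume rows are spaced 19 pixels apart
            x1 = x0 + 80
            y1 = y0 + 20  # One pixel past the row spacing; the crop below trims it again
            cell = img[y0+1:y1-1, x0+1:x1, :]  # Crop the cell in position [col, row], trimming one pixel at the top, bottom and left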
            cell = 255 - cv2.cvtColor(cell, cv2.COLOR_BGR2GRAY)  # Convert to grayscale and invert polarity
            cell = cv2.resize(cell, (cell.shape[1]*2, cell.shape[0]*2), interpolation=cv2.INTER_CUBIC)  # Resize up by a factor of x2 in each axis.
            text = pytesseract.image_to_string(cell, config="-c tessedit"
                                                            "_char_whitelist=' '0123456789-."
                                                            " --psm 6")
            print(text.strip().rjust(11), end='', flush=True)  # Strip any trailing whitespace from the OCR result, then right-align to 11 characters (no newline).
            cv2.imshow('cell', cell)  # Show the cell as image
            cv2.waitKey()  # Wait for key pressing
    
    print()  # New line
    cv2.destroyAllWindows()
    

    Output:

      -2227    -410.59      11.11  -12673.94    -135.49    -106.01    -349.10
      -2629    -403.90       3.81  -15635.17    -243.68    -115.72     318.26
      -1791     404.17       8.60   -8068.60      44.42     -87.76   -1663.20
      -2920    -674.54       5.74  -11296.37    -146.38    -143.96     486.33
      -3110    -728.97       3.92  -11358.89    -173.37    -150.93     436.33
      -3283    -752.10     -12.20   -9683.32    -158.25    -151.55    -753.67
      -2412     498.37      10.56  -11971.43    -101.01    -119.15    -916.63
      -2583     446.77       7.01  -14523.37    -176.70    -120.52    -277.24
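
The values above are still strings. If they are needed as numbers, a check along the following lines could flag cells that did not come out as a well-formed number. This is a sketch only: `parse_cell_text` is a hypothetical helper, and removing inner spaces is an assumption about how stray spaces should be treated.

```python
import re

# Optional minus sign, digits, optional decimal part.
NUMBER_RE = re.compile(r"-?\d+(\.\d+)?")

def parse_cell_text(text):
    """Return the OCR text as a float, or None if it is not a well-formed number."""
    cleaned = text.strip().replace(" ", "")  # assumption: spaces inside a cell are noise
    if NUMBER_RE.fullmatch(cleaned):
        return float(cleaned)
    return None

value = parse_cell_text("  -410.59\n")
bad = parse_cell_text("-.")
```

A `None` result marks a cell worth re-checking by eye or re-running with different preprocessing.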
    

    Input image (as reference):
    [image]


    Issues:
    There is an issue with the minus sign when the sign touches the digit.
    Example:
    [image]
    (In that case the minus sign is not identified).

    Suggested solution:
    Check whether the background color is red or green, and if it is red, add a minus sign (if one is not already present).
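
A minimal sketch of that suggestion, assuming the cell is still in color (BGR, as returned by `cv2.imread`) and that comparing the mean red and green channels is enough to tell the two backgrounds apart. `has_red_background` and `fix_sign` are hypothetical helpers, not part of the code above.

```python
import numpy as np

def has_red_background(cell_bgr):
    """Guess whether a BGR cell is mostly red rather than green (heuristic)."""
    mean_b, mean_g, mean_r = cell_bgr.reshape(-1, 3).mean(axis=0)
    return mean_r > mean_g

def fix_sign(text, cell_bgr):
    """Prepend a minus sign when the cell is red and the OCR text has none."""
    value = text.strip()
    if value and has_red_background(cell_bgr) and not value.startswith("-"):
        return "-" + value
    return value

# Synthetic cells standing in for crops taken before the grayscale conversion.
red_cell = np.zeros((18, 79, 3), dtype=np.uint8)
red_cell[:, :, 2] = 150    # channel 2 is red in BGR order
green_cell = np.zeros((18, 79, 3), dtype=np.uint8)
green_cell[:, :, 1] = 150  # channel 1 is green

fixed = fix_sign("410.59", red_cell)
```

White text pixels contribute equally to all three channels, so they should not change which of the two means is larger. The check has to run on the color crop, since the background color is lost after the conversion to grayscale.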