
OCR and pytesseract detecting numbers in an image


currentbid.png:

I am trying to detect the number in this image, but Tesseract gives me letters or the wrong number.

This is the image I am trying to read the number from. I have tried lots of things with grayscale and inversion in Tesseract, but nothing seems to work: it keeps giving me letters like ADA, or the wrong number. For example, if the image said 98.7M, it would give me 19 9947 )M. I think the period is what confuses it, but I am unable to remove it or change the font. How can I fix this, or train Tesseract to handle it?

Here is my current code:

import cv2
import pyautogui
import pytesseract

# Capture the screen region that shows the bid
pyautogui.screenshot("bidpossible.png", region=(900, 310, 450, 60))
originalImage = cv2.imread('bidpossible.png')

# Convert to grayscale, then to an inverted black-and-white image
grayImage = cv2.cvtColor(originalImage, cv2.COLOR_BGR2GRAY)
(_, blackAndWhiteImage) = cv2.threshold(grayImage, 127, 255, cv2.THRESH_BINARY_INV)

# --psm 8: treat the image as a single word
custom_config = r'--psm 8'
text = pytesseract.image_to_string(blackAndWhiteImage, config=custom_config)
print('Extracted Text: ', text)

Solution

  • How about using a filter that keeps only the light-blue color found inside the text outline?

    
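    # Turn everything darker than the text fill white
    grayImage[grayImage < 210] = 255
    # Turn the light-blue text fill (gray values between 210 and 230) black
    grayImage[(grayImage > 210) & (grayImage < 230)] = 0

    # Try different page segmentation modes; IMHO modes 6 or 7 work better
    custom_config = '--psm 7'
    text = pytesseract.image_to_string(grayImage, config=custom_config)
    print('Extracted Text: ', text)  # 4.34m_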
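
The same idea can be wrapped in a small helper, which makes it easier to tune the gray band for a different screenshot. This is only a sketch: the names isolate_band and ocr_number are made up for illustration, and the 210/230 bounds are simply the ones used above. Unlike the snippet above, it turns every pixel outside the band white, including the very bright ones. The character whitelist is an extra assumption that is not part of the original answer. tessedit_char_whitelist is a real Tesseract option, but whether it takes effect depends on the Tesseract version and engine in use, so treat it as something to try rather than a guaranteed fix.

```python
import numpy as np


def isolate_band(gray, low=210, high=230):
    """Return a black-on-white image: pixels with low < value < high
    become 0 (text), every other pixel becomes 255 (background)."""
    mask = (gray > low) & (gray < high)
    return np.where(mask, 0, 255).astype(np.uint8)


def ocr_number(gray, low=210, high=230):
    """Run Tesseract on the filtered image (hypothetical helper)."""
    # Imported here so isolate_band can be used without Tesseract installed.
    import pytesseract

    # --psm 7 treats the image as a single line of text.
    # The whitelist is an assumption; adjust it to the characters you expect.
    config = "--psm 7 -c tessedit_char_whitelist=0123456789.KMB"
    filtered = isolate_band(gray, low, high)
    return pytesseract.image_to_string(filtered, config=config).strip()
```

With the grayImage from the question, the call would be ocr_number(grayImage). If the result is still noisy, it may be worth adjusting low and high while looking at the output of isolate_band saved to disk.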