python ocr tesseract pyautogui python-tesseract

OCR and pytesseract detecting numbers in an image

currentbid.png:

I am trying to detect the number in this image and it gives me letters or the wrong number.

This is the image I am trying to read the number from. I have tried many things with Tesseract, including grayscale conversion and inversion, but nothing seems to work. It keeps returning letters like ADA, or the wrong number: if the image says 98.7M, it gives me 19 9947 )M. I think the period is confusing it, but I am unable to remove it or change the font. How can I fix this, or train Tesseract to handle it?

Here is my current code:

import cv2
import pyautogui
import pytesseract

# Capture the screen region that contains the bid
pyautogui.screenshot("bidpossible.png", region=(900, 310, 450, 60))
originalImage = cv2.imread('bidpossible.png')

# Convert to grayscale, then to an inverted black-and-white image
grayImage = cv2.cvtColor(originalImage, cv2.COLOR_BGR2GRAY)
(_, blackAndWhiteImage) = cv2.threshold(grayImage, 127, 255, cv2.THRESH_BINARY_INV)

# --psm 8: treat the image as a single word
custom_config = r'--psm 8'
text = pytesseract.image_to_string(blackAndWhiteImage, config=custom_config)
print('Extracted Text: ', text)

Solution

How about using a filter that tries to keep only the light-blue color inside the text outline?


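# Everything darker than the text fill becomes white background
grayImage[(grayImage<210)] = 255
# The light-blue text fill (gray levels between 210 and 230) becomes black
grayImage[(grayImage>210) & (grayImage<230)] = 0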


# Try different page segmentation modes; IMHO modes 6 or 7 work better
custom_config = '--psm 7'
text = pytesseract.image_to_string(grayImage, config=custom_config)
print('Extracted Text: ', text) # 4.34m_
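
The output above still carries some noise (a lowercase m and a trailing underscore), so you will probably want to clean it up before using the value. Here is a minimal sketch of one way to do that with a regular expression. The helper name parse_bid and the suffix table (K, M, B as thousand, million, billion) are my own assumptions for illustration, not part of pytesseract, so adjust them to whatever your screen actually shows.

```python
import re

# Assumed meaning of the suffixes; change this to match your data.
_SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9}


def parse_bid(text):
    """Hypothetical helper: return the first number found in OCR output.

    An optional K/M/B suffix (either case) scales the value.
    Returns None when the text contains no digits at all.
    """
    match = re.search(r"(\d+(?:\.\d+)?)\s*([KMB])?", text.strip(), re.IGNORECASE)
    if match is None:
        return None
    number = float(match.group(1))
    suffix = (match.group(2) or "").upper()
    # No suffix means the number is used as-is.
    return number * _SUFFIXES.get(suffix, 1)


if __name__ == "__main__":
    print(parse_bid("4.34m_"))
    print(parse_bid("ADA"))
```

A result of None (as for "ADA") is a cheap signal that the OCR pass failed and the screenshot should be retried or preprocessed differently.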