Here is my image:
Here is my code:
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract"
img = cv2.imread(r"C:\Users\xxx\Desktop\ImageRecognition\number.jpg")
window_name = 'Number'
cv2.namedWindow(window_name, 0)
cv2.resizeWindow(window_name, 200, 100)
cv2.imshow(window_name, img)
cv2.waitKey(0)
cv2.destroyAllWindows()
configuration = '-l eng --psm 7'
text = pytesseract.image_to_string(img, config=configuration)
print(text)
When I run my code, the image is displayed as shown below:
However, the output printed in PyCharm is:
I want to know why this happens and how I can overcome it. I suspect it is due to noise, but I do not know what to do about it.
To get satisfactory results from Tesseract, you should pre-process the image before passing it to the OCR engine. Tesseract's own guide on improving output quality lists several things you could try on this image. For me, enlarging the image was enough to get the correct output. I used OpenCV to resize it.
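import cv2
import pytesseract

# Load the image (same path as in the question)
image = cv2.imread(r"C:\Users\xxx\Desktop\ImageRecognition\number.jpg")

# Enlarge the image to 200% of its original size
scale_percent = 200
width = int(image.shape[1] * scale_percent / 100)
height = int(image.shape[0] * scale_percent / 100)
dim = (width, height)
img = cv2.resize(image, dim, interpolation=cv2.INTER_AREA)

# --psm 7 tells Tesseract to treat the image as a single line of text
configuration = '-l eng --psm 7'
text = pytesseract.image_to_string(img, config=configuration)
print(text)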
With the enlarged image, Tesseract gave the correct output for me.