
Improve OCR accuracy from the bbox of a text detector


I'm using Tesseract to extract text from an image of a license plate that I cropped using a text detector:

from PIL import Image
import pytesseract
import cv2

img = cv2.imread('text0.jpg')
print(pytesseract.image_to_string(img))

However, it doesn't give the exact text. Are there any filters I can use to improve the quality of the image? Kindly review and give feedback.


Solution

  • You should make sure the text is horizontal; I hope these modifications will help:

    from PIL import Image
    import pytesseract
    import cv2
    
    img = cv2.imread('text0.jpg', 0)       # 0 -> read as grayscale
    h, w = img.shape
    img = cv2.resize(img, (w * 2, h * 2))  # upscaling small crops helps Tesseract
    # Otsu chooses the threshold automatically (the 35 is ignored when
    # THRESH_OTSU is set).
    retval2, th = cv2.threshold(img, 35, 255,
                                cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    
    print (pytesseract.image_to_string(th))
    

    There are other approaches you can try, like blurring and changing the contrast.