python python-3.x ocr tesseract

How to read black text on black background image through tesseract OCR?


I have an image with black text on a black background and I want to read it with OCR. Unfortunately, the OCR cannot read it accurately. The image looks like this: [image] I want to convert every pixel whose RGBA value is less than (90, 90, 90, 255) to (255, 255, 255, 255) so that the image becomes black and white. What code would do this conversion?


Solution

  • Convert the whole image to pure black and white (binarize it) before passing it to Tesseract.

    Read the image as grayscale

    import cv2
    # IMREAD_GRAYSCALE loads a single-channel image; imread returns None if the file cannot be read
    im_gray = cv2.imread('your_image_here', cv2.IMREAD_GRAYSCALE)
    

    Convert it to black and white

    # With THRESH_OTSU the 128 passed here is ignored; the threshold Otsu's method picks is returned as thresh
    (thresh, im_bw) = cv2.threshold(im_gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    

    This determines the threshold automatically from the image using Otsu's method. If you already know the threshold, you can use:

    thresh = 127
    im_bw = cv2.threshold(im_gray, thresh, 255, cv2.THRESH_BINARY)[1]
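
For readers curious what the Otsu flag does, here is a minimal pure-Python sketch of the idea: pick the threshold that maximizes the between-class variance of the grayscale histogram. The names `otsu_threshold` and `binarize` are hypothetical helpers, not OpenCV functions, and OpenCV's own implementation may differ in details such as tie-breaking, so treat this as an illustration only.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0-255) that maximizes between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    # Build a 256-bin histogram of the 8-bit gray values.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of gray values of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # lower class is still empty
        if w1 == 0:
            break     # upper class is empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels, t, maxval=255):
    """Same rule as cv2.THRESH_BINARY: maxval where value > t, else 0."""
    return [maxval if p > t else 0 for p in pixels]


# Synthetic data: two dark clusters and two bright clusters.
pixels = [20] * 50 + [30] * 50 + [200] * 50 + [220] * 50
t = otsu_threshold(pixels)
bw = binarize(pixels, t)
```

On this synthetic data the chosen threshold falls in the gap between the dark and bright clusters, so the two dark groups become 0 and the two bright groups become 255. In practice `cv2.threshold` with `cv2.THRESH_OTSU` does this work for you.
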
    

    Write to disk

    cv2.imwrite('bw_image.png', im_bw)
    

    Taken from here
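
To apply the cutoff from the question literally, a NumPy boolean mask is one option. The sketch below assumes the image is an H x W x 4 `uint8` array (for example, what `cv2.imread` returns for a 4-channel PNG when called with `cv2.IMREAD_UNCHANGED`) and treats a pixel as dark when all three color channels are below 90. `whiten_dark_pixels` and its `limit` parameter are hypothetical names chosen for illustration.

```python
import numpy as np


def whiten_dark_pixels(rgba, limit=90):
    """Return a copy of an H x W x 4 uint8 image with dark pixels set to opaque white.

    A pixel counts as dark when all three color channels are below `limit`.
    The alpha channel is not part of the comparison.
    """
    out = rgba.copy()
    # Boolean H x W mask: True where every color channel is below the limit.
    dark = np.all(rgba[..., :3] < limit, axis=-1)
    out[dark] = (255, 255, 255, 255)
    return out


# Tiny synthetic 1 x 3 image: a dark pixel, a pixel exactly at the limit, a bright pixel.
demo = np.array(
    [[[10, 20, 30, 255], [90, 90, 90, 255], [200, 200, 200, 255]]],
    dtype=np.uint8,
)
result = whiten_dark_pixels(demo)
```

Because the same limit is applied to every color channel, the mask gives the same result whether the channels are ordered RGBA or BGRA (OpenCV's order). Only the dark pixel is replaced here; the pixel exactly at (90, 90, 90) is kept because the comparison is strict.
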