python computer-vision python-tesseract

PyTesseract works great with this code, but not my code (with minimal differences)


I'm trying to use this tutorial to have PyTesseract OCR my desktop. It works when I run the tutorial's script, as you can see in this image.

The code from the tutorial:

import argparse

import cv2
import pytesseract
from pytesseract import Output

# Construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
# '--image' is the path of the input image that will be OCR'd
ap.add_argument("-i", "--image", required=True, help="path to input image to be OCR'd")
# '--min-conf' sets a minimum confidence to filter weak detections
ap.add_argument("-c", "--min-conf", type=int, default=0, help="min conf value to filter weak text detection")
args = vars(ap.parse_args())


# Load the input image, convert it from BGR to RGB channel ordering, and
# use Tesseract to localize each area of text in the input image
image = cv2.imread(args["image"])
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# 'image_to_data' detects and localizes text
results = pytesseract.image_to_data(rgb, output_type=Output.DICT)


# Loop over each individual text localization
for i in range(0, len(results["text"])):
    # Extract the bounding box coordinates of the text region from the current result
    x = results["left"][i]
    y = results["top"][i]
    w = results["width"][i]
    h = results["height"][i]

    # Extract the OCR text itself along with the confidence of the text localization
    text = results["text"][i]
    print(results["conf"][i])
    # conf may be a string such as '-1' or a float, depending on the pytesseract version
    conf = int(float(results["conf"][i]))

    # Filter out weak-confidence text localizations
    if conf > args["min_conf"]:

        #display conf and text to terminal
        print("Confidence: {}".format(conf) )
        print("Text: {}".format(text) )
        print("")

        #remove non-ASCII text so we can draw text on image using OpenCV, then draw bounding box around text with text itself
        text = "".join( [c if ord(c) < 128 else "" for c in text] ).strip()
        cv2.rectangle(image,  (x,y),  (x+w, y+h),  (0, 255, 0), 2 )
        cv2.putText(image, text, (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 255), 3)

# Show the output image once, after all boxes have been drawn
cv2.imshow("Image", image)
cv2.waitKey(0)      # wait for a key press before continuing

But it doesn't work when I try to use the same approach in another project. Here's my code:

import cv2
import pyautogui
import pytesseract
from pytesseract import Output

# Take a screenshot of the desktop and save it to disk
screenshotOfDesktop = pyautogui.screenshot('screenshotOfDesktop.png')

# Load the saved screenshot with OpenCV
readDesktop_SAP = cv2.imread('screenshotOfDesktop.png')

# Convert from BGR to RGB channel ordering
rgb = cv2.cvtColor(readDesktop_SAP, cv2.COLOR_BGR2RGB)

# config='--psm 7' makes PyTesseract treat the whole image as a single line of text
results = pytesseract.image_to_data(rgb, config='--psm 7', output_type=Output.DICT)
print(results)

# Bail out early if the word never appears in the OCR output
if "Description" not in results["text"]:
    print("Didn't find 'Description' on screen. Please check that the SAP 'find document' page is open on the screen. ")
    input('Press ENTER to exit now. ')
    exit()

print("Found 'Description' on screen! ")

# Iterate through the results and report each occurrence of the word
for i in range(0, len(results["text"])):
    if results["text"][i] != "Description":
        continue

    # Gate by confidence. Tesseract reports 0-100 (-1 for non-word rows),
    # so the original 0.2 threshold is written here as 20.
    conf = int(float(results["conf"][i]))
    if conf < 20:
        print("Confidence is less than 20. Moving on. ")
        continue

    # Get the coordinates of the result
    Desc_x = results["left"][i]
    Desc_y = results["top"][i]
    Desc_w = results["width"][i]
    Desc_h = results["height"][i]
    # Print everything
    print("The coordinates are: ")
    print(Desc_x, Desc_y, Desc_w, Desc_h)
    print(f"Confidence = {conf}")

Instead, my code only prints this for the "results" dictionary:

{'level': [1, 2, 3, 4, 5, 5], 'page_num': [1, 1, 1, 1, 1, 1], 'block_num': [0, 1, 1, 1, 1, 1], 'par_num': [0, 0, 1, 1, 1, 1], 'line_num': [0, 0, 0, 1, 1, 1], 'word_num': [0, 0, 0, 0, 1, 2], 'left': [0, 0, 0, 0, 0, 1451], 'top': [0, 4, 4, 4, 4, 145], 'width': [1920, 1727, 1912, 1727, 891, 276], 'height': [1080, 1061, 1070, 1061, 1061, 8], 'conf': ['-1', '-1', '-1', '-1', 11, 0], 'text': ['', '', '', '', 'fe', '~']}

Does anyone have any idea why that might be? I understand I'm not using argparse like the tutorial's author is, but the result should be the same, no? I also checked that it was reading the right screenshot.
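For readers unfamiliar with this structure: with output_type=Output.DICT, image_to_data returns parallel lists, so index i ties together a word, its confidence, and its bounding box. Below is a minimal sketch of looking up a word in such a dictionary. The names find_word and min_conf are hypothetical, introduced only for illustration, and the sample data is the output quoted in the question (trimmed to the keys used).

```python
def find_word(results, target, min_conf=0):
    """Return (left, top, width, height, conf) for each confident match of target."""
    matches = []
    for i, text in enumerate(results["text"]):
        # conf may arrive as a string, int, or float; -1 marks non-word rows
        conf = float(results["conf"][i])
        if text.strip() == target and conf >= min_conf:
            matches.append((results["left"][i], results["top"][i],
                            results["width"][i], results["height"][i], conf))
    return matches


# The dictionary printed in the question (only the keys used here)
sample = {
    "left": [0, 0, 0, 0, 0, 1451],
    "top": [0, 4, 4, 4, 4, 145],
    "width": [1920, 1727, 1912, 1727, 891, 276],
    "height": [1080, 1061, 1070, 1061, 1061, 8],
    "conf": ["-1", "-1", "-1", "-1", 11, 0],
    "text": ["", "", "", "", "fe", "~"],
}

print(find_word(sample, "Description"))      # empty: the word was never recognized
print(find_word(sample, "fe", min_conf=10))  # the single low-confidence word found
```

On this sample the only recognized "words" are 'fe' and '~', which suggests the OCR step itself is failing rather than the loop that reads the results.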

Relevant information:

  1. Tesseract v4.1.0.20190314
  2. Python 3.9.2

Solution

  • I had missed that the tutorial code ran PyTesseract on a grayscale image. After converting my screenshot to grayscale before OCR, the text was found.