Tags: python, opencv, image-processing, ocr, tesseract

How can I extract the green-colored number from an image using Python?


I have a picture containing three numbers of arbitrary length. The correct one is colored green. I want to print the green number.

example image

My code:

  import cv2
  import pytesseract

  img = cv2.imread("image.png")
  img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

  img = cv2.bitwise_not(img)
  _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY)
  txt = pytesseract.image_to_string(binary, config="--oem 3 --psm 4")
  print(txt)

Solution

  • When you convert the image with cv2.COLOR_BGR2GRAY, you discard all color information and keep only a grayscale image, so the green number can no longer be told apart from the others.

    This code downloads the image you posted and loads it as an RGB array. Pillow already returns the channels in RGB order, so no cv2.COLOR_BGR2RGB conversion is needed; that conversion is only required for images read with cv2.imread, which returns BGR.

    The image is an array indexed as image[row, column, channel], where the channel order is [red, green, blue].

    Now you can extract the green pixels and OCR them with Tesseract (you need to have Tesseract installed, as well as the Python library pytesseract).

    import numpy as np
    import matplotlib.pyplot as plt
    
    
    def downloadImage(URL):
        """Downloads the image at the URL and returns it as an RGB NumPy array"""
        from io import BytesIO
        from PIL import Image as PIL_Image
        import requests
    
        response = requests.get(URL)
        response.raise_for_status()
        image = PIL_Image.open(BytesIO(response.content))
        # Pillow yields RGB order; convert("RGB") also drops any alpha channel or palette
        return np.array(image.convert("RGB"))
    
    
    URL = "https://i.sstatic.net/tYTZ8.png"
    
    # Read image
    colorImage = downloadImage(URL)
    
    RED, GREEN, BLUE = 0, 1, 2
    # Keep only pixels with a lot of GREEN and little RED and BLUE
    greenImage = (
          (colorImage[:, :, RED] < 50)
        & (colorImage[:, :, GREEN] > 100)
        & (colorImage[:, :, BLUE] < 50)
    )
    
    plt.imshow(greenImage)
    plt.show()
    
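    import pytesseract as pt
    
    # greenImage is a boolean mask; convert it to an 8-bit image (255 = green pixel) before OCR
    ocrImage = greenImage.astype(np.uint8) * 255
    txt = pt.image_to_string(ocrImage, config="--oem 3 --psm 4")
    print(txt)
    # Output: 107018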
    

    green filtered
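
    The fixed thresholds above work for this particular image, but they may miss lighter or darker shades of green. As a rough sketch of one possible variation (not part of the original answer), the filter can instead test whether green exceeds both other channels by a margin. The names green_dominance_mask and mask_to_ocr_image and the default margin of 60 are hypothetical, and the demo runs on a small synthetic image rather than the posted one.

```python
import numpy as np


def green_dominance_mask(rgb_image, margin=60):
    """Return a boolean mask of pixels whose green channel exceeds both
    red and blue by at least `margin` (an illustrative, untuned default)."""
    # Use a signed type so the subtraction cannot wrap around in uint8
    rgb = rgb_image[:, :, :3].astype(np.int16)
    red, green, blue = rgb[:, :, 0], rgb[:, :, 1], rgb[:, :, 2]
    return (green - red >= margin) & (green - blue >= margin)


def mask_to_ocr_image(mask):
    """Convert a boolean mask to an 8-bit image with black text on white."""
    return np.where(mask, 0, 255).astype(np.uint8)


# Synthetic RGB image: dark background, one green block, one red block
demo = np.full((20, 60, 3), 30, dtype=np.uint8)
demo[5:15, 5:25] = (20, 180, 40)   # green-ish "digit"
demo[5:15, 35:55] = (200, 30, 30)  # red-ish "digit"

mask = green_dominance_mask(demo)
ocr_ready = mask_to_ocr_image(mask)
print(mask.sum())  # number of pixels classified as green
```

    The resulting ocr_ready array could then be passed to pt.image_to_string in place of the thresholded mask. The margin would likely need adjusting for other images.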