python opencv image-processing computer-vision ocr

I want to change the background of an image to black

I am trying to detect the time and date in CCTV surveillance footage. I crop the region of the image where the date and time are displayed and pass it to OCR. For some images the text is detected correctly, but for others the OCR algorithm misreads digits and the ":" symbol. The only difference I can see between the correctly detected image and the others is the background behind the date and timestamp.

Below are some sample images, followed by the OCR output for each image:

2023/06/30 20:19:29 - Detects correctly

20223/07/01 02 00 39 - Detected 20223 instead of 2023, and all ":" symbols are missed

2023/07701 07:42:01 - Time detected correctly, but the date should be 2023/07/01

2023/06/30 Z*ai :55 - Time detected completely wrong

Since the date and time are always in white, can we change the background to completely black so that the text is detected correctly in all images?


import pytesseract
import cv2
import easyocr
import numpy as np


class Test:
    def __init__(self):
        self.text = ''
        self.reader = easyocr.Reader(['en'], gpu=True)
    def extract_time_easyOCR(self, croppedimagepath):
        # Read the cropped timestamp region and convert it to grayscale
        img = cv2.imread(croppedimagepath)
        grayimg = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        # cv2.imshow("original", grayimg)
        # preprocessed_image = cv2.threshold(grayimg, 127, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
        # cv2.imshow('frame', preprocessed_image); cv2.waitKey(0)
        # threshold = 0.10 * np.max(grayimg)
        # grayimg[grayimg <= threshold] = 0
        # cv2.imshow('frame', grayimg); cv2.waitKey(0)
        result = self.reader.readtext(grayimg)
        # Join the text of every detected box into one string
        self.text = ' '.join(text for (bbox, text, prob) in result)
        print("resultfromEasyOCR", self.text)
        # Return the text itself (it was previously cleared before returning)
        return self.text

    def extract_time_Tessaract(self, croppedimagepath):
        print("Inside Tessaract")
        # Read from the path passed in rather than a hard-coded file
        image = cv2.imread(croppedimagepath)
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        # With THRESH_OTSU the threshold is chosen automatically, so the value passed (0) is ignored
        preprocessed_image = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
        text = pytesseract.image_to_string(preprocessed_image)
        print("resultfromTessaract", text)
        return text

I tried both Tesseract and EasyOCR to compare the results and use whichever performs better. So far, EasyOCR is far better than Tesseract.

If my assumption is wrong, please suggest preprocessing steps I can use to detect the text correctly.

I am new to computer vision, so please bear with me if anything is wrong. Above is the program I am using.

Solution

Here is one way to make the top row black in Python/OpenCV:

Input:

import cv2
import numpy as np

# read the input
img = cv2.imread('text.jpg')
hh, ww = img.shape[:2]

# make black image that is one row
black = np.zeros((1,ww,3), dtype=np.uint8)

# put black image into input at top row
img2 = img.copy()
img2[0:1, 0:ww] = black

# save the result
cv2.imwrite('text2.jpg',img2)

Result: