Detecting a cardboard box and the text on it using OpenCV

I want to count cardboard boxes on a conveyor belt and read a specific label on each one, using OpenCV and Python. The label contains only 3 words on a white background. Attached is the image I am using for experiments. The problem so far is that I am unable to detect the complete box because of noise. If I try to filter on w and h from x, y, w, h = cv2.boundingRect(cnt), it simply filters out the text (in this case, ABC is written on the box). Also, the box I have detected has spikes on both the top and the bottom, which I am not sure how to filter out.

Below is the code I am using:

import cv2

# reading image
image = cv2.imread('img002.jpg')

# convert the image to grayscale format
img_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# apply binary thresholding
ret, thresh = cv2.threshold(img_gray, 150, 255, cv2.THRESH_BINARY)

# visualize the binary image
cv2.imshow('Binary image', thresh)

# collecting contours (OpenCV 4.x returns two values: contours, hierarchy)
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

# looping through contours and drawing each bounding box
for cnt in contours:

    x, y, w, h = cv2.boundingRect(cnt)

    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 215, 255), 2)

cv2.imshow('img', image)
cv2.waitKey(0)
cv2.destroyAllWindows()

Also, please suggest how to crop the text ABC and then apply OCR to it to read the text.

Many thanks.

EDIT 2: Many thanks for your answer. Based on your suggestion, I changed the code so that it can detect boxes in a video. It worked like a charm, except that it failed to identify one box for a long time. Below is my code and a link to the video I used. I am new to OpenCV and have a couple of questions about this, if you can find some time to answer.

import cv2
import numpy as np
from time import time as timer, sleep


def get_region(image):
    # Return a filled mask of the largest external contour (assumed to be the box)
    contours, hierarchy = cv2.findContours(image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    black = np.zeros((image.shape[0], image.shape[1]), np.uint8)
    if not contours:
        return black
    c = max(contours, key=cv2.contourArea)
    mask = cv2.drawContours(black, [c], 0, 255, -1)
    return mask

cap = cv2.VideoCapture("Resources/box.mp4")

target_fps = 60
frame_period = 1.0 / target_fps  # seconds per frame

while True:

    start = timer()
    ret, frame = cap.read()
    if not ret:  # end of video: stop before using an empty frame
        break

    # convert the image to HSV color space
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

    # Otsu threshold on the hue channel `hsv[:,:,0]` (127 is ignored when THRESH_OTSU is set)
    thresh = cv2.threshold(hsv[:,:,0], 127, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

    mask = get_region(thresh)
    masked_img = cv2.bitwise_and(frame, frame, mask=mask)

    newImg = cv2.cvtColor(masked_img, cv2.COLOR_BGR2GRAY)

    # collecting contours
    contours, hierarchy = cv2.findContours(newImg, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    cont_sorted = sorted(contours, key=cv2.contourArea, reverse=True)[:5]
    if cont_sorted:
        x, y, w, h = cv2.boundingRect(cont_sorted[0])
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 5)

    #cv2.imshow('frame', masked_img)
    cv2.imshow('Out', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

    # wait out the rest of the frame period without busy-looping the CPU
    diff = timer() - start
    if diff < frame_period:
        sleep(frame_period - diff)


cap.release()
cv2.destroyAllWindows()

Link to video: https://www.storyblocks.com/video/stock/boxes-and-packages-move-along-a-conveyor-belt-in-a-shipment-factory-a-few-blank-boxes-for-your-custom-graphics-lmgxtwq

Questions:

1. How can we be 100% sure that the rectangle drawn is actually on a box and not on the belt or somewhere else?
2. Can you please tell me how I can use the function you provided in the original answer for other boxes in this new video code?
3. Is it correct to convert the masked frame to grey again and find contours again to draw a rectangle, or is there a more efficient way to do it?
4. The final version of this code is intended to run on a Raspberry Pi. What can we do to optimize the code's performance?

Many thanks again for your time.

Solution

Your additional questions warranted a separate answer:

1. How can we be 100% sure that the rectangle drawn is actually on a box and not on the belt or somewhere else?

PRO: For this very purpose I chose the Hue channel of the HSV color space. Shades of grey, white and black (like the conveyor belt) are neutral in this channel, because they have no distinct hue. The brown of the box contrasts with them and can easily be segmented using Otsu's threshold. Otsu's algorithm finds the optimal threshold value from the image histogram, without user input.
CON: You might face problems when the boxes are the same color as the conveyor belt.
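The "without user input" part can be made concrete with a minimal NumPy sketch of Otsu's method. It tries every threshold on the 8-bit histogram and keeps the one that maximizes the between-class variance of the two resulting pixel groups. The name `otsu_threshold` is hypothetical. In the video code, `cv2.threshold(..., cv2.THRESH_OTSU)` performs an equivalent search internally and uses its result instead of the 127 passed in. Tie-breaking details may make OpenCV's value differ slightly from this sketch.

```python
import numpy as np


def otsu_threshold(channel: np.ndarray) -> int:
    """Return the Otsu threshold for an 8-bit single-channel image.

    For each candidate t, pixels <= t form the "low" class and pixels > t the
    "high" class; the t with the largest between-class variance is returned.
    """
    hist = np.bincount(channel.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256, dtype=np.float64)

    omega = np.cumsum(prob)          # weight of the low class for each t
    mu = np.cumsum(prob * levels)    # cumulative mean for each t
    mu_total = mu[-1]                # mean grey level of the whole image

    # Between-class variance: (mu_T * omega - mu)^2 / (omega * (1 - omega))
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0        # thresholds that leave one class empty
    return int(np.argmax(sigma_b))
```

Applied to the hue channel, `otsu_threshold(hsv[:, :, 0])` would split it into two classes. This also illustrates the CON above. The method assumes a roughly two-peaked histogram, so if the box and belt hues overlap, the chosen threshold stops separating them reliably.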

2. Can you please tell me how I can use the function you provided in the original answer for other boxes in this new video code?

PRO: If you want to find boxes using edge detection, without color information, there is a high chance of getting many unwanted edges. Using the extract_rect() function, you can filter for contours that:
1. have approximately 4 sides (quadrilateral)
2. are above a certain area
CON: If you have parcels, packages or bags that have more than 4 sides, you might need to change this.

3. Is it correct to convert the masked frame to grey again and find contours again to draw a rectangle, or is there a more efficient way to do it?

I felt this is the best way, because all that remains is the text region enclosed in white. Applying a high-value threshold was the simplest idea that came to mind. There might be a better way :)

(I am not in a position to answer the 4th question :) )