I have a small tracking project that I am working on. I have my frame-by-frame detection scheme set up and working. When I run it, I get a fair amount of noise in the polygon that I extract, even if the scene is static. Since I want this to run in real time, Kalman filtering seems like the best way to solve this problem; however, implementation details are sparse. I have seen some examples via Google, but they typically deal with bounding boxes or regular shapes, which are described with only a few values. I am not sure that approach would work here.
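A minimal sketch of how the Kalman idea could extend from a bounding box to a polygon, assuming every frame's polygon has already been converted to the same number of vertices in a consistent order and flattened to a vector of 2N coordinates. It uses a constant-position (random-walk) model with independent, equal noise on every coordinate. Under those assumptions the 2N×2N filter splits into 2N independent scalar filters, which is what this NumPy code does. The class name PolygonKalman and the noise values are hypothetical and would need tuning. OpenCV's cv2.KalmanFilter(2N, 2N) with identity transition and measurement matrices should implement the same model, but it carries full 2N×2N matrices.

```python
import numpy as np


class PolygonKalman:
    """Constant-position Kalman filter for a fixed-length vector of coordinates.

    Hypothetical helper: with identity dynamics and independent, equal noise on
    every coordinate, the full 2N x 2N filter decouples into 2N scalar filters,
    so the error covariance is kept as a vector of per-coordinate variances.
    """

    def __init__(self, first_measurement, process_var=1e-2, measurement_var=4.0):
        self.x = np.asarray(first_measurement, dtype=np.float64).copy()  # state estimate
        self.p = np.full_like(self.x, measurement_var)  # estimate variance per coordinate
        self.q = process_var      # assumed drift of the true shape per frame (px^2)
        self.r = measurement_var  # assumed detection noise per frame (px^2)

    def update(self, measurement):
        z = np.asarray(measurement, dtype=np.float64)
        # Predict: the shape is assumed to stay put, so only the uncertainty grows
        self.p = self.p + self.q
        # Correct: the Kalman gain decides how far to move toward the new measurement
        k = self.p / (self.p + self.r)
        self.x = self.x + k * (z - self.x)
        self.p = (1.0 - k) * self.p
        return self.x


# Usage on synthetic data: a static "polygon" of 2 vertices (4 coordinates) seen with 2 px noise
rng = np.random.default_rng(0)
true = np.array([10.0, 20.0, 30.0, 40.0])
kf = PolygonKalman(true + rng.normal(0.0, 2.0, size=4))
for _ in range(200):
    est = kf.update(true + rng.normal(0.0, 2.0, size=4))
```

Raising process_var lets the estimate follow real shape changes faster; lowering it smooths more when the scene is static.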
I am interested in tracking the evolution of the more irregular geometry shown below. It takes ~100 points or more to describe the polygon. How can I adapt the OpenCV Kalman tools to handle this task?
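A Kalman filter needs a state vector of fixed length, while cv2.findContours() can return a different number of points each frame. One way to bridge that is to resample the closed polygon to a fixed number of points spaced evenly by arc length. This is a sketch only: resample_closed_polygon is a hypothetical helper, not an OpenCV function, and it assumes the contour starts at a consistent point and runs in a consistent direction in every frame; if it does not, the vertices would have to be re-aligned before filtering.

```python
import numpy as np


def resample_closed_polygon(points, n):
    """Return n points spaced evenly by arc length along a closed polygon.

    points: (M, 2) array of (x, y) vertices; a cv2 contour of shape (M, 1, 2) also works.
    """
    pts = np.asarray(points, dtype=np.float64).reshape(-1, 2)
    closed = np.vstack([pts, pts[:1]])                 # repeat the first vertex to close the loop
    seg_len = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg_len)])  # arc length at each vertex
    targets = np.linspace(0.0, cum[-1], n, endpoint=False)
    x = np.interp(targets, cum, closed[:, 0])          # linear interpolation along each edge
    y = np.interp(targets, cum, closed[:, 1])
    return np.column_stack([x, y])


square = np.array([[0, 0], [4, 0], [4, 4], [0, 4]])
print(resample_closed_polygon(square, 8).tolist())
# [[0.0, 0.0], [2.0, 0.0], [4.0, 0.0], [4.0, 2.0], [4.0, 4.0], [2.0, 4.0], [0.0, 4.0], [0.0, 2.0]]
```

Flattening the result with .ravel() gives a vector of 2n values whose length no longer depends on how many points the detector returned.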
Thanks in advance.
**Update**
Some additional details: I need an accurate profile of the object for downstream analysis, so a bounding box is not an option. My camera can produce frames at 30 fps. I do not need to process that fast, but I do not want to process only one frame per second either. Even a fast de-noising operation is too slow. My images are 4024x3036 monochrome images. I attached JPEG versions of six shots of my scene. The sample is the small chunk in the center of the two plates in the bottom third of the image. I also attached what I am looking to pull from each frame: an irregular polygon that accurately matches the 2D profile of the shape. I will favor accuracy and stability over speed, but I would like to process a few frames per second.
I will capture some representative images or a small movie and post them shortly.
Sample Images
The goal
Notice how, among the columns of the images, the columns where the purple lines should go have the most black? We can detect the ROI (region of interest) by first detecting the first and last columns with at least a certain amount of black. Then, at those 2 columns, detect the rows where the white color first starts and where it ends.
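The column-detection step can be tried on a tiny hypothetical binary image before running it on the real frames. In this toy scene the top and bottom plates are black rows, the gap between them is white, and a dark "sample" fills two columns, so those columns have no white pixels at all. The padding of 2 stands in for the 50 px the answer uses on full-size images.

```python
import numpy as np

# Hypothetical 10x12 binary scene: top plate (rows 0-2) and bottom plate (rows 7-9)
# are black, the gap between them is white, and a dark sample fills columns 5-6.
thresh = np.full((10, 12), 255, dtype=np.uint8)
thresh[:3, :] = 0
thresh[7:, :] = 0
thresh[:, 5:7] = 0

white_per_col = (thresh == 255).sum(axis=0)  # white pixel count in every column
tol = 1                                      # allowed extra white pixels vs. the darkest column
dark_cols = np.where(white_per_col <= white_per_col.min() + tol)[0]
pad = 2
x1, x2 = dark_cols[0] - pad, dark_cols[-1] + pad
print(dark_cols.tolist(), x1, x2)  # [5, 6] 3 8
```

Columns 3 and 8 then cross the white gap, which is where the top and bottom edges are looked up next.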
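import cv2
import numpy as np

# Filenames of the sample frames: img1.jpg ... img5.jpg
files = [f"img{i}.jpg" for i in range(1, 6)]
for file in files:
    img = cv2.imread(file)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Binarize: pixels above 127 become 255 (white), all others become 0 (black)
    _, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

    # Columns whose sum is close to the minimum contain the most black
    sum_cols = thresh.sum(0)
    indices = np.where(sum_cols < sum_cols.min() + 40000)[0]
    # Left and right edges of the ROI, padded by 50 px
    x1, x2 = indices[0] - 50, indices[-1] + 50

    # Row-to-row changes in columns x1 and x2 (nonzero wherever black/white switches)
    diff1, diff2 = np.diff(thresh[:, [x1, x2]].T, 1)
    y1_1, y2_1 = np.where(diff1)[0][:2]  # top and bottom edge in column x1
    y1_2, y2_2 = np.where(diff2)[0][:2]  # top and bottom edge in column x2
    y1, y2 = min(y1_1, y1_2), max(y2_1, y2_2)

    # Edge-detect the cropped ROI and keep only its outer contours
    img_canny = cv2.Canny(thresh[y1: y2, x1: x2], 50, 50)
    contours, _ = cv2.findContours(img_canny, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    # Draw the top and bottom lines, then the contours on the ROI view of img
    # (contour coordinates are relative to the crop, so they line up there)
    cv2.line(img, (x1, y1_1), (x2, y1_2), (255, 0, 160), 5)
    cv2.line(img, (x1, y2_1), (x2, y2_2), (255, 0, 160), 5)
    cv2.drawContours(img[y1: y2, x1: x2], contours, -1, (0, 0, 255), 10)
    cv2.imshow("Image", img)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cv2.destroyAllWindows()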
Here is what the program outputs for each of the images you provided:
How the program works, step by step. First, import the necessary libraries:
import cv2
import numpy as np
Normally you would read frames from a live video with the cv2.VideoCapture() method. As I only have the images you provided, I'll make the program read in each image instead. So, store every image filename in a list (I have img1.jpg, img2.jpg, ... img5.jpg), then iterate through the names and read in each image:
files = [f"img{i}.jpg" for i in range(1, 6)]
for file in files:
    img = cv2.imread(file)
Inside the loop, convert each image to grayscale, then use the cv2.threshold() method to convert the grayscale image to only 2 values: 0 for each pixel that's less than or equal to 127, and 255 for each pixel that's more than 127:
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
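The rule that cv2.THRESH_BINARY applies with a threshold of 127 and a maximum of 255 can be written directly in NumPy, which makes it easy to check which gray levels end up white. This only illustrates the rule on a few hand-picked values; it is not meant to replace cv2.threshold().

```python
import numpy as np

# A few hypothetical gray levels around the threshold
gray = np.array([[0, 100, 127, 128, 200, 255]], dtype=np.uint8)

# THRESH_BINARY with thresh=127, maxval=255: strictly greater than 127 -> 255, else 0
thresh = np.where(gray > 127, 255, 0).astype(np.uint8)
print(thresh.tolist())  # [[0, 0, 0, 255, 255, 255]]
```

Note that 127 itself maps to 0, since the comparison is strict.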
To find the columns with the most 0s (which means with the most black), we need the sum of every column; the smallest sum comes from the column with the most 0s. With the sum of each column, we can use the np.where() method to find the index of every column in the thresholded image whose sum is close to the smallest sum detected (here, less than the minimum plus 40000). Then, the first and last of the detected column indices, padded by 50 pixels, become the x1 and x2 of our ROI:
sum_cols = thresh.sum(0)
indices = np.where(sum_cols < sum_cols.min() + 40000)[0]
x1, x2 = indices[0] - 50, indices[-1] + 50
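Since every white pixel adds 255 to its column's sum, the tolerance of 40000 corresponds to roughly 40000 / 255 ≈ 157 white pixels: a column is kept if it has fewer than about 157 more white pixels than the darkest column. A small check of that arithmetic on a hypothetical three-column image with the question's height of 3036 rows:

```python
import numpy as np

h = 3036                                   # image height from the question
thresh = np.zeros((h, 3), dtype=np.uint8)  # hypothetical binary image, 3 columns
thresh[:1000, 0] = 255                     # 1000 white pixels
thresh[:1100, 1] = 255                     # 1100 white pixels
thresh[:1200, 2] = 255                     # 1200 white pixels

sum_cols = thresh.sum(0)       # NumPy sums uint8 in a wider integer type, so no overflow
white_counts = sum_cols // 255

tol_sum = 40000                # the answer's tolerance, in summed pixel values
tol_pixels = tol_sum / 255     # about 156.9 extra white pixels allowed
indices = np.where(sum_cols < sum_cols.min() + tol_sum)[0]
print(white_counts.tolist(), indices.tolist())  # [1000, 1100, 1200] [0, 1]
```

Because the tolerance is an absolute pixel count, it may need adjusting if the frames are resized before processing.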
To find the two y coordinates of the top line (y1_1 in column x1 and y1_2 in column x2), we detect the index of the first change from 0 to 255 in column x1 and in column x2. Similarly, to find the two y coordinates of the bottom line (y2_1 and y2_2), we detect the index of the first change from 255 to 0 in the same two columns. np.where() on the row-to-row differences returns every row where the value changes, so the first two changes in each column give its top and bottom edge. Finally, with our 4 y coordinates, the y1 of our ROI is the smaller of the two top-line coordinates, and the y2 is the greater of the two bottom-line coordinates:
diff1, diff2 = np.diff(thresh[:, [x1, x2]].T, 1)
y1_1, y2_1 = np.where(diff1)[0][:2]
y1_2, y2_2 = np.where(diff2)[0][:2]
y1, y2 = min(y1_1, y1_2), max(y2_1, y2_2)
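One detail worth knowing: thresh is a uint8 array, so np.diff wraps around instead of going negative. A 0 to 255 step gives 255, and a 255 to 0 step gives 1 (because 0 - 255 wraps to 1 modulo 256). Both are nonzero, which is all np.where() needs here, but telling a top edge from a bottom edge by sign requires a signed copy first. A sketch on a hypothetical single column:

```python
import numpy as np

# Hypothetical column: black plate, white gap, black plate
col = np.array([0, 0, 255, 255, 255, 0, 0], dtype=np.uint8)

print(np.diff(col).tolist())          # [0, 255, 0, 0, 1, 0]  (uint8 arithmetic wraps)
print(np.where(np.diff(col))[0][:2])  # [1 4]: each index is the row just before a change

signed = np.diff(col.astype(np.int16))  # same differences without wraparound
rising = np.where(signed > 0)[0]        # 0 -> 255 (black to white): top edge
falling = np.where(signed < 0)[0]       # 255 -> 0 (white to black): bottom edge
print(rising.tolist(), falling.tolist())  # [1] [4]
```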
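Crop the thresholded image to the ROI, detect its edges with cv2.Canny(), and find the outer contours with the cv2.findContours() method:
img_canny = cv2.Canny(thresh[y1: y2, x1: x2], 50, 50)
contours, _ = cv2.findContours(img_canny, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)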
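Because findContours() runs on the cropped ROI, the returned points are relative to the crop's top-left corner (x1, y1). For downstream analysis on the full frame they can be shifted back, as in this sketch with a hypothetical contour and ROI origin. cv2.findContours() also accepts an offset argument (for example offset=(int(x1), int(y1))) that should apply the same shift directly.

```python
import numpy as np

# Hypothetical contour in the layout cv2.findContours returns: shape (N, 1, 2),
# each entry an (x, y) point relative to the crop's top-left corner
contour_roi = np.array([[[0, 0]], [[5, 0]], [[5, 3]], [[0, 3]]], dtype=np.int32)
x1, y1 = 100, 40  # hypothetical ROI origin in the full image

contour_full = contour_roi + np.array([x1, y1], dtype=np.int32)  # broadcast over all points
polygon = contour_full.reshape(-1, 2)  # (N, 2) array of (x, y) in full-image coordinates
print(polygon.tolist())  # [[100, 40], [105, 40], [105, 43], [100, 43]]
```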
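Lastly, draw the two detected lines and the contours, then display the result. The contours are drawn on the ROI slice of img, because their coordinates are relative to the crop:
cv2.line(img, (x1, y1_1), (x2, y1_2), (255, 0, 160), 5)
cv2.line(img, (x1, y2_1), (x2, y2_2), (255, 0, 160), 5)
cv2.drawContours(img[y1: y2, x1: x2], contours, -1, (0, 0, 255), 10)
cv2.imshow("Image", img)
if cv2.waitKey(1) & 0xFF == ord('q'):
    break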