c++opencv object-detection template-matching

Hausdorff Distance Object Detection

I have been struggling trying to implement the outlining algorithm described here and here.

The general idea of the paper is determining the Hausdorff distance of binary images and using it to find the template image from a test image.

For template matching, it is recommended to construct image pyramids along with sliding windows which you'll use to slide over your test image for detection. I was able to do both of these as well.

I am stuck on how to move forward from here on. Do I slide my template over the test image from different pyramid layers? Or is it the test image over the template? And with regards to the sliding window, is/are they meant to be a ROI of the test or template image?

In a nutshell, I have pieces to the puzzle but no idea of which direction to take to solve the puzzle

int distance(vector<Point>const& image, vector<Point>const& tempImage)
{
    int maxDistance = 0;

    for(Point imagePoint: image)
    {
        int minDistance = numeric_limits<int>::max();

        for(Point tempPoint: tempImage)
        {
            Point diff = imagePoint - tempPoint;
            int length = (diff.x * diff.x) + (diff.y * diff.y);

            if(length < minDistance) minDistance = length;
            if(length == 0) break;
        }
        maxDistance += minDistance;
    }
    return maxDistance;
}

double hausdorffDistance(vector<Point>const& image, vector<Point>const& tempImage)
{
    double maxDistImage = distance(image, tempImage);
    double maxDistTemp = distance(tempImage, image);

    return sqrt(max(maxDistImage, maxDistTemp));
}

vector<Mat> buildPyramids(Mat& frame)
{
    vector<Mat> pyramids;

    int count = 6;

    Mat prevFrame = frame, nextFrame;

    while(count > 0)
    {
        resize(prevFrame, nextFrame, Size(), .85, .85);
        prevFrame = nextFrame;

        pyramids.push_back(nextFrame);

        --count;
    }

    return pyramids;
}

vector<Rect> slidingWindows(Mat& image, int stepSize, int width, int height)
{
    vector<Rect> windows;

    for(size_t row = 0; row < image.rows; row += stepSize)
    {
        if((row + height) > image.rows) break;

        for(size_t col = 0; col < image.cols; col += stepSize)
        {
            if((col + width) > image.cols) break;

            windows.push_back(Rect(col, row, width, height));
        }
    }

    return windows;
}

Solution

Edit I: More analysis on my solution can be found here

This is a bi-directional task.

Forward Direction

1. Translation

For each contour, calculate its moment. Then for each point in that contour, translate it about the moment i.e. contour.point[i] = contour.point[i] - contour.moment[i]. This moves all of the contour points to the origin.

PS: You need to keep track of each contour's produced moment because it will be used in the next section

2. Rotation

With the newly translated points, calculate their rotated rect. This will give you the angle of rotation. Depending on this angle, you would want to calculate the new angle which you want to rotate this contour by; this answer would be helpful.

After attaining the new angle, calculate the rotation matrix. Remember that your center here will be the origin i.e. (0, 0). I did not take scaling into account (that's where the pyramids come into play) when calculating the rotation matrix hence I passed 1.

PS: You need to keep track of each contour's produced matrix because it will be used in the next section

Using this matrix, you can go ahead and rotate each point in the contour by it as shown here^*.

Once all of this is done, you can go ahead and calculate the Hausdorff distance and find contours which pass your set threshold.

Back Direction

Everything done in the first section, has to be undone in order for us to draw the valid contours onto our camera feed.

1. Rotation

Recall that each detected contour produced a rotation matrix. You want to undo the rotation of the valid contours. Just perform the same rotation but using the inverse matrix.

For each valid contour and corresponding matrix
inverse_matrix = matrix[i].inv(cv2.DECOMP_SVD)
Use * to rotate the points but with inverse_matrix as parameter

PS: When calculating the inverse, if the produced matrix was not a square one, it would fail. cv2.DECOMP_SVD will produce an inverse matrix even if the original matrix was a non-square.

2. Translation

With the valid contours' points rotated back, you just have to undo the previously performed translation. Instead of subtracting, just add the moment to each point.

You can now go ahead and draw these contours to your camera feed.

Scaling

This is were image pyramids come into play.

All you have to do is resize your template image by a fixed size/ratio upto your desired number of times (called layers). The tutorial found here does a good job of explaining how to do this in OpenCV.

It goes without saying that the values you choose to resize your image by and number of layers will and do play a huge role in how robust your program will be.

Put it all together

Template Image Operations

Create a pyramid consisting of n layers
For each layer in n
    Find contours
    Translate the contour points
    Rotate the contour points

This operation should only be performed once and only store the results of the rotated points.

Camera Feed Operations

Assumptions

Let the rotated contours of the template image at each level be stored in templ_contours. So if I say templ_contours[0], this is going to give me the rotated contours at pyramid level 0.

Let the image's translated, rotated contours and moments be stored in transCont, rotCont and moment respectively.

image_contours = Find Contours
for each contour detected in image
    moment = calculate moment

for each point in image_contours
    transCont.thisPoint = forward_translate(image_contours.thisPoint)
    rotCont.thisPoint = forward_rotate(transCont.thisPoint)

for each contour_layer in templ_contours
    for each contour in rotCont
        calculate Hausdorff Distance
        valid_contours = contours_passing_distance_threshold

for each point in valid_contours
    valid_point = backward_rotate(valid_point)

for each point in valid_contours
    valid_point = backward_translate(valid_point)

drawContours(valid_contours, image)