matlab image-processing computer-vision feature-extraction matlab-cvst

Not getting what 'spatial weights' for HOG are

I am using HOG for sunflower detection. I understand most of what HOG is doing now, but there are some things in the final stages that I do not understand. (I am going through the MATLAB code from MathWorks.)

Let us assume we are using the Dalal-Triggs configuration: 8x8 pixels make 1 cell, 2x2 cells make 1 block, blocks are taken at 50% overlap in both directions, and the histograms are quantized into 9 unsigned bins (that is, from 0 to 180 degrees). Finally, our image here is 64x128 pixels.

Let us say that we are on the first block. This block has 4 cells. I understand that we are going to weight each pixel's orientation vote by its gradient magnitude. I also understand that we are going to weight the votes further by a Gaussian centered on the block.

So far so good.

However in the MATLAB implementation, they have an additional step, whereby they create a 'spatial' weight:

[Image: screenshot of the MATLAB code that creates the spatial weights]

If we dive into this function, it looks like this:

[Image: screenshot of that function's code]

Finally, the function 'computeLowerHistBin' looks like this:

function [x1, b1] = computeLowerHistBin(x, binWidth)
% Bin index
width    = single(binWidth);
invWidth = 1./width;
bin      = floor(x.*invWidth - 0.5);

% Bin center x1
x1 = width * (bin + 0.5);

% add 2 to get to 1-based indexing
b1 = int32(bin + 2);
end

Now, I believe that those 'spatial' weights are being used during the tri-linear interpolation part later on... but what I do not get is just how exactly they are being computed, or the logic behind that code. I am completely lost on this issue.

Note: I understand the need for the tri-linear interpolation, and (I think) how it works. What I do not understand is why we need those 'spatial weights', and what the logic behind their computation here is.

Thanks.

Solution

This code is pre-computing the spatial weights for the trilinear interpolation. Take a look at the equation here for trilinear interpolation:

HOG Trilinear Interpolation of Histogram Bins

There you see things like (x-x1)/bx, (y-y1)/by, (1 - (x-x1)/bx), etc. In the code, wx1 and wy1 correspond to:

wx1 = (1 - (x-x1)/bx)
wy1 = (1 - (y-y1)/by)

Here, x1 and y1 are the centers of the histogram bins for the X and Y directions. It's easier to describe these things in 1D. So in 1D, a value x will fall between 2 bin centers, x1 <= x < x2. It doesn't matter exactly which bin (1 or 2) it belongs to. The important thing is to figure out the fraction of x that belongs to x1; the rest belongs to x2. Taking the distance from x to x1 and dividing by the width of the bin gives a fractional distance. 1 minus that is the fraction that belongs to bin 1. So if x == x1, wx1 is 1. And if x == x2, wx1 is zero because x2 - x1 == bx (the width of a bin).
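To make the 1D case concrete, here is a small sketch. The bin centers (x1 = 10, x2 = 30) and the bin width (bx = 20) are made-up values chosen for illustration; they are not taken from the MathWorks code.

```matlab
% Hypothetical 1D setup: two neighbouring bin centers, one bin width apart.
bx = 20;                     % bin width (assumed value)
x1 = 10;                     % lower bin center (assumed value)
x2 = x1 + bx;                % upper bin center

x   = [10 15 25 30];         % sample positions between x1 and x2
wx1 = 1 - (x - x1) ./ bx;    % fraction of each vote that goes to bin 1
wx2 = 1 - wx1;               % the remainder goes to bin 2

disp([x; wx1; wx2]);         % rows: position, weight for bin 1, weight for bin 2
```

A sample sitting exactly on x1 gives all of its vote to bin 1, a sample on x2 gives all of it to bin 2, and anything in between is split linearly, so the two weights always add up to 1.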

Going back to the code, the part that creates the 4 matrices is just pre-computing all the products of the weights needed for the interpolation of all the pixels in a HOG block. That is why each one is a matrix of weights: each element in the matrix is for one of the pixels in the HOG block.

For example, if you look at the equation for the weights for h(x1, y2, ~), you'll see these 2 weights for x and y (ignoring the z component):

(1 - (x-x1)/bx) * ((y-y1)/by)

Going back to the code, this multiplication is pre-computed for every pixel in the block using:

weights.x1y2 = (1-wy1)' * wx1;

where

(1-wy1) == (y - y1)/by

The same logic applies to the other weight matrices.
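As a rough sketch of how all four matrices could be built for one block, assuming the Dalal-Triggs layout from the question (8x8-pixel cells, 16x16-pixel blocks) and assuming pixel centers sit at 0.5, 1.5, ..., 15.5: only the x1y2 line is quoted from the MATLAB code above; the other three field names and the pixel-center convention are filled in by analogy and may differ from the actual source.

```matlab
% Assumed layout: 8x8-pixel cells, 2x2 cells per block (16x16 pixels).
cellSize  = single(8);
blockSize = single(16);

% Pixel centers along one side of the block (assumed convention).
x = single(0.5:1:blockSize);
y = x;

% Lower bin (cell) center for every pixel, using the same formula as
% computeLowerHistBin: x1 = width * (floor(x/width - 0.5) + 0.5).
x1 = cellSize * (floor(x ./ cellSize - 0.5) + 0.5);
y1 = cellSize * (floor(y ./ cellSize - 0.5) + 0.5);

% 1D weights towards the lower bin center in each direction.
wx1 = 1 - (x - x1) ./ cellSize;
wy1 = 1 - (y - y1) ./ cellSize;

% Outer products give one 16x16 weight matrix per neighbouring cell.
weights.x1y1 = wy1' * wx1;
weights.x2y1 = wy1' * (1 - wx1);
weights.x1y2 = (1 - wy1)' * wx1;
weights.x2y2 = (1 - wy1)' * (1 - wx1);

% For every pixel the four spatial weights should add up to 1.
total = weights.x1y1 + weights.x2y1 + weights.x1y2 + weights.x2y2;
disp(max(abs(total(:) - 1)));
```

Because the weights depend only on a pixel's position inside the block, not on the image content, they can be computed once and reused for every block.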

As for the code in "computeLowerHistBin", it's just finding the x1 in the trilinear interpolation equation, where x1 <= x < x2 (same for y1). There are probably a bunch of ways to solve this problem given a pixel location x and the width of a bin bx as long as you satisfy x1 <= x < x2.

For example, in the diagram below "|" marks the bin edges and "o" marks the bin centers (bin width 20):

-20             0              20               40
 |------o-------|-------o-------|-------o-------|
       -10              10              30

If x = [2 9 11], the lower bin centers x1 are [-10 -10 10].
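The same numbers can be checked by running the body of computeLowerHistBin inline. This is only a sketch: the bin width of 20 is read off the diagram, and the function body is inlined here so that the snippet runs on its own.

```matlab
% Inputs from the example above; bin width 20 is taken from the diagram.
x        = single([2 9 11]);
binWidth = single(20);

% Same steps as computeLowerHistBin, inlined.
bin = floor(x ./ binWidth - 0.5);   % index of the lower bin (can be -1)
x1  = binWidth * (bin + 0.5);       % center of that lower bin
b1  = int32(bin + 2);               % shifted to a 1-based index

disp(x1);
disp(b1);
```

Subtracting 0.5 before the floor is what makes the result a bin center rather than a bin edge: x = 9 lies inside the bin [0, 20), but it is still to the left of that bin's center at 10, so its lower center is -10.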