Search code examples
matlabimage-processingcomputer-visionvideo-tracking

How do I track when multiple objects touch in MATLAB?


I have x,y pixel coordinates of multiple objects that have been tracked from an image (3744x5616). The coordinates are stored in a structure called objects, e.g.

objects(1).centre = [1868 1236]

The objects are each uniquely identified by a numerical code, e.g.

objects(i).code = 33

I want to be able to record each time any two objects come within a radius of 300 pixels each other. What would be the best way to check through if any objects are touching and then record the identity of both objects involved in the interaction, like object 33 interacts with object 34.

Thanks!


Solution

  • Best thing I can think of right now is a brute force approach. Simply check the distances from one object's centre with the rest of the other objects and manually check if the distances are < 300 pixels.

    If you want this fast, we should probably do this without any toolboxes. You can intelligently do this with vanilla MATLAB using bsxfun. First, create separate arrays for the X and Y coordinates of each object:

    points = reshape([objects.centre], 2, []);
    X = points(1,:);
    Y = points(2,:);
    

    [objects.centre] accesses the individual coordinates of each centre field in your structure and unpacks them into a comma-separated list. I reshape this array so that it is 2 rows where the first row is the X coordinate and the second row is the Y coordinate. I extract out the rows and place them into separate arrays.

    Next, create two difference matrices for each X and Y where the rows denote one unique coordinate and the columns denote another unique coordinate. The values inside this matrix are the differences between the point i at row i and point j at column j:

    Xdiff = bsxfun(@minus, X.', X);
    Ydiff = bsxfun(@minus, Y.', Y);
    

    bsxfun stands for Binary Singleton EXpansion FUNction. If you're familiar with the repmat function, it essentially replicates matrices and vectors under the hood so that both inputs you're operating on have the same size. In this case, what I'm doing is specifying X or Y as both of the inputs. One is the transposed version of the other. By doing this bsxfun automatically broadcasts each input so that the inputs match in dimension. Specifically, the first input is a column vector of X and so this gets repeated and stacked horizontally for as many times as there are values in X.

    Similarly this is done for the Y value. After you do this, you perform an element-wise subtraction for both outputs and you get the component wise subtraction between one point and another point for X and Y where the row gives you the first point, and the column gives you the second point. As a toy example, imagine we had X = [1 2 3]. Doing a bsxfun call using the above code gives:

    >> Xdiff = bsxfun(@minus, [1 2 3].', [1 2 3])
    
    Xdiff =
    
    ##  |  1     2     3  
    ----------------------
     1  |  0    -1    -2
     2  |  1     0    -1
     3  |  2     1     0
    

    There are some additional characters I placed in the output, but these are used solely for illustration and to give you a point of reference. By taking a row value from the ## column and subtracting from a column value from the ## row gives you the desired subtract. For example, the first row second column illustrates 1 - 2 = -1. The second row, third column illustrates 2 - 3 = -1. If you do this for both the X and Y points, you get the component-wise distances for one point against all of the other points in a symmetric matrix.


    You'll notice that this is an anti-symmetric matrix where the diagonal is all 0 ... makes sense since the distance of one dimension of one point with respect to itself should be 0. The bottom left triangular portion of the matrix is the opposite sign of the right... because of the order of subtraction. If you subtracted point 1 with point 2, doing the opposite subtraction gives you the opposite sign. However, let's assume that the rows denote the first object and the columns denote the second object, so you'd want to concentrate on the lower half.

    Now, compute the distance, and make sure you set either the upper or lower triangular half to NaN because when computing the distance, the sign gets ignored. If you don't ignore this, we'd find duplicate objects that interact, so object 3 and object 1 would be a different interaction than object 1 and object 3. You obviously don't care about the order, so set either the upper or lower triangular half to NaN for the next step. Assuming Euclidean distance:

    dists = sqrt(Xdiff.^2 + Ydiff.^2);
    dists(tril(ones(numel(objects))==1)) = NaN;
    

    The first line computes the Euclidean distance of all pairs of points and we use tril to extract the lower triangular portion of a matrix that consists of all logical 1. Extracting this matrix, we use this to set the lower half of the matrix to NaN. This allows us to skip entries we're not interested in. Note that I also set the diagonal to 0, because we're not interested in distances of one object to itself.

    Now that you're finally here, search for those objects that are < 300 pixels:

    [I,J] = find(dists < 300);
    

    I and J are row/column pairs that determine which rows and columns in the matrix have values < 300, so in our case, each pair of I and J in the array gives you the object locations that are close to each other.

    To finally figure out the right object codes, you can do:

    codes = [[objects(I).code].' [objects(J).code].'];
    

    This uses I and J to access the corresponding codes of those objects that were similar in a comma-separated list and places them side by side into a N x 2 matrix. As such, each row of codes gives you unique pairs of objects that satisfied the distance requirements.


    For copying and pasting:

    points = reshape([objects.centre], 2, []);
    X = points(1,:);
    Y = points(2,:);
    Xdiff = bsxfun(@minus, X.', X);
    Ydiff = bsxfun(@minus, Y.', Y);
    dists = sqrt(Xdiff.^2 + Ydiff.^2);
    dists(tril(ones(numel(objects))==1)) = NaN;
    [I,J] = find(dists < 300);    
    codes = [[objects(I).code].' [objects(J).code].'];
    

    Toy Example

    Here's an example that we can use to verify if what we have is correct:

    objects(1).centre = [1868 1236];
    objects(2).centre = [2000 1000];
    objects(3).centre = [1900 1300];
    objects(4).centre = [3000 2000];
    objects(1).code = 33;
    objects(2).code = 34;
    objects(3).code = 35;
    objects(4).code = 99;
    

    I initialized 4 objects with different centroids and different codes. Let's see what the dists array gives us after we compute it:

    >> format long g
    >> dists
    
    dists =
    
                           NaN          270.407100498489          71.5541752799933          1365.69396278961
                           NaN                       NaN          316.227766016838           1414.2135623731
                           NaN                       NaN                       NaN          1303.84048104053
                           NaN                       NaN                       NaN                       NaN
    

    I intentionally made the last point farther than any of the other three points to ensure that we can show cases where there are points not near other ones.

    As you can see, points (1,2) and (1,3) are all near each other, which is what we get when we complete the rest of the code. This corresponds to objects 33, 34 and 35 with pairings of (33,34) and (33,35). Points with codes 34 and 35 I made slightly smaller, but they are still greater than the 300 pixel threshold, so they don't count either:

    >> codes
    
    codes =
    
        33    34
        33    35
    

    Now, if you want to display this in a prettified format, perhaps use a for loop:

    for vec = codes.'
        fprintf('Object with code %d interacted with object with code %d\n', vec(1), vec(2));
    end
    

    This for loop is a bit tricky. It's a little known fact that for loops can also accept matrices and the index variable gives you one column of each matrix at a time from left to right. Therefore, I transposed the codes array so that each pair of unique codes becomes a column. I just access the first and second element of each column and print it out.

    We get:

    Object with code 33 interacted with object with code 34
    Object with code 33 interacted with object with code 35