Search code examples
matlabfor-loopparallel-processingcomputational-geometry

Fast footpoint computation over polygonal lines in MATLAB


Problem definition

Consider a 2xm dataset matrix D encoding m 2-dimensional noisy observations. I want to project the dataset D over a 2xN open polygonal line V encoded by N 2-dimensional vertices. The challenge is to do it as fast as possible because both m and N can be very large (say in the order of 1e4).

My naive solution: step 1

Clearly, the main problem amounts to project each single observation contained in D over the line V. Hence, consider for the moment the problem of projecting a single observation over V. Since the line V is nothing but more than a set of N-1 segments, we can further simplify the projection problem to the elementary one of projecting a single observation over a single segment. My solution is the following:

function [dist, pfoot] = footPoint_MATTEO(p, V1, V2)
    pproj = NaN * ones(2, 3);
    dproj = NaN * ones(1, 3);   
    
    alpha = ((p - V1)' * (V2 - V1))/ (norm(V2 - V1)^2);
    
    if alpha < 1 && alpha > 0
        pproj(:, 1) = (1 - alpha) * V1 + alpha * V2;
        dproj(1)    = norm(p - pproj(:, 1));
    end
    pproj(:, 2) = V1;
    dproj(2)    = norm(p - pproj(:, 2));
    pproj(:, 3) = V2;
    dproj(3)    = norm(p - pproj(:, 3));
    [~, istar] = min(dproj, [], 'omitnan');
    dist       = dproj(istar);
    pfoot      = pproj(:, istar);
end

here p is the single observation, V1 and V2 are the two vertices of the segment, dist is the distance between p and the segment V1,V2, pfoot is the projection of p over the line V1,V2. The rational is the following: pfoot is one of the two vertices or a point between them. If it is a point between them, then it can be written as convex combination with weight alpha in the interval (0,1). The value of alpha is a simple consequence of standard geometry. Finally, to choose pfoot between the three possible cases we minimize the distance.

My naive solution: step 2

Now, to get the general solution, we have just to repeat the previous procedure for each lines in the curve V and each observation in the dataset D. My solution is thus the following:

function [Ddist, Dfoot] = shapeFootPoint_MATTEO(D, V)
    N     = size(V, 2);
    m     = size(D, 2);
    Dfoot = NaN * ones(2, m);
    Ddist = NaN * ones(1, m);
    for j = 1 : m
        dist  = NaN * ones(1, N);
        pfoot = NaN * ones(2, N);
        for i = 1 : N - 1
            [dist(i), pfoot(:, i)] = footPoint_MATTEO(D(:, j), V(:, i), V(:, i + 1));
        end
        [~, istar]  = min(dist);
        Dfoot(:, j) = pfoot(:, istar);
        Ddist(j)    = norm(Dfoot(:, j) - D(:, j));
    end
end

here Ddist, Dfoot are respectively the relative list of distances and the projected dataset. Note that each single observation is projected over each segments and after that the nearest projection is selected as the definitive projection.

An attempt to cut down the execution time

Clearly, in shapeFootPoint_MATTEO both for cycles can be parallelized. For example, parallelizing the segment cycle gives the following solution:

function [Ddist, Dfoot] = fastShapeFootPoint_MATTEO(D, V)
    N     = size(V, 2);
    m     = size(D, 2);
    Dfoot = NaN * ones(2, m);
    Ddist = NaN * ones(1, m);
    
    Vnext = V(:, 2 :end);
    for j = 1 : m
        dist  = NaN * ones(1, N);
        pfoot = NaN * ones(2, N);
        Dj    = D(:, j);
        parfor i = 1 : N - 1
            [dist(i), pfoot(:, i)] = footPoint_MATTEO(Dj, V(:, i), Vnext(:, i));
        end
        [~, istar]  = min(dist);
        Dfoot(:, j) = pfoot(:, istar);
        Ddist(j)    = norm(Dfoot(:, j) - D(:, j));
    end
end

Test drive

In order to test the performance of the previous solutions, consider the following dummy script:

lc
clear
close all
m = 100;
N = 30;
D = rand(2, m);
V = rand(2, N);
tic
shapeFootPoint_MATTEO(D, V);
toc
tic
fastShapeFootPoint_MATTEO(D, V);
toc

My laptop gives the following results:

Elapsed time is 0.021693 seconds.
Elapsed time is 2.908765 seconds.

Hence, the parallelized version is much more slower than the original one.

Questions

  1. Is my naive solution already optimized? If not, how do we reduce the computational cost / improve the code implementation?
  2. Given the same dataset D, let's say that we want to repeat the process over multiple polygonal lines V. Naively, this means that we need to consider an additional for loop over the function ShapeFootPoint_MATTEO. In that case, how do we speed up the computation? Can we use the same principles to answer question 1?

EDIT

Using the following script,

clc
clear
close all
m = 100;
N = 30;
D = rand(2, m);
V = rand(2, N);


f = @() shapeFootPoint_MATTEO(D, V);
t = timeit(f)

f = @() fastShapeFootPoint_MATTEO(D, V);
t = timeit(f)

I get the following result

t =

    0.0083


t =

    1.6220

EDIT 2

Some improvements can be observed by simplifying the basic function that computes the projection of a single observation over a single segment.

function [dist, pfoot] = fastFootPoint_MATTEO(p, V1, V2)
    alpha = ((p - V1)' * (V2 - V1))/ (norm(V2 - V1)^2);
    alpha = min(1, max(0, alpha));
    pfoot = (1 - alpha) * V1 + alpha * V2;
    dist  = norm(p - pfoot);
end

EDIT 3

Actually, when projecting a point over the polygonal line there is no need to compute the projection over all N - 1 segments. The next function provides some improvements.

function [Ddist, Dfoot] = fastShapeFootPoint_MATTEO(D, V)
    m     = size(D, 2);
    Dfoot = NaN (2, m);
    Ddist = NaN (1, m);

    for j = 1 : m
        Dj         = D(:, j);
        distanceDj = vecnorm(V - Dj, 2, 1);
        [~, istar] = min(distanceDj);
        if istar == 1
            [Ddist(j), Dfoot(:, j)] = fastFootPoint_MATTEO(Dj, V(:, istar), V(:, istar + 1));
        elseif istar == size(V, 2)
            [Ddist(j), Dfoot(:, j)] = fastFootPoint_MATTEO(Dj, V(:, istar - 1), V(:, istar));
        else
            [dist1, Dfoot1] = fastFootPoint_MATTEO(Dj, V(:, istar), V(:, istar + 1));
            [dist2, Dfoot2] = fastFootPoint_MATTEO(Dj, V(:, istar - 1), V(:, istar));
            if dist1 <= dist2
                Ddist(j)    = dist1;
                Dfoot(:, j) = Dfoot1;
            else
                Ddist(j)    = dist2;
                Dfoot(:, j) = Dfoot2;
            end
        end
    end
end

The rational is the following: given an observation, search for the nearest vertex V(:, istar). Then, compute the projections over the two segments connected at V(:, istar) and select the one nearest to the observation. Note that some care is needed to handle the boundary vertex V(:, 1), V(:, end).


Solution

  • % baseline: distance (and index) from each query point to the closest vertex
    [vertexDistMin, vertexIds] = pdist2(V', D', 'euclidean', 'Smallest', 1);
    
    % calculate the projections from each query point onto ALL the vectors (infinite intervals)
    V = permute(V, [2 3 1]);
    D = permute(D, [3 2 1]);
    
    v = diff(V);
    vnorm = vecnorm(v,2,3);
    vhat = v./vnorm;
    mag = sum(vhat.*(D-V(1:end-1,:,:)), 3);
    proj = V(1:end-1,:,:)+mag.*vhat;
    
    % find the closest interval for each projection
    projDists = vecnorm(D-proj,2,3);
    projDists(mag<0 | mag>vnorm) = Inf; % filter out the projections that fall off the proscribed intervals
    [projDistMin,projIds] = min(projDists,[],1);
    
    % set the output to the closest projection
    ind = sub2ind(size(proj), projIds, 1:size(proj,2));
    out = squeeze(proj(cat(3,ind,ind+numel(proj(:,:,1)))));
    
    % if the closest vertex is closer than the closest projection, override output
    out(vertexDistMin < projDistMin, :) = squeeze(V(vertexIds(vertexDistMin < projDistMin), :, :));
    
    out = out'; % reshape as desired