Problem definition
Consider a 2xm
dataset matrix D
encoding m
2-dimensional noisy observations. I want to project the dataset D
over a 2xN
open polygonal line V
encoded by N
2-dimensional vertices. The challenge is to do it as fast as possible because both m
and N
can be very large (say in the order of 1e4
).
My naive solution: step 1
Clearly, the main problem amounts to project each single observation contained in D
over the line V
. Hence, consider for the moment the problem of projecting a single observation over V
. Since the line V
is nothing but more than a set of N-1
segments, we can further simplify the projection problem to the elementary one of projecting a single observation over a single segment. My solution is the following:
function [dist, pfoot] = footPoint_MATTEO(p, V1, V2)
pproj = NaN * ones(2, 3);
dproj = NaN * ones(1, 3);
alpha = ((p - V1)' * (V2 - V1))/ (norm(V2 - V1)^2);
if alpha < 1 && alpha > 0
pproj(:, 1) = (1 - alpha) * V1 + alpha * V2;
dproj(1) = norm(p - pproj(:, 1));
end
pproj(:, 2) = V1;
dproj(2) = norm(p - pproj(:, 2));
pproj(:, 3) = V2;
dproj(3) = norm(p - pproj(:, 3));
[~, istar] = min(dproj, [], 'omitnan');
dist = dproj(istar);
pfoot = pproj(:, istar);
end
here p
is the single observation, V1
and V2
are the two vertices of the segment, dist
is the distance between p
and the segment V1,V2
, pfoot
is the projection of p
over the line V1,V2
. The rational is the following: pfoot
is one of the two vertices or a point between them. If it is a point between them, then it can be written as convex combination with weight alpha
in the interval (0,1). The value of alpha
is a simple consequence of standard geometry. Finally, to choose pfoot
between the three possible cases we minimize the distance.
My naive solution: step 2
Now, to get the general solution, we have just to repeat the previous procedure for each lines in the curve V
and each observation in the dataset D
. My solution is thus the following:
function [Ddist, Dfoot] = shapeFootPoint_MATTEO(D, V)
N = size(V, 2);
m = size(D, 2);
Dfoot = NaN * ones(2, m);
Ddist = NaN * ones(1, m);
for j = 1 : m
dist = NaN * ones(1, N);
pfoot = NaN * ones(2, N);
for i = 1 : N - 1
[dist(i), pfoot(:, i)] = footPoint_MATTEO(D(:, j), V(:, i), V(:, i + 1));
end
[~, istar] = min(dist);
Dfoot(:, j) = pfoot(:, istar);
Ddist(j) = norm(Dfoot(:, j) - D(:, j));
end
end
here Ddist
, Dfoot
are respectively the relative list of distances and the projected dataset. Note that each single observation is projected over each segments and after that the nearest projection is selected as the definitive projection.
An attempt to cut down the execution time
Clearly, in shapeFootPoint_MATTEO
both for cycles can be parallelized. For example, parallelizing the segment cycle gives the following solution:
function [Ddist, Dfoot] = fastShapeFootPoint_MATTEO(D, V)
N = size(V, 2);
m = size(D, 2);
Dfoot = NaN * ones(2, m);
Ddist = NaN * ones(1, m);
Vnext = V(:, 2 :end);
for j = 1 : m
dist = NaN * ones(1, N);
pfoot = NaN * ones(2, N);
Dj = D(:, j);
parfor i = 1 : N - 1
[dist(i), pfoot(:, i)] = footPoint_MATTEO(Dj, V(:, i), Vnext(:, i));
end
[~, istar] = min(dist);
Dfoot(:, j) = pfoot(:, istar);
Ddist(j) = norm(Dfoot(:, j) - D(:, j));
end
end
Test drive
In order to test the performance of the previous solutions, consider the following dummy script:
lc
clear
close all
m = 100;
N = 30;
D = rand(2, m);
V = rand(2, N);
tic
shapeFootPoint_MATTEO(D, V);
toc
tic
fastShapeFootPoint_MATTEO(D, V);
toc
My laptop gives the following results:
Elapsed time is 0.021693 seconds.
Elapsed time is 2.908765 seconds.
Hence, the parallelized version is much more slower than the original one.
Questions
D
, let's say that we want to repeat the process over multiple polygonal lines V
. Naively, this means that we need to consider an additional for loop over the function ShapeFootPoint_MATTEO
. In that case, how do we speed up the computation? Can we use the same principles to answer question 1?EDIT
Using the following script,
clc
clear
close all
m = 100;
N = 30;
D = rand(2, m);
V = rand(2, N);
f = @() shapeFootPoint_MATTEO(D, V);
t = timeit(f)
f = @() fastShapeFootPoint_MATTEO(D, V);
t = timeit(f)
I get the following result
t =
0.0083
t =
1.6220
EDIT 2
Some improvements can be observed by simplifying the basic function that computes the projection of a single observation over a single segment.
function [dist, pfoot] = fastFootPoint_MATTEO(p, V1, V2)
alpha = ((p - V1)' * (V2 - V1))/ (norm(V2 - V1)^2);
alpha = min(1, max(0, alpha));
pfoot = (1 - alpha) * V1 + alpha * V2;
dist = norm(p - pfoot);
end
EDIT 3
Actually, when projecting a point over the polygonal line there is no need to compute the projection over all N - 1
segments. The next function provides some improvements.
function [Ddist, Dfoot] = fastShapeFootPoint_MATTEO(D, V)
m = size(D, 2);
Dfoot = NaN (2, m);
Ddist = NaN (1, m);
for j = 1 : m
Dj = D(:, j);
distanceDj = vecnorm(V - Dj, 2, 1);
[~, istar] = min(distanceDj);
if istar == 1
[Ddist(j), Dfoot(:, j)] = fastFootPoint_MATTEO(Dj, V(:, istar), V(:, istar + 1));
elseif istar == size(V, 2)
[Ddist(j), Dfoot(:, j)] = fastFootPoint_MATTEO(Dj, V(:, istar - 1), V(:, istar));
else
[dist1, Dfoot1] = fastFootPoint_MATTEO(Dj, V(:, istar), V(:, istar + 1));
[dist2, Dfoot2] = fastFootPoint_MATTEO(Dj, V(:, istar - 1), V(:, istar));
if dist1 <= dist2
Ddist(j) = dist1;
Dfoot(:, j) = Dfoot1;
else
Ddist(j) = dist2;
Dfoot(:, j) = Dfoot2;
end
end
end
end
The rational is the following: given an observation, search for the nearest vertex V(:, istar)
. Then, compute the projections over the two segments connected at V(:, istar)
and select the one nearest to the observation. Note that some care is needed to handle the boundary vertex V(:, 1)
, V(:, end)
.
% baseline: distance (and index) from each query point to the closest vertex
[vertexDistMin, vertexIds] = pdist2(V', D', 'euclidean', 'Smallest', 1);
% calculate the projections from each query point onto ALL the vectors (infinite intervals)
V = permute(V, [2 3 1]);
D = permute(D, [3 2 1]);
v = diff(V);
vnorm = vecnorm(v,2,3);
vhat = v./vnorm;
mag = sum(vhat.*(D-V(1:end-1,:,:)), 3);
proj = V(1:end-1,:,:)+mag.*vhat;
% find the closest interval for each projection
projDists = vecnorm(D-proj,2,3);
projDists(mag<0 | mag>vnorm) = Inf; % filter out the projections that fall off the proscribed intervals
[projDistMin,projIds] = min(projDists,[],1);
% set the output to the closest projection
ind = sub2ind(size(proj), projIds, 1:size(proj,2));
out = squeeze(proj(cat(3,ind,ind+numel(proj(:,:,1)))));
% if the closest vertex is closer than the closest projection, override output
out(vertexDistMin < projDistMin, :) = squeeze(V(vertexIds(vertexDistMin < projDistMin), :, :));
out = out'; % reshape as desired