Search code examples
matlab2dpolygoncurve-fitting

Fitting a polygon in MATLAB


Problem definition

I'm attempting to reconstruct a non-convex and non-self-intersecting polygonal line (which I call the target line) from a noisy dataset (see the figure below). Therefore, my objective is to learn the positions of the vertices of the target line from the dataset. I refer to the estimated polygonal line as the polygonal line.

The target line is assumed to be normalized, i.e., contained in the intervals [-1,1] (horizontal direction) and [0,1] (vertical direction). Moreover, the target line starts from the point [1 0]' and ends at a point crossing the horizontal axis of the reference system.

The main challenges of this problem are the following:

  • The dataset is unordered, meaning that the only information available is the position of the noisy observations. The order refers to the time sequence in which we would observe the observations if we traveled along the target line from its starting point to its ending point. It's important to note that if the order is given, the problem can be easily solved with standard curve-fitting solvers (e.g., by least squares minimization). However, this is not the case for me...

  • The algorithm must be fast even for large datasets (i.e., the computational time must be on the order of seconds even when the dataset contains 1e4 or even more observations) because I need to implement it in a much larger system that runs in real-time. Thus, I cannot consider fancy algorithms based on optimization problems.

  • The algorithm must be robust when the dataset is small (e.g., if the number of observations is on the order of 10 or fewer) or when the standard deviation of the Gaussian noise is large (e.g., larger than 1 in the current normalized units). By robust, I mean that in these challenging scenarios, the estimated shape should be something simple and meaningful, such as a half-ellipse, rather than a randomly intertwined line. In other words, the algorithm must err on the side of caution: if there is limited information, the response should be simple and vague.

  • The number of vertices of the estimated polygonal line has to be equal to a prefixed value N

I'm able to obtain a rough estimate with an algorithm that is quite fast (even though it's not optimized), so I'm seeking suggestions to enhance it regarding some critical issues. I use the following conventions:

  • hatV is a 2xN matrix containing the coordinates of the N vertices forming the polygonal line;
  • D is a 2xm matrix containing the coordinates of the m observations forming the dataset. Such points are generated by uniformly picking a point on the target line and then by adding Gaussian noise.

My algorithm takes as input D, N, and returns hatV.


My algorithm (overview)

function hatV = staticShaper(D, N)
    % step 0: unfrozen initialization 
    hatV   = 1.5 * [cos(linspace(0, pi, N)); sin(linspace(0, pi, N))];
    freeze = zeros(1, N);

    for iter = 1 : 3
        % step 1: project the polygonal line over the dataset
        idx = projectLine(hatV, D);
       
        % step 2: update the polygonal line
        hatV = updateLine(hatV, D, idx, freeze);

        % step 3: simplify the polygonal line
        [hatV, Ndeficit] = simplifyLine(hatV);

        % step 4: interpolate and freeze the polygonal line
        [hatV, freeze] = interpolateLine(hatV, Ndeficit);
    end
end
  • step 0: By assumption, the target line that I want to estimate (and the corresponding dataset D) is fully contained within the intervals [-1, 1] (horizontal direction) and [0, 1] (vertical direction). Accordingly, I initialize the polygonal line hatV by placing the N vertices along a semicircle of radius 1.5, ensuring that hatV encompasses the target line. Additionally, I utilize a vector freeze that contains a flag for each vertex. If freeze(i) == 1, then the i-th vertex is frozen, meaning its position cannot be changed. If freeze(i) == 0, then the i-th vertex is not frozen, allowing its position to be changed.
  • For the sake of simplicity, I consider a cycle with a prefixed number of iteration. Beside this simplification, I think I can prove that my algorithm always converges in a finite number of iterations because in the end all vertices are frozen.
  • step 1: For each individual vertex in the current estimated polygonal line, I search within the dataset for the nearest observation. Here, idx(i) indicates that the nearest observation to vertex i is the idx(i)-th observation within the dataset D.
  • step 2: Update the polygonal line by placing each unfrozen vertex over its nearest observation.
  • step 3: The previous operation may result in multiple vertices being placed over the same observation, leading to redundancy. In this step, the polygonal line is simplified by merging vertices that are close to each other, eliminating redundancy. During this process, the number of vertices may decrease from N to N - Ndeficit.
  • step 4: To ensure that the final result contains N vertices, I insert Ndeficit new vertices into the simplified polygonal line. The newly inserted vertices represent a degree of freedom, while the old vertices are considered fixed. Therefore, I freeze the old vertices and designate the newly inserted vertices as unfrozen, allowing their positions to be updated in the next cycle of the algorithm.

Questions

enter image description here This picture depicts the four different steps of my algorithm over three iterations, considering N = 50. The dots represent the unfrozen vertices, while the circled dots indicate the frozen vertices. The black lines connect the unfrozen vertices with their nearest observations in the dataset.

enter image description here

My main concern is about how to improve the accuracy of the result. There are two main issues with my solution.

  • issue 1: my algorithm cannot go inside "hard" concave regions.
  • issue 2: my algorithm cannot fit the dataset, but rather it outlines the "outer" border of the dataset.

So my first question is: how do I solve these two issues? In addition to this question, I have the following one: how do I prove (or disprove) that my algorithm generates a polygonal line that does not intersect with itself?

My algorithm (more details)

function idx =  projectLine(hatV, D)
    idx = NaN(1, size(hatV, 2));
    for i = 1 : size(hatV, 2)
        [~, idx(i)] = min(vecnorm(hatV(:, i) - D));
    end
end

For each vertex hatV(:, i) I search for the nearest observation in D.


function hatV = updateLine(hatV, D, idx, freeze)
    for i = 1 : size(hatV, 2)
        if not(freeze(i))
            hatV(:, i) = D(:, idx(i));
        end
    end

    hatV(:, 1)   = [1 0]';
    hatV(2, end) = 0;
end

I place each unfrozen vertex hatV(:, i) over its nearest observation D(:, idx(i)). After that, I ensure that the first vertex hatV(:, 1) is at [1 0]', and I confirm that the final vertex hatV(:, end) has no vertical component. This is because I know that the starting point of the target line is [1 0]' and that the target line ends on the horizontal axis of the reference system.


function [hatV, Ndeficit] = simplifyLine(hatV)
    oldHatV    = hatV; % backup 
    k          = 1;
    hatV       = [];
    hatV(:, 1) = oldHatV(:, 1);
    for i = 1 : size(oldHatV, 2)
        if norm(oldHatV(:, i) - hatV(:, k)) < 0.01
            hatV(:, k) = 0.5 * (hatV(:, k) + oldHatV(:, i));
        else
            k          = k + 1;
            hatV(:, k) = oldHatV(:, i);
        end
    end
    Ndeficit = size(oldHatV, 2) - size(hatV, 2);
end

I have to say that this apparently simple task takes me some time, and I'm not even sure if this is a good implementation. However, here I aim to eliminate identical or similar vertices. To achieve this, I examine the distances between vertices. If the distance is small enough, I perform a merging operation by taking the mean value. In the end, I count how many vertices are lost during these operations.


function [hatV, freeze] = interpolateLine(hatV, Ndeficit)
    % step 1: normalized edge lengths computation
    ell = NaN(1, size(hatV, 2) - 1);
    for i = 1 : size(hatV, 2) - 1
        ell(i) = norm(hatV(:, i + 1) - hatV(:, i));
    end
    ell = ell / sum(ell);
    
    % step 2: new vertices generation
    newVertContainer = cell(1, size(hatV, 2) - 1);
    for i = 1 : size(hatV, 2) - 1
        nrNewVertices       = round(Ndeficit * ell(i)); 
        alpha               = 0;
        newVertContainer{i} = [];
        for j = 1 : nrNewVertices
            newVertex           = (1 - alpha) * hatV(:, i) + alpha * hatV(:, i + 1);
            newVertContainer{i} = cat(2, newVertContainer{i}, newVertex);
            alpha               = alpha + 1 / nrNewVertices;
        end
    end
    
    % step 3: output definition
    oldHatV = hatV;
    hatV    = oldHatV(:, 1);
    freeze  = 1;
    for i = 1 : size(oldHatV, 2) - 1
        hatV   = cat(2,   hatV,   newVertContainer{i}, oldHatV(:, i + 1) );
        freeze = cat(2, freeze, zeros(1, size(newVertContainer{i}, 2)), 1);
    end
end

The idea is to insert new vertices 'uniformly' along the current polygonal line. To achieve this, I add a number of new vertices to each edge of the polygonal line that is proportional to the edge length.

The first step is to compute the length of each edge of the estimated polygonal line and normalize the results. If edge i has a normalized length ell(i), then the number of new vertices generated between the two old vertices defining the edge is round(Ndeficit * ell(i)).

Clearly, this strategy does not guarantee that the polygonal line will have exactly N vertices in the end, but this issue is not a big problem at the moment (I will fix it later).

In step 2, I generate the new vertices, which are temporarily stored in newVertContainer. Finally, in step 3, I define the updated polygonal line where the old vertices are frozen, and the new vertices are unfrozen.


Solution

  • k = boundary(D(1,:).',D(2,:).',1);
    bound = [D(1,k).',D(2,k).'];
    

    (The third input argument in this call of MATLAB's boundary function is an optional argument that is used to "tighten" the boundary around the dataset if it is close to 1)

    The code runs in around 20ms on my machine and is pretty good at giving you both the upper and lower boundaries of the dataset:

    scatter(D(1,:).',D(2,:).');hold on
    plot(bound(:,1),bound(:,2),'--k','linewidth',2)
    

    enter image description here


    EDIT: Here are a few more steps towards a working solution:

    % Start from the point closest to [1,0]
    [~,idx]=min(sqrt((bound(:,1)-1).^2+bound(:,2).^2));
    if idx>1
    
        bound = [bound(idx:end,:);bound(1:(idx-1),:)];
    
    end
    
    % Get the index of the next intersection with the horizontal axis
    % (exclude a few points before and after the initial point)
    npts_exc = 6;
    
    [~,idx2]=min(bound((npts_exc):(end-npts_exc),2));
    
    bound1 = bound(1:(npts_exc+idx2-1),:);
    bound2 = bound((npts_exc+idx2-1):end,:);
    
    % Follow curve 1. for each point on curve 1 get the closest one on
    % curve 2 and then construct the mid point
    bound_avg = zeros(size(bound1));
    
    for ii=1:size(bound1,1)
        
        [~,idx_min]=min(sqrt((bound2(:,1)-bound1(ii,1)).^2+(bound2(:,2)-bound1(ii,2)).^2));
    
        bound_avg(ii,:) = 0.5*(bound1(ii,:)+bound2(idx_min,:));
    
    end
    

    This still runs in around 30ms total. It gives the following results:

    scatter(D(1,:).',D(2,:).');hold on
    plot(bound1(:,1),bound1(:,2),'--k','linewidth',1)
    plot(bound2(:,1),bound2(:,2),'--r','linewidth',1)
    plot(bound_avg(:,1),bound_avg(:,2),'-m','linewidth',2)
    

    enter image description here

    The last thing to do is to resample the curve to get the right length of N points.

    EDIT2 : Final solution

    % Resampling so that the N points on the curve are uniformally positionned
    % on the curve length
    N = 50;
    bound_res = zeros(N,2);
    % Keep endpoints
    bound_res(1,:) = bound_avg(1,:);
    bound_res(end,:) = bound_avg(end,:);
    
    % Cumulative curve length
    c_len = cumsum(sqrt((bound_avg(1:(end-1),1)-bound_avg(2:end,1)).^2+(bound_avg(1:(end-1),2)-bound_avg(2:end,2)).^2));
    % Curve length will have to be unique for interp1 to work
    [C,ia,~] = unique(c_len);
    
    % remove non unique points from bound_avg as well
    bound_avg_un = bound_avg(ia+1,:);
    
    % Target cumulative curve length for uniform distribution
    step_size = c_len(end)/(N-1);
    target_lengths = step_size*(1:(N-1));
    
    % Interpolate on the target lengths
    bound_res(2:(end-1),1) = interp1(C,bound_avg_un(:,1),target_lengths(1:(end-1)));
    bound_res(2:(end-1),2) = interp1(C,bound_avg_un(:,2),target_lengths(1:(end-1)));
    

    enter image description here