Search code examples
matlabcartesianeuclidean-distance

Alternative to using squareform (Matlab)


At the moment i am using the pdist function in Matlab, to calculate the euclidian distances between various points in a three dimensional cartesian system. I'm doing this because i want to know which point has the smallest average distance to all the other points (the medoid). The syntax for pdist looks like this:

% calculate distances between all points
distances = pdist(m);

But because pdist returns a one dimensional array of distances, there is no easy way to figure out which point has the smallest average distance (directly). Which is why i am using squareform and then calculating the smallest average distance, like so:

% convert found distances to matrix of distances
distanceMatrix = squareform(distances);

% find index of point with smallest average distance
[~,j] = min(mean(distanceMatrix,2));

The distances are averaged for each column, and the variable j is the index for the column (and the point) with the smallest average distance.

This works, but squareform takes a lot of time (this piece of code is repeated thousands of times), so i am looking for a way to optimise it. Does anyone know of a faster way to deduce the point with the smallest average distance from the results of pdist?


Solution

  • I think for your task using SQUAREFORM function is the best way from vectorization view point. If you look at the content of this function by

    edit squareform
    

    You will see that it performs a lot of checks that take time of course. Since you know your input to squareform and can be sure it will work, you can create your custom function with just the core of squareform.

    [r, c] = size(m);
    distanceMatrix = zeros(r);
    distanceMatrix(tril(true(r),-1)) = distances;
    distanceMatrix = distanceMatrix + distanceMatrix';
    

    Then run the same code as you did to find the medioid.