
Calculate Euclidean distance for PCA in Python


I have PCA output stored as a 2D numpy array of 3D points:

pcar =[[xa ya za]
       [xb yb zb]
       [xc yc zc]
       .
       .
       [xn yn zn]]

where each row is a point. I have picked two random rows from this array as the initial cluster points:

out_list=pcar[numpy.random.randint(0,pcar.shape[0],2)]

which gives numpy array with 2 rows.

I need to compute the Euclidean distance from each row of out_list to each row (point) in pcar, and then assign each pcar point to the cluster of its nearest point in out_list.


Solution

  • There is a really fast implementation in SciPy:

     from scipy.spatial.distance import cdist, pdist
    

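    cdist takes two arrays of points, such as your pcar and out_list, and calculates the distance between every point in the first array and every point in the second. pdist takes a single array and returns the pairwise distances within it in condensed form, i.e. only the upper triangle of the full distance matrix.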

    As they are implemented in compiled code rather than pure Python behind the scenes, they are very performant.
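
To connect cdist to the assignment step in the question, here is a minimal sketch. The random data stands in for your real pcar, and the names rng, idx, dists, labels and clusters are only illustrative. It also assumes you want two distinct seed rows, so it draws them without replacement (randint can return the same row twice).

```python
import numpy as np
from scipy.spatial.distance import cdist

# Stand-in for the real PCA output: 10 points in 3D (assumption for the demo).
rng = np.random.default_rng(0)
pcar = rng.normal(size=(10, 3))

# Pick two distinct rows as the cluster seed points.
idx = rng.choice(pcar.shape[0], size=2, replace=False)
out_list = pcar[idx]

# dists[i, k] is the Euclidean distance from pcar[i] to out_list[k].
# cdist uses the Euclidean metric by default.
dists = cdist(pcar, out_list)

# For every point, the index (0 or 1) of the nearest seed point.
labels = dists.argmin(axis=1)

# Group the points of pcar by the seed they are closest to.
clusters = [pcar[labels == k] for k in range(len(out_list))]
```

Each entry of clusters is then an array holding the pcar rows nearest to the corresponding row of out_list. If you only need the label and not the distances themselves, labels alone is enough.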