
Calculate euclidean distance between vectors with cluster medoids


I have an array consisting of 3 vectors that represent 3 objects:

X2=array([[ 5.43840675, -1.05259078, -0.21793506,  8.56686818, -2.58056957,
        -0.07310339, -0.31181501,  0.02696586],
       [ 5.72318296, -0.99665473, -0.14540062,  8.32051008, -3.36201189,
        -0.04897565, -0.34271698, -0.0339766 ],
       [ 5.93081714, -1.52272427,  0.40706477,  8.56256569, -3.216366  ,
        -0.0108426 , -0.57434619, -0.18952662]])

model1 = KMedoids(n_clusters=2, random_state=0).fit(X2)
    

The cluster labels for them are [1, 0, 0], and the medoids are:

medoids=array([[ 5.72318296, -0.99665473, -0.14540062,  8.32051008, -3.36201189,
        -0.04897565, -0.34271698, -0.0339766 ],
       [ 5.43840675, -1.05259078, -0.21793506,  8.56686818, -2.58056957,
        -0.07310339, -0.31181501,  0.02696586]])
    

I want to calculate the distance between each object in X2 and the medoid of each cluster (0 and 1). For example, for object [1] and cluster 0:

 X2[1]=([ 5.72318296, -0.99665473, -0.14540062,  8.32051008, -3.36201189,
        -0.04897565, -0.34271698, -0.0339766 ])
medoids[0]=[ 5.72318296, -0.99665473, -0.14540062,  8.32051008, -3.36201189,
            -0.04897565, -0.34271698, -0.0339766 ]

The distance (a) should be zero, since the two vectors are identical.

a = euclidean_distances(X2[1].reshape(-1, 1), X2[model1.medoid_indices_][0].reshape(-1, 1))

Any idea what the issue could be?


Solution

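  • The euclidean_distances function is working as expected. reshape(-1, 1) turns each 8-element vector into 8 samples with one feature each, so the function returns the 8x8 matrix of distances between every pair of individual coordinates, not a single distance between the two vectors. Because both inputs are the same vector here, that matrix is symmetric with zeros on the diagonal. To get one distance per pair of vectors, each vector has to be passed as a single row, for example with reshape(1, -1).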

    import numpy as np
    from sklearn_extra.cluster import KMedoids
    from sklearn.metrics.pairwise import euclidean_distances
    
    
    X2=np.array([[ 5.43840675, -1.05259078, -0.21793506,  8.56686818, -2.58056957,
            -0.07310339, -0.31181501,  0.02696586],
           [ 5.72318296, -0.99665473, -0.14540062,  8.32051008, -3.36201189,
            -0.04897565, -0.34271698, -0.0339766 ],
           [ 5.93081714, -1.52272427,  0.40706477,  8.56256569, -3.216366  ,
            -0.0108426 , -0.57434619, -0.18952662]])
    
    model1 = KMedoids(n_clusters=2, random_state=0).fit(X2)
    
    medoids=np.array([[ 5.72318296, -0.99665473, -0.14540062,  8.32051008, -3.36201189,
            -0.04897565, -0.34271698, -0.0339766 ],
           [ 5.43840675, -1.05259078, -0.21793506,  8.56686818, -2.58056957,
            -0.07310339, -0.31181501,  0.02696586]])
    
    # reshape(-1, 1) turns each 8-element vector into 8 samples with 1 feature each
    a = X2[1].reshape(-1, 1)
    b = X2[model1.medoid_indices_][0].reshape(-1, 1)
    
    # dist(x, y) = sqrt(dot(x, x) - 2 * dot(x, y) + dot(y, y))
    # with (8, 1) inputs the result is an 8x8 matrix, one entry per pair of coordinates
    dist = euclidean_distances(a, b)
    print(dist)
    

    This is what you would see:

    [[ 0.          6.71983769  5.86858358  2.59732712  9.08519485  5.77215861
       6.06589994  5.75715956]
     [ 6.71983769  0.          0.85125411  9.31716481  2.36535716  0.94767908
       0.65393775  0.96267813]
     [ 5.86858358  0.85125411  0.          8.4659107   3.21661127  0.09642497
       0.19731636  0.11142402]
     [ 2.59732712  9.31716481  8.4659107   0.         11.68252197  8.36948573
       8.66322706  8.35448668]
     [ 9.08519485  2.36535716  3.21661127 11.68252197  0.          3.31303624
       3.01929491  3.32803529]
     [ 5.77215861  0.94767908  0.09642497  8.36948573  3.31303624  0.
       0.29374133  0.01499905]
     [ 6.06589994  0.65393775  0.19731636  8.66322706  3.01929491  0.29374133
       0.          0.30874038]
     [ 5.75715956  0.96267813  0.11142402  8.35448668  3.32803529  0.01499905
       0.30874038  0.        ]]
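
To get the distances the question actually asks for (each object against each medoid), every vector has to stay a single row of 8 features. The following is a minimal sketch using plain NumPy broadcasting. It hard-codes the medoids from the question instead of refitting KMedoids, and the names D and nearest are only illustrative. Calling euclidean_distances(X2, medoids), or reshaping a single vector with reshape(1, -1) before the call, should give the same values up to floating-point rounding.

```python
import numpy as np

# Data copied from the question: 3 objects with 8 features each.
X2 = np.array([
    [5.43840675, -1.05259078, -0.21793506, 8.56686818, -2.58056957,
     -0.07310339, -0.31181501, 0.02696586],
    [5.72318296, -0.99665473, -0.14540062, 8.32051008, -3.36201189,
     -0.04897565, -0.34271698, -0.0339766],
    [5.93081714, -1.52272427, 0.40706477, 8.56256569, -3.216366,
     -0.0108426, -0.57434619, -0.18952662],
])

# Medoids as listed in the question: medoid 0 is X2[1], medoid 1 is X2[0].
# (Assumption: taken from the question rather than from a fitted model.)
medoids = X2[[1, 0]]

# Broadcast to shape (3, 2, 8): difference of every object with every medoid.
diff = X2[:, None, :] - medoids[None, :, :]

# Euclidean norm over the feature axis gives a (3, 2) distance matrix.
D = np.linalg.norm(diff, axis=2)
print(D)

# Index of the closest medoid for each object.
nearest = D.argmin(axis=1)
print(nearest)
```

Row i of D holds the distances from object i to medoid 0 and medoid 1, so D[1, 0] is the zero the question expected, and the nearest-medoid indices reproduce the labels [1, 0, 0].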