python, scikit-learn, cluster-computing, centroid

Problem with NearestCentroid, python, cluster


I want to find the centroid coordinates of a cluster (list of points [x,y]). So, I want to use NearestCentroid() from sklearn.

    from sklearn.neighbors import NearestCentroid

    clf = NearestCentroid()
    clf.fit(X, y)

X: np.array of my point coordinates.

y: np.array filled entirely with 1.

I get an error when I call fit():

ValueError: y has less than 2 classes

Maybe there is a problem with the array shapes? (X: (7, 2), y: (7,))


Solution

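  • NearestCentroid is a classifier, so fit() expects at least two distinct labels in y. With y filled entirely with 1 it raises the ValueError above; the array shapes themselves are fine. For a single cluster you do not need it: the centroid of a set of points is the mean of the values in each dimension, which numpy.mean() computes directly. Refer to the documentation: numpy.mean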

    import numpy as np
    
    points = [
        [0, 0],
        [1, 1],
        [0, 1],
        [0, 100]
    ]
    a = np.array(points)
    centroid = np.mean(a, axis=0)
    print(centroid)
    

    This prints:

    [ 0.25 25.5 ]
    

    You can verify this by hand. Sum the x values: 0+1+0+0 = 1, then divide by the number of points: 1/4 = 0.25. Likewise for y: 0+1+1+100 = 102, and 102/4 = 25.5.
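If you do have several clusters, the same averaging can be applied per label. The following is a minimal sketch, not a definitive implementation: the points and the labels 1 and 2 are made up for illustration, and it assumes NearestCentroid is used with its default settings (Euclidean metric, no shrinkage), in which case the fitted centroids_ attribute should hold the per-class means.

```python
import numpy as np
from sklearn.neighbors import NearestCentroid

# Hypothetical data: two labelled groups of 2-D points.
X = np.array(
    [[0, 0], [1, 1], [0, 1], [10, 10], [11, 11], [10, 12]],
    dtype=float,
)
y = np.array([1, 1, 1, 2, 2, 2])

# Plain NumPy: average the rows that belong to each label.
labels = np.unique(y)
manual_centroids = np.array([X[y == label].mean(axis=0) for label in labels])

# NearestCentroid is a classifier, so y must contain at least two classes.
clf = NearestCentroid()
clf.fit(X, y)

print(manual_centroids)  # one row per label, in sorted label order
print(clf.centroids_)    # one row per class, in the order of clf.classes_
```

With only one cluster, the np.mean(a, axis=0) call shown earlier remains the simpler choice.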