python scikit-learn cluster-analysis unsupervised-learning

What is the best algorithm to cluster this data?


enter image description here

Can someone help me find a good clustering algorithm that will split this data into 3 clusters without my having to specify the number of clusters?

I have tried many algorithms in their basic form, but nothing seems to work properly.

clustering = AgglomerativeClustering().fit(temp)

I tried DBSCAN and KMeans the same way, just following the guidelines from sklearn, but I couldn't get the expected results.

My original data set is a 1D list of numbers, but the order of the numbers matters, so I generated a 2D list as below.

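from sklearn.cluster import AgglomerativeClustering

# Pair each value with its 1-based position so the order is preserved
temp = [[avg, i + 1] for i, avg in enumerate(avgs)]
clustering = AgglomerativeClustering().fit(temp)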

When plotting, I used a similar index range for the x axis:

ax2.scatter(range(len(plots[i])), plots[i], color=np.random.rand(3))

The order of the data matters, so this needs to be clustered into 3. There may also be other data sets where the data is very good, so that the result should be just one cluster.

Link to the list, if someone wants to try it.

So I tried step detection, following your answer, and got the following image. But how can I find the values of the peaks? Taking the max value gives me one of them, but how do I get the rest? The second-largest value is not the answer, because the point right next to the max is the second largest.

enter image description here


Solution

  • Your data is not 2D coordinates, so don't choose an algorithm designed for that!

    Instead, your data appears to be a sequence or time series.

    What you want to use is a change point detection algorithm, capable of detecting a change in the mean value of a series.

    A simple approach would be to compute the sum of the next 10 points minus the sum of the previous 10 points, then look for extreme values of this curve.
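
The windowed difference described above could be sketched roughly as follows. This is only a minimal illustration in plain Python, not a tuned implementation: the function names, the `threshold` value, and the rule of keeping peaks at least one window apart are all assumptions made for the example, and the data in the demo is made up. Keeping peaks apart is one way to deal with the neighbour of the maximum being the second-largest value: peaks are taken in order of size, and any candidate closer than `window` points to an already chosen peak is skipped.

```python
def window_difference(values, window=10):
    """Sum of the next `window` points minus the sum of the previous `window` points.

    Positions too close to either end (no full window on both sides) stay 0.0.
    """
    n = len(values)
    # Prefix sums, so each window sum is a single subtraction.
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)

    diff = [0.0] * n
    for i in range(window, n - window + 1):
        previous_sum = prefix[i] - prefix[i - window]   # values[i-window:i]
        next_sum = prefix[i + window] - prefix[i]       # values[i:i+window]
        diff[i] = next_sum - previous_sum
    return diff


def find_change_points(values, window=10, threshold=20.0):
    """Indices where the mean appears to jump (hypothetical helper).

    `threshold` is an assumed, data-dependent cutoff on the absolute
    window difference; it has to be chosen for the data at hand.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    scores = [abs(d) for d in window_difference(values, window)]

    # Visit candidates from largest to smallest score.
    by_score = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in by_score:
        if scores[i] < threshold:
            break  # everything after this is smaller still
        # Skip candidates that sit within one window of an accepted peak.
        if all(abs(i - j) >= window for j in chosen):
            chosen.append(i)
    return sorted(chosen)


def segment_labels(n, change_points):
    """Turn change points into one cluster label per position."""
    boundaries = set(change_points)
    labels = []
    label = 0
    for i in range(n):
        if i in boundaries:
            label += 1
        labels.append(label)
    return labels


# Made-up example data: three flat levels of 30 points each.
avgs = [1.0] * 30 + [5.0] * 30 + [2.0] * 30
points = find_change_points(avgs, window=10, threshold=20.0)
print(points)  # [30, 60]
labels = segment_labels(len(avgs), points)
```

With this approach the number of clusters is not fixed in advance: it is the number of accepted change points plus one, so a series with no jump above the threshold ends up as a single cluster. If SciPy is available, `scipy.signal.find_peaks` with its `distance` argument does a similar kind of peak picking.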