matlab, cluster-analysis, k-means

Understanding K-means Clustering


I'm trying to learn the k-means clustering algorithm using MATLAB. The problem is that I cannot find any sample data that makes the algorithm easy to understand. I found an example on MathWorks that demonstrates k-means clustering, but unfortunately I cannot understand it. I also tried to understand this simple data set, which I found on Stack Overflow.

Please, I need a basic example of k-means clustering such that, if I implement it in any software (e.g. MATLAB), I can be sure that I am applying it correctly.

Finally, all the data sets on UCI, for example, are too large, and I cannot tell whether my implementation is correct or not.

Thanks in advance.


Solution

  • Well,

    let k={2,3,4,10,11,12,20,25,30} 
    

    That's very simple. Let's split k into two clusters (k here names the data set; the number of clusters is 2). Pick two numbers as the initial centers, one for each cluster. I took 10 for k1 and 20 for k2, then grouped the data so that the numbers closer to 10 form one cluster and the numbers closer to 20 form the other. Remember, you can choose any numbers as the initial centers.

    k1={2,3,4,10,11,12},k2={20,25,30}
    

    So the full data set is split into two clusters, with each number going to its nearest center. The new center of the first cluster is its mean: the sum of its numbers divided by how many numbers it contains. The same goes for the second cluster.

    (2+3+4+10+11+12)/6 = 42/6 = 7
    (20+25+30)/3 = 75/3 = 25
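
The assignment and averaging steps described above can be checked with a few lines of code. This is a minimal sketch in Python, written only for illustration (the variable names are my own, not from any library); the same steps carry over to MATLAB or any other language.

```python
# Data set and initial centers taken from the worked example.
data = [2, 3, 4, 10, 11, 12, 20, 25, 30]
centers = [10, 20]

# Assignment step: put each number into the cluster of its nearest center.
clusters = [[] for _ in centers]
for x in data:
    nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
    clusters[nearest].append(x)

# Update step: each new center is the mean of the numbers in its cluster.
new_centers = [sum(c) / len(c) for c in clusters]

print(clusters)     # [[2, 3, 4, 10, 11, 12], [20, 25, 30]]
print(new_centers)  # [7.0, 25.0]
```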
    

    Reassigning each number to the nearer of the new centers, 7 and 25, gives the same two clusters, so the means stay the same no matter how many more iterations you run. This is called convergence: the point where the centers no longer change. If the means do change after a reassignment, keep repeating the assign-and-average steps until they stop changing.
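
To check an implementation against this example, the repeat-until-nothing-changes loop can be sketched as follows. This is an illustrative Python sketch, not a reference implementation: `kmeans_1d` and its `max_iter` parameter are hypothetical names of my own, and leaving an empty cluster's center unchanged is an assumption made for simplicity.

```python
def kmeans_1d(data, centers, max_iter=100):
    """Plain k-means on a list of numbers (hypothetical helper for illustration)."""
    centers = list(centers)
    clusters = []
    for iteration in range(1, max_iter + 1):
        # Assignment step: each number joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for x in data:
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)

        # Update step: each center becomes the mean of its cluster.
        # Assumption: an empty cluster keeps its previous center.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]

        # Converged: the centers did not move, so further iterations change nothing.
        if new_centers == centers:
            return centers, clusters, iteration
        centers = new_centers
    return centers, clusters, max_iter


centers, clusters, iterations = kmeans_1d(
    [2, 3, 4, 10, 11, 12, 20, 25, 30], [10, 20]
)
print(centers)     # [7.0, 25.0]
print(clusters)    # [[2, 3, 4, 10, 11, 12], [20, 25, 30]]
print(iterations)  # 2
```

The first pass moves the centers from 10 and 20 to 7 and 25, and the second pass confirms that nothing changes. Exact equality is enough to detect convergence on this small example; with real-valued data, comparing the centers within a small tolerance is a common alternative.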