Search code examples
javaalgorithmdata-miningcluster-analysisk-means

Implementation of k-means clustering algorithm


In my program, i'm taking k=2 for k-mean algorithm i.e i want only 2 clusters. I have implemented in a very simple and straightforward way, still i'm unable to understand why my program is getting into infinite loop. can anyone please guide me where i'm making a mistake..?

for simplicity, i hav taken the input in the program code itself. here is my code :

import java.io.*;
import java.lang.*;
class Kmean
{
public static void main(String args[])
{
int N=9;
int arr[]={2,4,10,12,3,20,30,11,25};    // initial data
int i,m1,m2,a,b,n=0;
boolean flag=true;
float sum1=0,sum2=0;
a=arr[0];b=arr[1];
m1=a; m2=b;
int cluster1[]=new int[9],cluster2[]=new int[9];
for(i=0;i<9;i++)
    System.out.print(arr[i]+ "\t");
System.out.println();

do
{
 n++;
 int k=0,j=0;
 for(i=0;i<9;i++)
 {
    if(Math.abs(arr[i]-m1)<=Math.abs(arr[i]-m2))
    {   cluster1[k]=arr[i];
        k++;
    }
    else
    {   cluster2[j]=arr[i];
        j++;
    }
 }
    System.out.println();
    for(i=0;i<9;i++)
        sum1=sum1+cluster1[i];
    for(i=0;i<9;i++)
        sum2=sum1+cluster2[i];
    a=m1;
    b=m2;
    m1=Math.round(sum1/k);
    m2=Math.round(sum2/j);
    if(m1==a && m2==b)
        flag=false;
    else
        flag=true;

    System.out.println("After iteration "+ n +" , cluster 1 :\n");    //printing the clusters of each iteration
    for(i=0;i<9;i++)
        System.out.print(cluster1[i]+ "\t");

    System.out.println("\n");
    System.out.println("After iteration "+ n +" , cluster 2 :\n");
    for(i=0;i<9;i++)
        System.out.print(cluster2[i]+ "\t");

}while(flag);

    System.out.println("Final cluster 1 :\n");            // final clusters
    for(i=0;i<9;i++)
        System.out.print(cluster1[i]+ "\t");

    System.out.println();
    System.out.println("Final cluster 2 :\n");
    for(i=0;i<9;i++)
        System.out.print(cluster2[i]+ "\t");
 }
}

Solution

  • You have a bunch of errors:

    1. At the start of your do loop you should reset sum1 and sum2 to 0.

    2. You should loop until k and j respectively when calculating sum1 and sum2 (or clear cluster1 and cluster2 at the start of your do loop.

    3. In the calculation of sum2 you accidentally use sum1.

    When I make those fixes the code runs fine, yielding the output:

    Final cluster 1 :   
    2   4   10   12  3   11  0   0   0
    
    Final cluster 2 :
    20  30  25   0   0   0   0   0   0
    

    My general advise: learn how to use a debugger. Stackoverflow is not meant for questions like this: it is expected that you can find your own bugs and only come here when everything else fails...