java cluster-analysis weka k-means

Get cluster assignments in Weka


I have a CSV file as follows:

id,at1,at2,at3
1072,0.5,0.2,0.7
1092,0.2,0.5,0.7
...

I've loaded it into Weka for clustering:

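DataSource source = new DataSource("test.csv");
Instances data = source.getDataSet();
SimpleKMeans kmeans = new SimpleKMeans();
kmeans.setPreserveInstancesOrder(true); // required before calling getAssignments()
kmeans.buildClusterer(data);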

Question #1: How do I set the first column as an ID, i.e. have it ignored for clustering purposes?

I then try to print out the assignments:

int[] assignments = kmeans.getAssignments();
int i = 0;
for (int clusterNum : assignments) {
    System.out.printf("Instance %d -> Cluster %d \n", i, clusterNum);
    i++;
}

This prints:

Instance 0 -> Cluster 0
Instance 1 -> Cluster 2
...

Question #2: How do I refer to the ID when printing out the assignments? For example:

Instance 1072 -> Cluster 0
Instance 1092 -> Cluster 2

Solution

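  • Your life would be much easier if you used the Weka GUI (the Explorer).

    In the Cluster tab there is an "Ignore attributes" button for excluding attributes such as the ID.

    To get the ID-to-cluster assignments: once the clustering algorithm you chose has finished, right-click its entry in the result list on the left of the screen, choose "Visualize cluster assignments", and then save. The saved file lists each instance, including its ID attribute, together with its assigned cluster.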
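If you would rather stay in Java, both questions can probably be handled in code as well. The following is only a minimal sketch, not a definitive implementation: it assumes Weka is on the classpath, that the IDs are numeric, and that k = 3 clusters are wanted (the question does not state the cluster count). The class name ClusterWithIds and the helper assignById are hypothetical names introduced for illustration. The idea is to cluster a copy of the data with the first attribute removed, then use the row index to look the ID up in the original data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class ClusterWithIds {

    // Hypothetical helper: clusters the CSV at csvPath into k clusters and
    // returns a map from ID (first column) to cluster number.
    static Map<Integer, Integer> assignById(String csvPath, int k) throws Exception {
        Instances data = new DataSource(csvPath).getDataSet();

        // Question #1: drop the first attribute (the ID) from a copy that is
        // used only for clustering; the original data keeps the ID.
        Remove remove = new Remove();
        remove.setAttributeIndices("first");
        remove.setInputFormat(data);
        Instances withoutId = Filter.useFilter(data, remove);

        SimpleKMeans kmeans = new SimpleKMeans();
        kmeans.setNumClusters(k);
        kmeans.setPreserveInstancesOrder(true); // needed for getAssignments()
        kmeans.buildClusterer(withoutId);

        // Question #2: filtering keeps the row order, so index i refers to the
        // same row in both data sets and the ID can be read from the original.
        int[] assignments = kmeans.getAssignments();
        Map<Integer, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < assignments.length; i++) {
            int id = (int) data.instance(i).value(0); // assumes numeric IDs
            result.put(id, assignments[i]);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // k = 3 is an assumption; adjust it to the number of clusters you need.
        assignById("test.csv", 3).forEach(
            (id, cluster) -> System.out.printf("Instance %d -> Cluster %d%n", id, cluster));
    }
}
```

Keeping the unfiltered data around is what makes the ID lookup possible; if the IDs were strings rather than numbers, data.instance(i).stringValue(0) would be the call to try instead of value(0).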