I have a CSV file as follows:
id,at1,at2,at3
1072,0.5,0.2,0.7
1092,0.2,0.5,0.7
...
I've loaded it into Weka for clustering:
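DataSource source = new DataSource("test.csv");
Instances data = source.getDataSet();
SimpleKMeans kmeans = new SimpleKMeans();
kmeans.setPreserveInstancesOrder(true); // needed for getAssignments() below
kmeans.buildClusterer(data);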
Question #1: How do I set the first column as an ID, i.e., have it ignored for clustering purposes?
I then try to print out the assignments:
int[] assignments = kmeans.getAssignments();
int i = 0;
for (int clusterNum : assignments) {
    System.out.printf("Instance %d -> Cluster %d \n", i, clusterNum);
    i++;
}
This prints:
Instance 1 -> Cluster 0
Instance 2 -> Cluster 2
...
Question #2: How do I refer to the ID when printing out the assignments? For example:
Instance 1072 -> Cluster 0
Instance 1092 -> Cluster 2
Your life would be much easier if you used the GUI version of Weka (the Explorer), for example on Windows.
In the Cluster tab there is an "Ignore attributes" button for excluding attributes such as the ID.
As for the ID-to-cluster assignments: once the clustering algorithm you chose has finished, right-click the result in the result list on the left of the screen, choose "Visualize cluster assignments", and then click Save.
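If you would rather stay in code, the same two steps can be sketched roughly as follows. This is a minimal sketch under a few assumptions, not a definitive implementation: it assumes `kmeans` is a `SimpleKMeans`, that the `id` column is loaded as a numeric attribute, and that 3 clusters is a reasonable placeholder for your data. The class name `ClusterWithIds` and the helper `formatAssignment` are made up for illustration. The idea is to cluster a copy of the data with the first attribute removed, then look the ID up in the original data by row index.

```java
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class ClusterWithIds {

    // Hypothetical helper: formats one output line.
    // Weka stores numeric values as doubles, so the ID is cast for display.
    static String formatAssignment(double id, int cluster) {
        return String.format("Instance %d -> Cluster %d", (long) id, cluster);
    }

    public static void main(String[] args) throws Exception {
        // Load the full data set, including the id column (attribute index 0).
        Instances data = new DataSource("test.csv").getDataSet();

        // Build a copy without the first attribute; Remove takes 1-based indices.
        Remove remove = new Remove();
        remove.setAttributeIndices("1");
        remove.setInputFormat(data);
        Instances features = Filter.useFilter(data, remove);

        SimpleKMeans kmeans = new SimpleKMeans();
        kmeans.setNumClusters(3);               // assumption: choose k for your data
        kmeans.setPreserveInstancesOrder(true); // required by getAssignments()
        kmeans.buildClusterer(features);

        // Row i of the filtered copy corresponds to row i of the original data,
        // so the ID can be read back from the unfiltered instances.
        int[] assignments = kmeans.getAssignments();
        for (int i = 0; i < assignments.length; i++) {
            double id = data.instance(i).value(0);
            System.out.println(formatAssignment(id, assignments[i]));
        }
    }
}
```

If the ID were loaded as a nominal or string attribute instead, `value(0)` would return an internal index rather than the ID itself, and `data.instance(i).stringValue(0)` would be the call to use.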