I am using Weka for my internship but I have a little knowledge about data mining. So, maybe someone knows how can I apply the following results on my data-sets to get all data by cluster ? The method that I use now is to compute distances between my attributes and the mean value of each cluster then I classify them by the nearest value. But this method is too rough for me .
=== Run information ===
Scheme:weka.clusterers.EM -I 100 -N -1 -M 1.0E-6 -S 100
Relation: wcet_cluster6 - Copie-weka.filters.unsupervised.attribute.Remove-R1-3,5-weka.filters.unsupervised.attribute.Remove-R5-12
Instances: 467
Attributes: 4
max
alt
stmt
bb
Test mode:evaluate on training data
=== Model and evaluation on training set ===
EM
Number of clusters selected by cross validation: 6
Cluster
Attribute 0 1 2 3 4 5
(0.28) (0.11) (0.25) (0.16) (0.04) (0.17)
==================================================================
max
mean 9.0148 10.9112 11.2826 10.4329 11.2039 10.0546
std. dev. 1.8418 2.7775 3.0263 2.5743 2.2014 2.4614
alt
mean 0.0003 19.6467 0.4867 2.4565 44.191 8.0635
std. dev. 0.0175 5.7685 0.5034 1.3647 10.4761 3.3021
stmt
mean 0.7295 77.0348 3.2439 12.3971 140.9367 33.9686
std. dev. 1.0174 21.5897 2.3642 5.1584 34.8366 11.5868
bb
mean 0.4362 53.9947 1.4895 7.2547 114.7113 22.2687
std. dev. 0.5153 13.1614 0.9276 3.5122 28.0919 7.6968
Time taken to build model (full training data) : 4.24 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 163 ( 35%)
1 50 ( 11%)
2 85 ( 18%)
3 73 ( 16%)
4 18 ( 4%)
5 78 ( 17%)
Log likelihood: -9.09081
Thanks for your help!!
I think no-one can really answer this. Some tips off the top of my head.
You have used the EM clustering algorithm, see animated gif on wikipedia page. From Weka's Documentation Synopsis:
"EM assigns a probability distribution to each instance which indicates the probability of it belonging to each of the clusters. "
Is this complex output really what you want? It also selects a number of clusters for you (unless you constrain that number).
In weka 3.7 you can use the unsupervised attribute filter "ClusterMembership" in the Preprocess dialog to replace your dataset with a result of the cluster assignments. You need to select one reference attribute, though. By default it selects the last one. This creates hard-to -interpret output.