How can we use clustering results in Weka?


I am using Weka for my internship, but I have little knowledge of data mining. Does anyone know how I can apply the following results to my data sets to get all the data grouped by cluster? My current method is to compute the distance between each instance's attribute values and the mean values of each cluster, then assign the instance to the nearest cluster. But this method is too rough for me.

=== Run information ===

Scheme:weka.clusterers.EM -I 100 -N -1 -M 1.0E-6 -S 100
Relation:     wcet_cluster6 - Copie-weka.filters.unsupervised.attribute.Remove-R1-3,5-weka.filters.unsupervised.attribute.Remove-R5-12
Instances:    467
Attributes:   4
              max
              alt
              stmt
              bb
Test mode:evaluate on training data

=== Model and evaluation on training set ===

EM

Number of clusters selected by cross validation: 6


             Cluster
Attribute          0        1        2        3        4        5
              (0.28)   (0.11)   (0.25)   (0.16)   (0.04)   (0.17)
==================================================================
max
  mean         9.0148  10.9112  11.2826  10.4329  11.2039  10.0546
  std. dev.    1.8418   2.7775   3.0263   2.5743   2.2014   2.4614

alt
  mean         0.0003  19.6467   0.4867   2.4565   44.191   8.0635
  std. dev.    0.0175   5.7685   0.5034   1.3647  10.4761   3.3021

stmt
  mean         0.7295  77.0348   3.2439  12.3971 140.9367  33.9686
  std. dev.    1.0174  21.5897   2.3642   5.1584  34.8366  11.5868

bb
  mean         0.4362  53.9947   1.4895   7.2547 114.7113  22.2687
  std. dev.    0.5153  13.1614   0.9276   3.5122  28.0919   7.6968



Time taken to build model (full training data) : 4.24 seconds

=== Model and evaluation on training set ===

Clustered Instances

0      163 ( 35%)
1       50 ( 11%)
2       85 ( 18%)
3       73 ( 16%)
4       18 (  4%)
5       78 ( 17%)


Log likelihood: -9.09081

Thanks for your help!!


Solution

  • I don't think anyone can really answer this, but here are some tips off the top of my head.

    You have used the EM clustering algorithm (see the animated GIF on its Wikipedia page). From the synopsis in Weka's documentation:

    "EM assigns a probability distribution to each instance which indicates the probability of it belonging to each of the clusters."

    Is this complex output really what you want? It also selects a number of clusters for you (unless you constrain that number).

    In Weka 3.7 you can use the unsupervised attribute filter "ClusterMembership" in the Preprocess dialog to replace your dataset with the resulting cluster assignments. You need to select one reference attribute, though; by default it selects the last one. This creates hard-to-interpret output.
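
To make the quoted synopsis more concrete, here is a minimal sketch that recomputes the per-cluster probabilities for one instance from the numbers printed in the question. It assumes each cluster is modelled as independent normal distributions over the four numeric attributes (which is how the mean / std. dev. table reads) and it uses the rounded cluster weights shown in parentheses. The function names are hypothetical, and this is plain Python rather than Weka's API, so the results may differ slightly from Weka's own assignments.

```python
import math

# Values copied from the EM output in the question.
# Attribute order in every row: max, alt, stmt, bb.
PRIORS = [0.28, 0.11, 0.25, 0.16, 0.04, 0.17]
MEANS = [
    [9.0148, 0.0003, 0.7295, 0.4362],
    [10.9112, 19.6467, 77.0348, 53.9947],
    [11.2826, 0.4867, 3.2439, 1.4895],
    [10.4329, 2.4565, 12.3971, 7.2547],
    [11.2039, 44.191, 140.9367, 114.7113],
    [10.0546, 8.0635, 33.9686, 22.2687],
]
STD_DEVS = [
    [1.8418, 0.0175, 1.0174, 0.5153],
    [2.7775, 5.7685, 21.5897, 13.1614],
    [3.0263, 0.5034, 2.3642, 0.9276],
    [2.5743, 1.3647, 5.1584, 3.5122],
    [2.2014, 10.4761, 34.8366, 28.0919],
    [2.4614, 3.3021, 11.5868, 7.6968],
]


def log_normal_pdf(x, mean, std):
    """Log density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def cluster_probabilities(instance):
    """Return P(cluster | instance) for each cluster (hypothetical helper).

    Assumption: attributes are independent within a cluster, so the
    per-attribute log densities are simply summed.
    """
    log_joint = []
    for prior, means, stds in zip(PRIORS, MEANS, STD_DEVS):
        lp = math.log(prior)
        for x, m, s in zip(instance, means, stds):
            lp += log_normal_pdf(x, m, s)
        log_joint.append(lp)
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_joint)
    weights = [math.exp(lp - top) for lp in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


def assign_cluster(instance):
    """Hard assignment: index of the most probable cluster."""
    probs = cluster_probabilities(instance)
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    example = [10.0, 8.0, 30.0, 20.0]  # made-up instance: max, alt, stmt, bb
    print(cluster_probabilities(example))
    print(assign_cluster(example))
```

Compared with the nearest-mean approach described in the question, this weights each attribute by the cluster's own standard deviation and by the cluster's size, which is roughly what makes the EM assignment less "rough". Keeping the full probability list instead of only the most probable index also shows how confident each assignment is.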