machine-learning weka

How to know which label is predicted by Weka


I may have a stupid question, but I'm working with Weka to predict the effect of different genes on cancer, with data something like this:

cancer  gene1   gene2  gene3 .... 
yes     0.85    1.23   3.52  ....
no      7.58    6.25   8.91  ....
no      6.52    5.25   9.85  ....
yes     1.23    0.59   0.74  ....
.....

but with 25 instances of cancer = yes and 158 of cancer = no, plus 75 genes. My issue is that when I run, for example, InfoGain or GainRatio, I get my selected or ranked attributes (genes), but how can I say whether those genes predict cancer = yes or cancer = no?

Many thanks!


Solution

  • I don't know much about genetics, but how do you know that a single gene causes cancer? It may well be many interacting genes. How you account for those interactions is something you will have to work out yourself.

    On the formal/technical side: in Weka, your class attribute "cancer" needs to be the last (rightmost) column; otherwise you have to set it manually with the select box "(Nom) cancer" each time before you click the "Start" button.

    You might have a look at the diabetes.arff file that comes with Weka; it has a structure similar to your data file.

    If you want an interpretable model, you could also run the decision tree algorithm "J48" (in the "Classify" tab) and, in its properties window, set minNumObj to a higher value (find an appropriate value by trial and error). This creates flat trees with few levels/decisions/if-statements. Then right-click on the run (in the lower-left panel of the "Classify" tab) and choose "Visualize Tree". The leaves of the tree show which class (yes or no) each branch predicts.
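A ranking from InfoGain or GainRatio scores how well a gene separates the two classes, but the score itself carries no direction. To see which label a gene points to, you need something like the per-class averages or the split of a very flat tree. The following is a minimal sketch of that idea in plain Python, not Weka's API and not the J48 algorithm: it uses a simple misclassification count instead of J48's gain-ratio criterion, and the four gene1 values are just the sample rows from the question. The function names `class_means` and `best_stump` are made up for illustration.

```python
from statistics import mean

# Toy data taken from the sample rows in the question:
# (cancer label, gene1 expression value).
rows = [("yes", 0.85), ("no", 7.58), ("no", 6.52), ("yes", 1.23)]


def class_means(rows):
    """Return the mean expression of the gene for each class label."""
    by_label = {}
    for label, value in rows:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def best_stump(rows):
    """Find a one-level split 'value <= threshold' with the fewest errors.

    Returns (threshold, label_at_or_below, label_above, errors).
    This is a simplified stand-in for a flat decision tree.
    """
    values = sorted({value for _, value in rows})
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for threshold in candidates:
        # Try both ways of assigning the labels to the two sides.
        for low, high in (("yes", "no"), ("no", "yes")):
            errors = sum(
                1
                for label, value in rows
                if (low if value <= threshold else high) != label
            )
            if best is None or errors < best[3]:
                best = (threshold, low, high, errors)
    return best


if __name__ == "__main__":
    print("mean gene1 per class:", class_means(rows))
    threshold, low, high, errors = best_stump(rows)
    print(f"gene1 <= {threshold:.3f} -> cancer = {low}, "
          f"otherwise cancer = {high} ({errors} errors)")
```

On these four rows, low gene1 values go with cancer = yes and high values with cancer = no. This is the same kind of statement you would read off the leaves of a flat J48 tree. With only 25 yes cases against 158 no cases, such a rule should be checked with cross-validation before it is trusted.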