
How can I find the incidences of my attributes in Weka?


I have diagnoses, ages and their quantities. For example,

[image: sample of the data, not reproduced here]

I have more than 50 rows with duplicate diagnoses. I want to see the incidence of each age + quatity combination in a new column. How can I do that?


Solution

  • If your data is stored in a dataframe called df, try the following:

    library(dplyr)
    df %>% group_by(diagnosis, age, quatity) %>% summarise(n())
    

    This will give you a data.frame with the number of occurrences for each diagnosis at a given age and for a given "quatity". Please make sure the latter is spelled correctly.

    For instance, using the mtcars dataset:

    mtcars %>% group_by(cyl, vs, carb) %>% summarise(n())
    Source: local data frame [11 x 4]
    Groups: cyl, vs [?]
    
         cyl    vs  carb `n()`
       <dbl> <dbl> <dbl> <int>
    1      4     0     2     1
    2      4     1     1     5
    3      4     1     2     5
    4      6     0     4     2
    5      6     0     6     1
    6      6     1     1     2
    7      6     1     4     2
    8      8     0     2     4
    9      8     0     3     3
    10     8     0     4     6
    11     8     0     8     1
    

    Here, the first line tells you that there is only one car with cyl = 4, vs = 0, carb = 2, and there are 5 cars with (cyl, vs, carb) = (4, 1, 1). If you want the column added to the old data.frame, use mutate instead of summarise.
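
    As a minimal sketch of the mutate variant: the toy data frame below is hypothetical (its values are made up for illustration), and only the column names follow the ones used above. Each row keeps its original columns and gains a count column n for its (diagnosis, age, quatity) group.

    ```r
    library(dplyr)

    # Hypothetical toy data; only the column names come from the question.
    df <- data.frame(
      diagnosis = c("flu", "flu", "cold", "flu"),
      age       = c(30, 30, 45, 30),
      quatity   = c(2, 2, 1, 3)
    )

    # mutate() keeps every original row and appends the group size as column n.
    df_counted <- df %>%
      group_by(diagnosis, age, quatity) %>%
      mutate(n = n()) %>%
      ungroup()  # drop the grouping so later operations act on the whole table
    ```

    Naming the column explicitly (n = n()) avoids the awkward `n()` header seen in the output above.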

    This kind of operation is often referred to as split-apply-combine. It is worthwhile reading up on it.
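
    To make the three steps visible, here is a rough base R sketch of the same mtcars count without dplyr. It is only one of several ways to do this: split() breaks the data into one piece per (cyl, vs, carb) combination, nrow is applied to each piece, and sapply() combines the results into a named vector.

    ```r
    # Split: one data frame per observed (cyl, vs, carb) combination.
    # drop = TRUE discards combinations that never occur in the data.
    pieces <- split(mtcars, list(mtcars$cyl, mtcars$vs, mtcars$carb), drop = TRUE)

    # Apply + combine: count the rows of each piece and collect the counts
    # in a named vector; names have the form "cyl.vs.carb", e.g. "4.1.1".
    counts <- sapply(pieces, nrow)
    ```

    Looking up counts[["4.1.1"]] should return the same count as the (4, 1, 1) row in the dplyr output above.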


    Just for reference: this question originally read "How can I find the incidences of my attributes in R or Weka?" It was changed to Weka only after I had supplied the answer for R.