How can I find the incidences of my attributes in Weka?

I have diagnoses, ages and their quantities. For example,

I have more than 50 entries with duplicate diagnoses. I want to see the incidence of each age + quantity combination in a new column. How can I do that?

Solution

If your data is stored in a dataframe called df, try the following:

library(dplyr)
df %>% group_by(diagnosis, age, quantity) %>% summarise(n = n())

This gives you a data.frame with the number of occurrences of each combination of diagnosis, age and quantity. Adjust the column names to match the ones actually used in your data.
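dplyr also provides count(), a shorthand for group_by() followed by summarise(n = n()). A minimal sketch on made-up toy data; the column names diagnosis, age and quantity are assumptions mirroring the question:

```r
library(dplyr)

# Hypothetical toy data; replace with your own data frame and column names
df <- data.frame(
  diagnosis = c("flu", "flu", "flu", "asthma", "asthma"),
  age       = c(30, 30, 45, 30, 30),
  quantity  = c(1, 1, 2, 1, 1)
)

# count() groups by the listed columns and tallies the rows of each group
# into a new column called n (one output row per distinct combination)
incidence <- df %>% count(diagnosis, age, quantity)
print(incidence)
```

Here the two asthma rows share age 30 and quantity 1, so that combination gets n = 2.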

For instance, using the mtcars dataset:

mtcars %>% group_by(cyl, vs, carb) %>% summarise(n())
Source: local data frame [11 x 4]
Groups: cyl, vs [?]

     cyl    vs  carb `n()`
   <dbl> <dbl> <dbl> <int>
1      4     0     2     1
2      4     1     1     5
3      4     1     2     5
4      6     0     4     2
5      6     0     6     1
6      6     1     1     2
7      6     1     4     2
8      8     0     2     4
9      8     0     3     3
10     8     0     4     6
11     8     0     8     1

Here, the first row tells you that there is only one car with cyl = 4, vs = 0, carb = 2, and the second that there are 5 cars with (cyl, vs, carb) = (4, 1, 1). If you want the counts added as a new column to the original data.frame, use mutate instead of summarise.
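A minimal sketch of the mutate variant on mtcars: every original row is kept and receives the size of its (cyl, vs, carb) group. add_count() is dplyr's one-step shorthand for the same thing.

```r
library(dplyr)

# group_by() + mutate() keeps all 32 rows and attaches each row's group size
with_counts <- mtcars %>%
  group_by(cyl, vs, carb) %>%
  mutate(n = n()) %>%
  ungroup()

# add_count() performs the grouping and counting in a single call
with_counts2 <- mtcars %>% add_count(cyl, vs, carb)

head(with_counts[, c("cyl", "vs", "carb", "n")])
```

The first row (Mazda RX4, with cyl = 6, vs = 0, carb = 4) gets n = 2, matching the summarised table above.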

This kind of operation is often referred to as split-apply-combine, and it is worth reading up on.
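To make the three steps explicit, here is a base-R sketch of the same count on mtcars without dplyr. It uses interaction() to build one grouping factor from the three columns; this is just one of several possible base-R approaches.

```r
# Split: build one factor from the three columns (dropping unused combinations)
# and split the data frame into one piece per combination
groups <- interaction(mtcars$cyl, mtcars$vs, mtcars$carb, drop = TRUE)
pieces <- split(mtcars, groups)

# Apply: count the rows in each piece
counts <- vapply(pieces, nrow, integer(1))

# Combine: gather the per-group results into a single data frame
result <- data.frame(group = names(counts), n = counts, row.names = NULL)
print(result)
```

The group labels have the form "cyl.vs.carb", so "4.1.1" corresponds to the second row of the dplyr output.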

Just for reference: this question used to be "How can I find the incidences of my attributes in R or Weka?" It was changed to Weka only after I had supplied the answer for R.