I have diagnoses, ages and their quantities. For example,
I have 50+ entries with duplicate diagnoses. I want to see the incidence of this age+quatity relation in a new column. How can I do that?
If your data is stored in a data frame called df, try the following:
library(dplyr)
df %>% group_by(diagnosis, age, quatity) %>% summarise(n())
This will give you a data.frame with the number of occurrences for each diagnosis at a given age and for a given "quatity". Please make sure the latter is spelled correctly.
For instance, using the mtcars dataset:
mtcars %>% group_by(cyl, vs, carb) %>% summarise(n())
Source: local data frame [11 x 4]
Groups: cyl, vs [?]
cyl vs carb `n()`
<dbl> <dbl> <dbl> <int>
1 4 0 2 1
2 4 1 1 5
3 4 1 2 5
4 6 0 4 2
5 6 0 6 1
6 6 1 1 2
7 6 1 4 2
8 8 0 2 4
9 8 0 3 3
10 8 0 4 6
11 8 0 8 1
Here, the first row tells you that there is only one car with cyl = 4, vs = 0, carb = 2, and the second that there are 5 cars with (cyl, vs, carb) = (4, 1, 1). If you want the column added to the old data.frame, use mutate instead of summarise.
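As a minimal sketch of the mutate variant, again on mtcars: the column name n and the result name mtcars_counted are my own choices for illustration, not anything from the question.

```r
library(dplyr)

# Keep every original row and append the group size as a new column.
mtcars_counted <- mtcars %>%
  group_by(cyl, vs, carb) %>%
  mutate(n = n()) %>%   # n() is the number of rows in the current group
  ungroup()             # drop the grouping so later steps see a plain table

# Each car now carries the count of cars sharing its (cyl, vs, carb).
head(mtcars_counted[, c("cyl", "vs", "carb", "n")])
```

If I remember the dplyr API correctly, add_count(mtcars, cyl, vs, carb) should give the same column in one step.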
This kind of operation is often referred to as split-apply-combine. It is worth reading up on.
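To make the three steps visible, here is a rough base R sketch of the same count without dplyr. The names pieces, counts and result are just illustrative; this is one way to spell it out, not how dplyr does it internally.

```r
# Split: one data frame per (cyl, vs, carb) combination.
# drop = TRUE skips combinations that never occur in the data.
pieces <- split(mtcars, list(mtcars$cyl, mtcars$vs, mtcars$carb), drop = TRUE)

# Apply: reduce each piece to a one-row summary holding its size.
counts <- lapply(pieces, function(piece) {
  data.frame(cyl = piece$cyl[1], vs = piece$vs[1],
             carb = piece$carb[1], n = nrow(piece))
})

# Combine: stack the one-row summaries back into a single data frame.
result <- do.call(rbind, counts)
result
```

The row order may differ from the dplyr output, but the counts per combination should match.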
Just for reference: this question used to be "How can i find the incidences of my attributes in R or Weka?" It was changed to Weka only after I had supplied the answer for R.