We have the DF
df <- data.frame(group=as.factor(rep(c("UP","DOWN"),6)),variables=(rep(c("sex","smoke","sport"),each=4))
,values=as.factor(c(1,1,1,0 ,1,1,0,0, 1,1,1,1)))
group variables values
1 UP sex 1
2 DOWN sex 1
3 UP sex 1
4 DOWN sex 0
5 UP smoke 1
6 DOWN smoke 1
7 UP smoke 0
8 DOWN smoke 0
9 UP sport 1
10 DOWN sport 1
11 UP sport 1
12 DOWN sport 1
>
Now I want to know all counts of all levels
library(plyr)
This command does almost perfectly what I want
count(df, c("variables", "group", "values"))
variables group values freq
1 sex DOWN 0 1
2 sex DOWN 1 1
3 sex UP 1 2
4 smoke DOWN 0 1
5 smoke DOWN 1 1
6 smoke UP 0 1
7 smoke UP 1 1
8 sport DOWN 1 2
9 sport UP 1 2
I also want to count the factor levels that arent observed. Like I did per hand in the following output.
variables group values freq
1 sex DOWN 0 1
2 sex DOWN 1 1
3 sex UP 0 0 <--
4 sex UP 1 2
5 smoke DOWN 0 1
6 smoke DOWN 1 1
7 smoke UP 0 1
8 smoke UP 1 1
9 sport DOWN 0 0 <--
10 sport DOWN 1 2
11 sport UP 0 0 <--
12 sport UP 1 2
How can I achieve the above output?
You can do this also with data.table
with less lines of code:
library(data.table)
dt <- setDT(df)
cj <- CJ(dt$variables, dt$group, dt$values, unique = TRUE)
dt[, .N, keyby = c("variables", "group", "values")][cj][is.na(N), N := 0]
print(dt)
variables group values N
1: sex DOWN 0 1
2: sex DOWN 1 1
3: sex UP 0 0
4: sex UP 1 2
5: smoke DOWN 0 1
6: smoke DOWN 1 1
7: smoke UP 0 1
8: smoke UP 1 1
9: sport DOWN 0 0
10: sport DOWN 1 2
11: sport UP 0 0
12: sport UP 1 2
setDT()
converts a data.frame
to a data.table
by reference, ie, without copying.
CJ()
is a cross join. It forms a data.table
from the cross product of the vectors. Thus, it's the data.table
version of expand.grid
.
The parameter unique = TRUE
is a convenient alternative to wrapping each argument in level()
or unique()
.
The counting by groups is done with dt[, .N, keyby = c("variables", "group", "values")]
:
variables group values N
1: sex DOWN 0 1
2: sex DOWN 1 1
3: sex UP 1 2
4: smoke DOWN 0 1
5: smoke DOWN 1 1
6: smoke UP 0 1
7: smoke UP 1 1
8: sport DOWN 1 2
9: sport UP 1 2
Now, dt[, .N, keyby = c("variables", "group", "values")][cj]
(right) joins the CJ()
result with all possible combinations.
Finally, [is.na(N), N := 0]
replaces all NA
s in column N
by 0
.