I am trying to summarise between my variables in R, and my data looks like this:
id Function V t
1 adverb 0 1
2 discourse 1 1
3 filler 1 0
4 discourse 1 1
5 filler 0 0
6 adverb 1 1
7 adverb 1 1
What I need is a table which lists each function by the count and proportion of each variable (present/non-present).
I Excel I would use something like: COUNTIFS($B:$B, "adverb", $C:$C, ">"&0)/SUMIFS($B:$B, "adverb")
Ideally it looks like this:
Function V V_prop t t_prop
adverb 2 0.67 3 1
discourse 2 1 2 1
filler 1 0.5 0 0
I know I can use dplyr like so:
df %>%
group_by(Function) %>%
dplyr::summarise_at(vars(V,t), function(x) (sum(x !=0)/n(x)))
But, this only gives me the raw counts, I need the proportions too.
You were actually very close. TO get the proportion, just divide by the number of items in the group (with n()
not n(x)
). If you provide a list of functions to a _at
function, it will apply them all to each variable selected:
df %>%
group_by(Function) %>%
summarise_at(vars(V,t),
list('n' = ~ sum(. !=0),
'prop' = ~ (sum(. !=0)/n())))
Function V_n t_n V_prop t_prop
<chr> <int> <int> <dbl> <dbl>
1 adverb 2 3 0.667 1
2 discourse 2 2 1 1
3 filler 1 0 0.5 0
Because of the way dplyr works, the new variables must have an added name (so you can't get V
and t
right away, they must be V_n
and t_n
. If you really want the same format, we can just rename them, either manually or with rename_at
:
... %>%
rename_at(vars(ends_with('_n')), ~ gsub('_n$', '', .))