I'm summarizing a data frame in dplyr with the summarize_all() function. If I do the following:
summarize_all(mydf, list(mean="mean", median="median", sd="sd"))
I get a tibble with 3 variables for each of my original measures, all suffixed by the type (mean, median, sd). Great! But when I try to capture the within-vector n's to calculate the standard deviations myself and to make sure missing cells aren't counted...
summarize_all(mydf, list(mean="mean", median="median", sd="sd", n="n"))
...I get an error:
Error in (function () : unused argument (var_a)
This is not an issue with my var_a vector. If I remove it, I get the same error for var_b, etc. The summarize_all function is producing odd results whenever I request n or n(), or if I use .funs() and list the descriptives I want to compute instead.
What's going on?
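The reason it's giving you problems is that n() doesn't take any arguments, unlike mean() and median(). summarize_all() calls every function in the list with the column as its argument, so n() is handed a vector it cannot accept, which is exactly what the "unused argument" error reports. Use length() instead to get the desired effect: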
summarize_all(mydf, list(mean="mean", median="median", sd="sd", n="length"))
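One caveat, since the question mentions making sure missing cells aren't counted: length() returns the full length of each column, NA values included. If a count of non-missing values is what's wanted, something like the sketch below may be closer. The data in mydf here is made up for illustration, and n_valid is a hypothetical helper name, not a dplyr function.

```r
library(dplyr)

# Hypothetical stand-in for mydf: two numeric columns with missing values
mydf <- tibble(
  var_a = c(1, 2, NA, 4),
  var_b = c(10, NA, NA, 40)
)

# Hypothetical helper: counts only the non-missing entries of a vector
n_valid <- function(x) sum(!is.na(x))

result <- summarize_all(
  mydf,
  list(
    mean    = ~ mean(., na.rm = TRUE),    # lambdas so na.rm reaches only these
    median  = ~ median(., na.rm = TRUE),
    sd      = ~ sd(., na.rm = TRUE),
    n       = length,                     # counts every row, NA included
    n_valid = n_valid                     # counts non-missing values only
  )
)

result
```

With this made-up data, var_a_n is 4 while var_a_n_valid is 3, so the two counts diverge as soon as a column contains an NA. The lambdas are used instead of passing na.rm = TRUE through summarize_all() itself, because that extra argument would also be forwarded to length() and n_valid(), which don't accept it.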