What seems like a straightforward calculation of group means and standard errors doesn't work and I can't figure out why. I get NA
for the std. errors
data("AddHealth", package = "heplots")
library(dplyr)
# use a function
std.error <- function(x, na.rm = TRUE) {
sd(x, na.rm = na.rm) / sqrt(length(x))
}
# try it two ways
AddHealth |>
group_by(grade) |>
summarise(depression = mean(depression),
anxiety = mean(anxiety),
n = n(),
se.dep = std.error(depression),
se.anx = sd(anxiety, na.rm = TRUE) / sqrt(n)) |>
print()
I get NA
for the standard error whether I use my function or do the calculation directly.
This should be relatively simple. Why doesn't this work? How can I get what I want?
# A tibble: 6 × 6
grade depression anxiety n se.dep se.anx
<ord> <dbl> <dbl> <int> <dbl> <dbl>
1 7 0.881 0.751 622 NA NA
2 8 1.08 0.804 664 NA NA
3 9 1.17 0.934 778 NA NA
4 10 1.27 0.956 817 NA NA
5 11 1.37 1.12 790 NA NA
6 12 1.34 1.10 673 NA NA
The issue is that you compute the means first and store the means using the names of the original variable. Hence, when computing the standard errors you are not computing the sd
of the original variable but the sd
of the mean, i.e. you are doing e.g. sd(mean(depression))
which returns NA
.
To fix your issue change the order of the computations, i.e. compute the standard errors first or use different names for the means or what I would prefer use across()
:
data("AddHealth", package = "heplots")
library(dplyr, warn=FALSE)
AddHealth |>
group_by(grade) |>
summarise(
n = n(),
se.dep = sd(depression, na.rm = TRUE) / sqrt(n),
se.anx = sd(anxiety, na.rm = TRUE) / sqrt(n),
depression = mean(depression),
anxiety = mean(anxiety)
)
#> # A tibble: 6 × 6
#> grade n se.dep se.anx depression anxiety
#> <ord> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 7 622 0.0447 0.0420 0.881 0.751
#> 2 8 664 0.0461 0.0411 1.08 0.804
#> 3 9 778 0.0426 0.0387 1.17 0.934
#> 4 10 817 0.0431 0.0388 1.27 0.956
#> 5 11 790 0.0428 0.0411 1.37 1.12
#> 6 12 673 0.0439 0.0426 1.34 1.10
AddHealth |>
group_by(grade) |>
summarise(
n = n(),
mean.dep = mean(depression),
mean.anx = mean(anxiety),
se.dep = sd(depression, na.rm = TRUE) / sqrt(n),
se.anx = sd(anxiety, na.rm = TRUE) / sqrt(n)
)
#> # A tibble: 6 × 6
#> grade n mean.dep mean.anx se.dep se.anx
#> <ord> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 7 622 0.881 0.751 0.0447 0.0420
#> 2 8 664 1.08 0.804 0.0461 0.0411
#> 3 9 778 1.17 0.934 0.0426 0.0387
#> 4 10 817 1.27 0.956 0.0431 0.0388
#> 5 11 790 1.37 1.12 0.0428 0.0411
#> 6 12 673 1.34 1.10 0.0439 0.0426
AddHealth |>
summarise(
n = n(),
across(
c(depression, anxiety),
list(mean = mean, se = ~ sd(.x, na.rm = TRUE) / sqrt(n)),
.names = "{.fn}.{substr(.col, 1, 3)}"
),
.by = grade
)
#> grade n mean.dep se.dep mean.anx se.anx
#> 1 11 790 1.3658228 0.04282760 1.1151899 0.04113153
#> 2 10 817 1.2705018 0.04311396 0.9559364 0.03881506
#> 3 12 673 1.3387816 0.04392934 1.0965825 0.04259964
#> 4 7 622 0.8810289 0.04468291 0.7508039 0.04203107
#> 5 8 664 1.0753012 0.04611600 0.8042169 0.04106970
#> 6 9 778 1.1748072 0.04258433 0.9344473 0.03870629