Let's say I have a dataset from a regular school in which students from different living areas are tested in math, English, and science. A student must retake a test if their score is 1 SD below the mean, and fails if their score is 2 SD below the mean.
I can easily compute the means, standard deviations, and these cutoffs. I'm using the nest approach from the tidyverse package. However, I would like to find out how many students scored 1 SD and 2 SD below the mean, and I don't know an easy way to add these counts to the results.
Please check the dataset and the code I'm using to achieve the descriptive results:
library(tidyverse)
set.seed(123)
ds <- data.frame(quest = c(2,4,6),
living_area = c("rural","urban","mixed"),
math_sum = rnorm(120, 10,1),
english_sum = rnorm(120, 10,1),
science_sum = rnorm(120, 10,1)
)
ds %>%
select(quest, ends_with("sum")) %>% #get variable names
pivot_longer(-quest) %>% #transform into long format
nest_by(quest, name) %>% #nest
mutate(
n = map_dbl(data, ~nrow(data.frame(.))), #compute sample size
mean = map_dbl(data, ~mean(.)), #get the means
sd = map_dbl(data, ~sd(.)), #get sd
below = mean-sd, #1 below
failed = mean-2*sd)
ds %>%
filter(quest == 2 & english_sum <= 9.19) %>% nrow() #manual check: english, 1SD below
ds %>%
filter(quest == 2 & math_sum <= 9.39) %>% nrow() #manual check: math, 1SD below
ds %>%
filter(quest == 2 & science_sum <= 8.73) %>% nrow() #manual check: science, 1SD below
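We can use the data column to count how many students fall more than one and two SDs below the mean. Because nest_by returns a rowwise data frame, data[[1]] is the vector of scores for the current row, so it is enough to add these two lines to the mutate call: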
oneSd_below = sum((mean - sd) > data[[1]]),
twoSd_below = sum((mean - 2*sd) > data[[1]])
library(tidyverse)
set.seed(123)
ds <- data.frame(quest = c(2,4,6),
living_area = c("rural","urban","mixed"),
math_sum = rnorm(120, 10,1),
english_sum = rnorm(120, 10,1),
science_sum = rnorm(120, 10,1)
) %>% as_tibble()
ds %>%
select(quest, ends_with("sum")) %>% #get variable names
pivot_longer(-quest) %>% #transform into long format
nest_by(quest, name) %>%
mutate(
n = map_dbl(data, ~ nrow(data.frame(.))),
#compute sample size
mean = map_dbl(data, ~ mean(.)),
#get the means
sd = map_dbl(data, ~ sd(.)),
#get sd
below = mean - sd,
#1 below
failed = mean - 2 * sd,
oneSd_below = sum((mean - sd) > data[[1]]),
twoSd_below = sum((mean - 2*sd) > data[[1]])
)
#> # A tibble: 9 × 10
#> # Rowwise: quest, name
#> quest name data n mean sd below failed oneSd_below twoSd_below
#> <dbl> <chr> <list<ti> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
#> 1 2 englis… [40 × 1] 40 10.0 0.839 9.19 8.35 6 0
#> 2 2 math_s… [40 × 1] 40 10.2 0.805 9.39 8.59 7 0
#> 3 2 scienc… [40 × 1] 40 9.92 1.19 8.73 7.54 8 0
#> 4 4 englis… [40 × 1] 40 10.0 1.08 8.94 7.87 6 0
#> 5 4 math_s… [40 × 1] 40 9.90 0.870 9.03 8.16 6 0
#> 6 4 scienc… [40 × 1] 40 9.96 0.882 9.07 8.19 6 1
#> 7 6 englis… [40 × 1] 40 9.87 1.03 8.83 7.80 7 0
#> 8 6 math_s… [40 × 1] 40 9.95 0.992 8.96 7.96 6 1
#> 9 6 scienc… [40 × 1] 40 10.4 0.967 9.41 8.44 5 1
Created on 2021-12-25 by the reprex package (v2.0.1)
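If the nested data column is not needed afterwards, the same counts can probably be obtained without nesting at all. The following is only a sketch of that alternative, not the approach used above: it assumes the same simulated ds, swaps nest_by for group_by plus summarise, and stores the result in counts, a name chosen here just for illustration.

```r
library(tidyverse)

set.seed(123)
ds <- data.frame(quest = c(2, 4, 6),
                 living_area = c("rural", "urban", "mixed"),
                 math_sum = rnorm(120, 10, 1),
                 english_sum = rnorm(120, 10, 1),
                 science_sum = rnorm(120, 10, 1))

# Alternative sketch: grouped summary instead of nest_by + map_dbl
counts <- ds %>%
  select(quest, ends_with("sum")) %>%   # keep quest and the score columns
  pivot_longer(-quest) %>%              # long format: quest, name, value
  group_by(quest, name) %>%
  summarise(
    n = n(),                            # sample size per group
    mean = mean(value),                 # group mean
    sd = sd(value),                     # group standard deviation
    below = mean - sd,                  # retest cutoff (1 SD below)
    failed = mean - 2 * sd,             # fail cutoff (2 SD below)
    oneSd_below = sum(value < below),   # students under the retest cutoff
    twoSd_below = sum(value < failed),  # students under the fail cutoff
    .groups = "drop"
  )

counts
```

Here value is still the full vector of scores for the group, so comparing it with the one-number cutoffs and summing the logical result gives the counts directly. Note that oneSd_below also includes the students counted in twoSd_below.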