I have a long dataset with students grades and subjects. I want to keep a long dataset, but I want to add a column that tells me how many Fs a student had in their humanity courses (English & History) and their STEM courses (biology & math). I also want the same for Ds, Cs, Bs, and As.
I know I could explicitly spell this out, but in the future, they might have other subjects (like adding Chemistry to STEM) or completely different categories, like Foreign Languages, so I want it to be scalable.
I know how to get all combinations of columns, and I know how to do to each part manually--but I don't know how to combine the two. Any help would be greatly appreciated!
#Sample data
library(tidyverse)
student_grades <- tibble(student_id = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5),
subject = c(rep(c("english", "biology", "math", "history"), 4), NA, "biology"),
grade = c(1, 2, 3, 4, 5, 4, 3, 2, 2, 4, 1, 1, 1, 1, 2, 3, 3, 4))
#All combinations of grades and subjects
all_subject_combos <- c("eng|his", "bio|math")
all_grades <- c("F", "D", "C",
"B", "A")
subjects_and_letter_grades <- expand.grid(all_subject_combos, all_grades)
all_combos <- subjects_and_letter_grades %>%
unite("names", c(Var1, Var2)) %>%
mutate(names = str_replace_all(names, "\\|", "_")) %>%
pull(names)
#Manual generation of numbers of Fs by subject
#This is what I want the results to look like, but with all other letter grades
student_grades %>%
group_by(student_id) %>%
mutate(eng_his_F = sum((case_when(
str_detect(subject, "eng|his") & grade == 1 ~ 1,
TRUE ~ 0)), na.rm = TRUE),
bio_math_F = sum((case_when(
str_detect(subject, "bio|math") & grade == 1 ~ 1,
TRUE ~ 0)), na.rm = TRUE)) %>%
ungroup()
Ideally, this would be scalable for any number of subject combos and wouldn't require me to write out the same code for Ds, Cs, Bs, and As. Thank you!
We may loop over the all_combos
vector with map
, then within each list
, do the grouping by 'student_id' (could also do this outside the loop and create an object to use this within here), create the new column with same name as looped by evaluating (!!
) and using :=
operator on the sum
of the output from case_when
and bind the data with the original data
library(dplyr)
library(purrr)
library(stringr)
map_dfc(all_combos, ~ student_grades %>%
group_by(student_id) %>%
transmute(!! .x := sum(case_when(str_detect(subject,
str_replace(.x, "(\\w+)_(\\w+)_.", "\\1|\\2")) &
grade == match(str_extract(.x, ".$"), all_grades)~ 1, TRUE ~ 0))) %>%
ungroup %>%
dplyr::select(-student_id)) %>%
bind_cols(student_grades, .)
-output
# A tibble: 18 × 13
student_id subject grade eng_his_F bio_math_F eng_his_D bio_math_D eng_his_C bio_math_C eng_his_B bio_math_B eng_hi…¹ bio_m…²
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 english 1 1 0 0 1 0 1 1 0 0 0
2 1 biology 2 1 0 0 1 0 1 1 0 0 0
3 1 math 3 1 0 0 1 0 1 1 0 0 0
4 1 history 4 1 0 0 1 0 1 1 0 0 0
5 2 english 5 0 0 1 0 0 1 0 1 1 0
6 2 biology 4 0 0 1 0 0 1 0 1 1 0
7 2 math 3 0 0 1 0 0 1 0 1 1 0
8 2 history 2 0 0 1 0 0 1 0 1 1 0
9 3 english 2 1 1 1 0 0 0 0 1 0 0
10 3 biology 4 1 1 1 0 0 0 0 1 0 0
11 3 math 1 1 1 1 0 0 0 0 1 0 0
12 3 history 1 1 1 1 0 0 0 0 1 0 0
13 4 english 1 1 1 0 1 1 0 0 0 0 0
14 4 biology 1 1 1 0 1 1 0 0 0 0 0
15 4 math 2 1 1 0 1 1 0 0 0 0 0
16 4 history 3 1 1 0 1 1 0 0 0 0 0
17 5 <NA> 3 0 0 0 0 0 0 0 1 0 0
18 5 biology 4 0 0 0 0 0 0 0 1 0 0
# … with abbreviated variable names ¹eng_his_A, ²bio_math_A