I'm trying to count how many students received 1s, 2s, 3s, 4s, and 5s across their subjects, and I want a column for each subject and the possible grade (math_1, science_2, etc.).
I originally wrote a for loop, but my actual dataset has so many cases that I need to use map. I can get it to work, but it produces many NAs and only one chunk per column has actual data. I'm curious to know either:
Here's my code
library(tidyverse)
#Set up - generate sample dataset and get all combinations of grades and subjects
student_grades <- tibble(student_id = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5),
subject = c(rep(c("english", "biology", "math", "history"), 4), NA, "biology"),
grade = as.character(c(1, 2, 3, 4, 5, 4, 3, 2, 2, 4, 1, 1, 1, 1, 2, 3, 3, 4)))
all_subject_combos <- c("english", "history", "math", "biology")
all_grades <- c("1", "2", "3",
"4", "5")
subjects_and_letter_grades <- expand.grid(all_subject_combos, all_grades)
all_combos <- subjects_and_letter_grades %>%
unite("names", c(Var1, Var2)) %>%
mutate(names = str_replace_all(names, "\\|", "_")) %>%
pull(names)
# iterate over each combination using map_df()
student_map <- map_df(all_combos,
~student_grades %>%
mutate("{.x}" := paste(i)) %>%
group_by(student_id) %>%
mutate("{.x}" := sum(case_when(str_detect(.x, subject) &
str_detect(.x, grade) ~ 1,
TRUE ~ 0), na.rm = T)))
EDIT For the record, my almost identical for loop does not pad in many missing values. I assume it must have something to do with how it is building the dataset, but I don't know how I can override what map_df is doing under the hood.
student_map <- student_grades
for(i in all_combos) {
student_map <- student_map %>%
mutate("{i}" := paste(i)) %>%
group_by(student_id) %>%
mutate("{i}" := sum(case_when(str_detect(i, subject) &
str_detect(i, grade) ~ 1,
TRUE ~ 0), na.rm = T))
}
There is no i
in the map
as the default lambda value looped is .x
. Also, it is better to use transmute
instead of mutate
as we need to return only the columns added in each iteration and then we bind with the original data at the end
library(dplyr)
library(purrr)
library(stringr)
student_map2 <- map_dfc(all_combos,
~ student_grades %>%
transmute(subject, grade, student_id, "{.x}" := .x) %>%
group_by(student_id) %>%
transmute("{.x}" := sum(case_when(str_detect( .x, subject) &
str_detect(.x, grade)~ 1, TRUE ~ 0), na.rm = TRUE)) %>%
ungroup %>%
select(-student_id)) %>%
bind_cols(student_grades, .)
-checking with OP's for loop output
> all.equal(student_map, student_map2, check.attributes = FALSE)
[1] TRUE