I have a dataset of students, and I want to create a column that categorizes each student based on their and other students' data in their "subject" and "grade_letter" column because I eventually want to create a column that says "math_science_A", "science_B", etc. to know how many students got As, Bs, Cs, etc. based on the unique make ups.
I have a list of all potential combinations that a student can have, but when I try to iterate over R, my case_when only gives me the first category. I then tried to make multiple columns, so I could pivot longer later, but that also seemed to fail. Any help would be appreciated!
#Here's my sample dataset
library(tidyverse)
test <- tibble(student_id = c(1),
subject = c("math"),
grade_num = c(95),
grade = c("a"))
#Here's a sample of what I want (full desired output would be too long to write; see text below code chunks)
sample_of_desired_output <- tibble(student_id = c(1),
subject = c("math"),
grade_num = c(95),
grade = c("a"),
category = c("n_math_a", "n_science_a", "n_english_a", "n_math_science_a", "n_math_english_a"))
#####Here's what I tried
#Getting all combinations--this part works
all_subjects <- c("math", "science", "english")
i = 1
while(i <= length(all_subjects)) {
if(i == 1) {
all_subject_combos <- c()
} else {
some_combos <- combn(all_subjects, i, FUN = function(x) paste(x, collapse = "_"), simplify = TRUE)
all_subject_combos <- append(all_subject_combos, some_combos)
}
i <- i +1
}
all_grades <- c("A", "B", "C", "D", "F")
all_subjects_and_grades <- expand.grid(all_subject_combos, all_grades)
subjects_and_grades_combos <- all_subjects_and_grades %>%
unite("names", c(Var1, Var2)) %>%
mutate(names = tolower(paste0("n_", str_replace_all(names, "\\|", "_")))) %>%
pull(names)
#This part does not work but is what I tried:
i = 1
while(i <= length(subjects_and_grades_combos)) {
does_not_work <- test %>%
mutate("subject_grade_grouping_{{i}}" := case_when(
str_detect(subjects_and_grades_combos[i], subject) ~ subjects_and_grades_combos[i]))
i <- i +1
}
does_not_work
Note that in the final, there will be all potential combinations, so each student will have their data repeating many times. After I have this, I will be using this to calculate the number of students who fall into each category. So, in this example, math_a, math_science_a, math_english_a, and math_science_english_a would all have 1. Note that I do not need help calculating that part, and I am flexible with the order of the columns.
The while
loops needs to update the original object or else it gets the last iteration update
library(dplyr)
library(tidyr)
test1 <- test
i = 1
while(i <= length(subjects_and_grades_combos)) {
test1 <- test1 %>%
mutate("subject_grade_grouping_{{i}}" := case_when(
str_detect(subjects_and_grades_combos[i], subject) ~ subjects_and_grades_combos[i]))
i <- i +1
}
test1 %>%
pivot_longer(cols = starts_with('subject_grade'),
names_to = NULL, values_to = "category", values_drop_na = TRUE)
-output
# A tibble: 15 × 5
student_id subject grade_num grade category
<dbl> <chr> <dbl> <chr> <chr>
1 1 math 95 a n_math_science_a
2 1 math 95 a n_math_english_a
3 1 math 95 a n_math_science_english_a
4 1 math 95 a n_math_science_b
5 1 math 95 a n_math_english_b
6 1 math 95 a n_math_science_english_b
7 1 math 95 a n_math_science_c
8 1 math 95 a n_math_english_c
9 1 math 95 a n_math_science_english_c
10 1 math 95 a n_math_science_d
11 1 math 95 a n_math_english_d
12 1 math 95 a n_math_science_english_d
13 1 math 95 a n_math_science_f
14 1 math 95 a n_math_english_f
15 1 math 95 a n_math_science_english_f