Search code examples
riterationmutate

In R, how do I make a long dataset that contains all the categories that a row applies to?


I have a dataset of students, and I want to create a column that categorizes each student based on their and other students' data in their "subject" and "grade_letter" column because I eventually want to create a column that says "math_science_A", "science_B", etc. to know how many students got As, Bs, Cs, etc. based on the unique make ups.

I have a list of all potential combinations that a student can have, but when I try to iterate over R, my case_when only gives me the first category. I then tried to make multiple columns, so I could pivot longer later, but that also seemed to fail. Any help would be appreciated!

#Here's my sample dataset
library(tidyverse)
test <- tibble(student_id = c(1),
               subject = c("math"),
               grade_num = c(95),
               grade = c("a"))



#Here's a sample of what I want (full desired output would be too long to write; see text below code chunks)
sample_of_desired_output <- tibble(student_id = c(1),
               subject = c("math"),
               grade_num = c(95),
               grade = c("a"),
category = c("n_math_a", "n_science_a", "n_english_a", "n_math_science_a", "n_math_english_a"))

#####Here's what I tried
#Getting all combinations--this part works
all_subjects <- c("math", "science", "english")

i = 1
while(i <= length(all_subjects)) {
  
  if(i == 1) {
    all_subject_combos <- c()
  } else {
    some_combos <- combn(all_subjects, i,  FUN = function(x) paste(x, collapse = "_"), simplify = TRUE)
    all_subject_combos <- append(all_subject_combos, some_combos)
  }
  
  i <- i +1
}

all_grades <- c("A", "B", "C", "D", "F")

all_subjects_and_grades <- expand.grid(all_subject_combos, all_grades)

subjects_and_grades_combos <- all_subjects_and_grades %>%
  unite("names", c(Var1, Var2)) %>%
  mutate(names =  tolower(paste0("n_", str_replace_all(names, "\\|", "_")))) %>%
  pull(names)


#This part does not work but is what I tried:

i = 1
while(i <= length(subjects_and_grades_combos)) {
  does_not_work <- test %>%
    mutate("subject_grade_grouping_{{i}}" := case_when(
      str_detect(subjects_and_grades_combos[i], subject) ~ subjects_and_grades_combos[i]))
  i <- i +1
}
does_not_work

Note that in the final, there will be all potential combinations, so each student will have their data repeating many times. After I have this, I will be using this to calculate the number of students who fall into each category. So, in this example, math_a, math_science_a, math_english_a, and math_science_english_a would all have 1. Note that I do not need help calculating that part, and I am flexible with the order of the columns.


Solution

  • The while loops needs to update the original object or else it gets the last iteration update

    library(dplyr)
    library(tidyr)
    test1 <- test
     i = 1
    while(i <= length(subjects_and_grades_combos)) {
      test1 <- test1 %>%
         mutate("subject_grade_grouping_{{i}}" := case_when(
           str_detect(subjects_and_grades_combos[i], subject) ~ subjects_and_grades_combos[i]))
       i <- i +1
     }
    
    test1 %>% 
      pivot_longer(cols = starts_with('subject_grade'),
       names_to = NULL, values_to = "category", values_drop_na = TRUE)
    

    -output

    # A tibble: 15 × 5
       student_id subject grade_num grade category                
            <dbl> <chr>       <dbl> <chr> <chr>                   
     1          1 math           95 a     n_math_science_a        
     2          1 math           95 a     n_math_english_a        
     3          1 math           95 a     n_math_science_english_a
     4          1 math           95 a     n_math_science_b        
     5          1 math           95 a     n_math_english_b        
     6          1 math           95 a     n_math_science_english_b
     7          1 math           95 a     n_math_science_c        
     8          1 math           95 a     n_math_english_c        
     9          1 math           95 a     n_math_science_english_c
    10          1 math           95 a     n_math_science_d        
    11          1 math           95 a     n_math_english_d        
    12          1 math           95 a     n_math_science_english_d
    13          1 math           95 a     n_math_science_f        
    14          1 math           95 a     n_math_english_f        
    15          1 math           95 a     n_math_science_english_f