I have a dataset of course enrollment where I am trying to track whether students dropped, added, or retained a course throughout the semester and identify their enrollment 'path'. I.e. I want to record if they were enrolled in BIOL101 and dropped it to take BIOL202. My dataframe looks like this:
YRTR TECH_ID COU_ID SUBJ COU_NBR GENDER RACE sub_cou status path
20173 108 217 MUSC 2231 Male White MUSC 2231 retained
20173 108 218 MUSC 2281 Male White MUSC 2281 retained
20173 8429 574 ECON 2201 Male White ECON 2201 retained
20173 8429 720 BUSN 2120 Male White BUSN 2120 retained
20173 9883 60 ECON 2202 Male White ECON 2202 added
20173 15515 95 PHIL 1102 Female White PHIL 1102 retained
20183 8207 478 ART 1102 Female White ART 1102 retained
20183 8207 1306 ART 1130 Female White ART 1130 added
20183 8207 403 ART 1125 Female White ART 1125 dropped
I am trying to fill in the column on the far right, "path". The idea is that if a student is retained in a course like in the first row, the path would read 2231->2231
. Specifically I am looking at course transfers WITHIN subjects. So, at the end of the data set, ID 8207 would have one path that looked like 1102->1102
and another path that looked like 1125->1130
I initially tried splitting the dataframe into two dataframes (one before, and one after the drop period) and then rejoining them like so:
data5 <- merge(x=post_drop, y=pre_drop, by=c("TECH_ID", "YRTR", "SUBJ"), all=TRUE)
And then using case_when to assign the path:
data5$status.x=="retained" ~ paste0(data5$COU_NBR.x, "->", data5$COU_NBR.x),
((data5$status.x=="added") & (data5$status.y=="dropped")) ~ paste0(data5$COU_NBR.y, "->", data5$COU_NBR.x),
((data5$status.x=="dropped") & (data5$status.y=="added")) ~ paste0(data5$COU_NBR.x, "->", data5$COU_NBR.y)
)
But this doesn't get me where I want - it leaves a lot of NAs in paths and also doesn't tell me if a student dropped a course within a subject and didn't register for another (i.e. dropping BIOL101 and not taking another BIOL class) in which case I would want something like 101->NA
or when a class is simply added (i.e. they weren't registered in a BIOL class initially but decided to register for BIOL101) which would be formatted like so NA->101
EDITED SOLUTION Spetember 27th
Hello again @alexvc Here's a start. Knowing a little bit about your data. You forget the case where a student dropped 1 and added 2 in which case the "path" becomes muddled. I've given you a solution that shows path
clearly.
library(dplyr)
library(tidyr)
data5 %>%
group_by(YRTR, TECH_ID, SUBJ, status) %>%
mutate(numbadd =
case_when(
status == "added" ~ 1,
TRUE ~ 0
),
numbdrop =
case_when(
status == "dropped" ~ 1,
TRUE ~ 0
),
rightside =
case_when(
numbadd == 1 ~ paste(COU_NBR, collapse = " and ")
),
leftside =
case_when(
numbdrop == 1 ~ paste(COU_NBR, collapse = " and ")
)
) %>%
group_by(YRTR, TECH_ID, SUBJ) %>%
mutate(total_add_drop = ifelse(status == "retained",
0,
sum(numbadd) + sum(numbdrop))) %>%
tidyr::fill(leftside, rightside, .direction = "downup") %>%
group_by(YRTR, TECH_ID, SUBJ, status) %>%
mutate(PATH =
case_when(
status == "retained" ~ paste(COU_NBR,
COU_NBR,
sep = " -> "),
status == "added" & total_add_drop == 1 ~ paste("NA",
COU_NBR,
sep = " -> "),
status == "dropped" & total_add_drop == 1 ~ paste(COU_NBR,
"NA",
sep = " -> "),
total_add_drop >= 2 ~ paste(leftside,
rightside,
sep = " -> "),
TRUE ~ "Theres a problem"
)) %>%
arrange(YRTR, TECH_ID) %>%
select(-COU_ID, -GENDER, -RACE, -rightside, -leftside, -numbadd, -numbdrop, -total_add_drop)
#> # A tibble: 17 x 7
#> # Groups: YRTR, TECH_ID, SUBJ, status [13]
#> YRTR TECH_ID SUBJ COU_NBR sub_cou status PATH
#> <dbl> <dbl> <chr> <dbl> <chr> <chr> <chr>
#> 1 20173 108 MUSC 2231 MUSC 2231 retained 2231 -> 2231
#> 2 20173 108 MUSC 2281 MUSC 2281 retained 2281 -> 2281
#> 3 20173 3889 ECON 2202 ECON 2202 dropped 2202 -> NA
#> 4 20173 8429 ECON 2201 ECON 2201 retained 2201 -> 2201
#> 5 20173 8429 BUSN 2120 BUSN 2120 retained 2120 -> 2120
#> 6 20173 9883 ECON 2202 ECON 2202 added NA -> 2202
#> 7 20173 15515 PHIL 1102 PHIL 1102 retained 1102 -> 1102
#> 8 20183 8207 ART 1102 ART 1102 retained 1102 -> 1102
#> 9 20183 8207 ART 1130 ART 1130 added 1125 -> 1130 and 2345
#> 10 20183 8207 ART 2345 ART 2345 added 1125 -> 1130 and 2345
#> 11 20183 8207 ART 1125 ART 1125 dropped 1125 -> 1130 and 2345
#> 12 20183 8209 ART 2345 ART 2345 added 1125 -> 2345
#> 13 20183 8209 ART 1125 ART 1125 dropped 1125 -> 2345
#> 14 20183 8270 PSYC 1001 PSYC 1001 dropped 1001 and 1002 -> 1003 and 1004
#> 15 20183 8270 PSYC 1003 PSYC 1003 added 1001 and 1002 -> 1003 and 1004
#> 16 20183 8270 PSYC 1002 PSYC 1002 dropped 1001 and 1002 -> 1003 and 1004
#> 17 20183 8270 PSYC 1004 PSYC 1004 added 1001 and 1002 -> 1003 and 1004
Your data with additional test cases
data5 <- readr::read_table(
" YRTR TECH_ID COU_ID SUBJ COU_NBR GENDER RACE sub_cou status
20173 108 217 MUSC 2231 Male White MUSC 2231 retained
20173 108 218 MUSC 2281 Male White MUSC 2281 retained
20173 8429 574 ECON 2201 Male White ECON 2201 retained
20173 8429 720 BUSN 2120 Male White BUSN 2120 retained
20173 9883 60 ECON 2202 Male White ECON 2202 added
20173 3889 60 ECON 2202 Male White ECON 2202 dropped
20173 15515 95 PHIL 1102 Female White PHIL 1102 retained
20183 8207 478 ART 1102 Female White ART 1102 retained
20183 8207 1306 ART 1130 Female White ART 1130 added
20183 8207 1307 ART 2345 Female White ART 2345 added
20183 8207 403 ART 1125 Female White ART 1125 dropped
20183 8270 1306 PSYC 1001 Female Black PSYC 1001 dropped
20183 8270 1307 PSYC 1003 Female Black PSYC 1003 added
20183 8270 403 PSYC 1002 Female Black PSYC 1002 dropped
20183 8209 1307 ART 2345 Female White ART 2345 added
20183 8270 1306 PSYC 1004 Female Black PSYC 1004 added
20183 8209 403 ART 1125 Female White ART 1125 dropped")