I have a data frame with two columns: PathGroupStage and ClinGroupStage. I want to create a new column, OutputStage, that holds the higher of the two stages.
Valid stage values, from lowest to highest: I, IA, IB, II, IIA, IIB, III, IIIA, IIIB, IIIC, IV, IVA, IVB, IVC. Unknown stages are recorded as Unknown/Not Reported.
How would I derive OutputStage by comparing the non-numeric values in the two columns? I think I need ordered factor levels, but how would I compare factors across different columns?
Here is the sample dataset:
   PathGroupStage       ClinGroupStage
1              II                 <NA>
2               I                   IA
3             IVB                  IVB
4            IIIA Unknown/Not Reported
5               I                  III
6              II                 <NA>
7            IIIA                  IIB
8              II                   II
9            <NA>                 <NA>
10           IIIB Unknown/Not Reported
df <- structure(list(PathGroupStage = c("II", "I", "IVB", "IIIA", "I",
"II", "IIIA", "II", NA, "IIIB"), ClinGroupStage = c(NA, "IA",
"IVB", "Unknown/Not Reported", "III", NA, "IIB", "II", NA, "Unknown/Not Reported"
)), row.names = c(NA, 10L), class = "data.frame")
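One option is to convert both columns to ordered factors that share the same levels and then take the row-wise maximum with pmax():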
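library(dplyr)
# Stage labels in ascending order; "Unknown/Not Reported" ranks lowest
stages <- c("Unknown/Not Reported", "I", "IA", "IB", "II", "IIA", "IIB", "III", "IIIA", "IIIB", "IIIC", "IV", "IVA", "IVB", "IVC")
df %>%
  # Ordered factors sharing the same levels can be compared across columns
  mutate(across(everything(), ~ factor(., levels = stages, ordered = TRUE)),
         # Row-wise maximum; na.rm = TRUE keeps the non-missing stage
         OutputStage = pmax(PathGroupStage, ClinGroupStage, na.rm = TRUE))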
   PathGroupStage       ClinGroupStage OutputStage
1              II                 <NA>          II
2               I                   IA          IA
3             IVB                  IVB         IVB
4            IIIA Unknown/Not Reported        IIIA
5               I                  III         III
6              II                 <NA>          II
7            IIIA                  IIB        IIIA
8              II                   II          II
9            <NA>                 <NA>        <NA>
10           IIIB Unknown/Not Reported        IIIB
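If you would rather avoid dplyr, the same idea can be sketched in base R by comparing each stage's position in the ordered vector instead of building factors. This is only a minimal sketch: it assumes the same ordering of stages as above (with "Unknown/Not Reported" ranked lowest), and the helper names path_rank, clin_rank and best_rank are made up for illustration.

```r
# Sample data from the question
df <- data.frame(
  PathGroupStage = c("II", "I", "IVB", "IIIA", "I", "II", "IIIA", "II", NA, "IIIB"),
  ClinGroupStage = c(NA, "IA", "IVB", "Unknown/Not Reported", "III", NA, "IIB",
                     "II", NA, "Unknown/Not Reported")
)

# Assumed ordering, lowest to highest
stages <- c("Unknown/Not Reported", "I", "IA", "IB", "II", "IIA", "IIB",
            "III", "IIIA", "IIIB", "IIIC", "IV", "IVA", "IVB", "IVC")

# match() returns each value's position in `stages`; NA stays NA
path_rank <- match(df$PathGroupStage, stages)
clin_rank <- match(df$ClinGroupStage, stages)

# Row-wise maximum of the ranks; the result is NA only when both inputs are NA
best_rank <- pmax(path_rank, clin_rank, na.rm = TRUE)

# Map the winning rank back to its label
df$OutputStage <- stages[best_rank]
```

Here OutputStage comes back as a character column rather than an ordered factor. Note also that a value missing from stages (a typo, for example) is silently treated as NA by match(), so it may be worth checking the inputs against stages first.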