I've reviewed many 'do X using dplyr
based on condition' SO questions, and I believe only this one comes close to what I'm doing. I've tried but cannot manage to adapt @divibisan's answer there.
My data looks like:
df <- tibble(
Group_ID = c(1,2,3,4,5,6),
statusA = c("NEW", "NEW", "OLD","NEW", "OLD","OLD"),
statusB = c("BONUS", NA, "BONUS", NA, NA,"ROLLBACK"),
someVar = c("Thanks", "for", "your", "help","!","!"))
When statusB
is any non-NA value, I want to 'split' (as it were) that row such that for the Group_ID
there exists a distinct row for its statusA
and its statusB
. When rows are 'split', the row with a non-NA value for statusA
should show NA for statusB
, and the row with a non-NA value for statusB
should show NA for statusA
. All remaining variables (i.e., other than statusA
and statusB
) should appear, per each Group_ID
, on each line.
desired_df <- tibble(
Group_ID = c(1,1,2,3,3,4,5,6,6),
statusA = c("NEW",NA,"NEW","OLD",NA,"NEW","OLD","OLD",NA),
statusB = c(NA,"BONUS",NA,NA,"BONUS",NA,NA,NA,"ROLLBACK"),
someVar = c("Thanks","Thanks", "for", "your","your","help","!","!","!"))
I strongly wish to avoid using a loop. Can this be accomplished using dplyr?
You can try:
library(dplyr)
library(tidyr)
df |>
reframe(statusA = c(statusA, NA[!is.na(statusB)]),
statusB = c(NA[!is.na(statusB)], statusB), .by = -c(statusA, statusB)) |>
relocate(c(statusA, statusB), .after = Group_ID)
# A tibble: 9 × 4
Group_ID statusA statusB someVar
<dbl> <chr> <chr> <chr>
1 1 NEW NA Thanks
2 1 NA BONUS Thanks
3 2 NEW NA for
4 3 OLD NA your
5 3 NA BONUS your
6 4 NEW NA help
7 5 OLD NA !
8 6 OLD NA !
9 6 NA ROLLBACK !