I have a status_df
with id and status at each stage:
id | stage | status |
---|---|---|
15 | 1 | Pending |
15 | 2 | Not Sent |
16 | 1 | Approved |
16 | 2 | Rejected |
16 | 3 | Not Sent |
16 | 4 | Not Sent |
20 | 1 | Approved |
20 | 2 | Approved |
20 | 3 | Approved |
I am trying to do a group_by ID and apply the following logic:
any
stage for an ID has 'Pending' status, final_status column is 'Pending'any
stage for an ID has 'Rejected' status, final_status column is 'Rejected'all
stages for an ID are approved, final_status column is 'Approved'I am trying this (not working):
final_status_df = status_df %>% select(id, status) %>% group_by(id) %>%
mutate(final_status = case_when(any(status)=="Pending" ~ "Pending",
any(status)=="Rejected" ~ "Rejected",
all(status)=="Approved" ~ "Approved"))
Expected output (final_status_df):
id | final_status |
---|---|
15 | Pending |
16 | Rejected |
20 | Approved |
You were in the right direction with your attempt however, you closed any
/all
brackets early before comparison (==
). Also since you only want 1 row for every id
you can use summarise
instead of mutate
which will also avoid the use of select
.
library(dplyr)
status_df %>%
group_by(id) %>%
summarise(final_status = case_when(any(status == "Pending") ~ "Pending",
any(status == "Rejected") ~ "Rejected",
all(status == "Approved") ~ "Approved"))
# id final_status
#* <int> <chr>
#1 15 Pending
#2 16 Rejected
#3 20 Approved