Search code examples
rgroup-bycase-whenany

Use Group by with mutate, case_when, any() and all() function in R


I have a status_df with id and status at each stage:

id stage status
15 1 Pending
15 2 Not Sent
16 1 Approved
16 2 Rejected
16 3 Not Sent
16 4 Not Sent
20 1 Approved
20 2 Approved
20 3 Approved

I am trying to do a group_by ID and apply the following logic:

  • if any stage for an ID has 'Pending' status, final_status column is 'Pending'
  • if any stage for an ID has 'Rejected' status, final_status column is 'Rejected'
  • if all stages for an ID are approved, final_status column is 'Approved'

I am trying this (not working):

final_status_df = status_df %>% select(id, status) %>% group_by(id) %>%
mutate(final_status = case_when(any(status)=="Pending" ~ "Pending",
any(status)=="Rejected" ~ "Rejected", 
all(status)=="Approved" ~ "Approved"))

Expected output (final_status_df):

id final_status
15 Pending
16 Rejected
20 Approved

Solution

  • You were in the right direction with your attempt however, you closed any/all brackets early before comparison (==). Also since you only want 1 row for every id you can use summarise instead of mutate which will also avoid the use of select.

    library(dplyr)
    
    status_df %>% 
      group_by(id) %>%
      summarise(final_status = case_when(any(status == "Pending") ~ "Pending",
                                         any(status == "Rejected") ~ "Rejected", 
                                         all(status == "Approved") ~ "Approved"))
    
    #    id final_status
    #* <int> <chr>       
    #1    15 Pending     
    #2    16 Rejected    
    #3    20 Approved