Search code examples
rconditional-statementstransitiondplyr

Mutate according to conditions on multiple rows in R


How can I evaluate a condition on multiple rows in dplyr? I have a dataset which I want to mutate according to conditions happening (transitions) on multiple time periods.

Following the example below if one passes by a bad status it has to be treated as bad overall. I tried mutate_ifbut it does not work, or probably I am not able to understand the syntax

df <-
  data.frame(ID = c(1,1,1,2,2,2,3,3,3),
             Date= c(1,2,3,1,2,3,1,2,3),
             Money = c(500,400,500,100,100,100,200,300,300), 
             Status = c("Good", "Bad", "Good", "Good","Good","Good", "Bad","Good","Good"))

Could you provide me a solution to reach the below results? I would prefer staying inside the boundaries of dplyr if possible, although I know some good treatment could be performed with datatable

result <- 
  data.frame(ID = c(1,1,1,2,2,2,3,3,3), 
                     Date= c(1,2,3,1,2,3,1,2,3),
                     Money = c(500,400,500,100,100,100,200,300,300),
                     Status = c("Good", "Bad", "Good", "Good","Good","Good", "Bad","Good","Good"),
                     Status_overall = c("Bad", "Bad", "Bad", "Good","Good","Good", "Bad","Bad","Bad"))

Solution

  • You can return 'Bad' if any Status is 'Bad' in a ID.

    library(dplyr)
    
    df %>%
      group_by(ID) %>%
      mutate(Status_overall = if(any(Status == 'Bad')) 'Bad' else 'Good')
      #Without if/else
      #mutate(Status_overall = c('Good', 'Bad')[any(Status == 'Bad') + 1])
    
    #    ID  Date Money Status Status_overall
    #  <dbl> <dbl> <dbl> <chr>  <chr>         
    #1     1     1   500 Good   Bad           
    #2     1     2   400 Bad    Bad           
    #3     1     3   500 Good   Bad           
    #4     2     1   100 Good   Good          
    #5     2     2   100 Good   Good          
    #6     2     3   100 Good   Good          
    #7     3     1   200 Bad    Bad           
    #8     3     2   300 Good   Bad           
    #9     3     3   300 Good   Bad           
    

    This can be written in base R and data.table as :

    df$Status_overall <- with(df, ifelse(ave(Status == 'Bad', ID, FUN = any), 'Bad', 'Good'))
    
    library(data.table)
    setDT(df)[, Status_overall := if(any(Status == 'Bad')) 'Bad' else 'Good', ID]