How can I evaluate a condition on multiple rows in dplyr
?
I have a dataset which I want to mutate according to conditions happening (transitions) on multiple time periods.
Following the example below if one passes by a bad status it has to be treated as bad overall. I tried mutate_if
but it does not work, or probably I am not able to understand the syntax
df <-
data.frame(ID = c(1,1,1,2,2,2,3,3,3),
Date= c(1,2,3,1,2,3,1,2,3),
Money = c(500,400,500,100,100,100,200,300,300),
Status = c("Good", "Bad", "Good", "Good","Good","Good", "Bad","Good","Good"))
Could you provide me a solution to reach the below results? I would prefer staying inside the boundaries of dplyr
if possible, although I know some good treatment could be performed with datatable
result <-
data.frame(ID = c(1,1,1,2,2,2,3,3,3),
Date= c(1,2,3,1,2,3,1,2,3),
Money = c(500,400,500,100,100,100,200,300,300),
Status = c("Good", "Bad", "Good", "Good","Good","Good", "Bad","Good","Good"),
Status_overall = c("Bad", "Bad", "Bad", "Good","Good","Good", "Bad","Bad","Bad"))
You can return 'Bad'
if any
Status
is 'Bad'
in a ID
.
library(dplyr)
df %>%
group_by(ID) %>%
mutate(Status_overall = if(any(Status == 'Bad')) 'Bad' else 'Good')
#Without if/else
#mutate(Status_overall = c('Good', 'Bad')[any(Status == 'Bad') + 1])
# ID Date Money Status Status_overall
# <dbl> <dbl> <dbl> <chr> <chr>
#1 1 1 500 Good Bad
#2 1 2 400 Bad Bad
#3 1 3 500 Good Bad
#4 2 1 100 Good Good
#5 2 2 100 Good Good
#6 2 3 100 Good Good
#7 3 1 200 Bad Bad
#8 3 2 300 Good Bad
#9 3 3 300 Good Bad
This can be written in base R and data.table
as :
df$Status_overall <- with(df, ifelse(ave(Status == 'Bad', ID, FUN = any), 'Bad', 'Good'))
library(data.table)
setDT(df)[, Status_overall := if(any(Status == 'Bad')) 'Bad' else 'Good', ID]