Search code examples
rgrouping

R - how to replace strings in a column based on 2 or more values


I am trying to replace a string for specific groups of customer ids/order dates. The example below may better illustrate my question. I have a dataframe:

customerid <- c("A1", "A1", "A2", "A2", "A3", "A3", "A3", "A4")
orderdate <- c("2018-09-14", "2018-09-14", "2018-09-15", "2018-09-15", "2020-08-21", "2020-08-21","2020-08-21", "2018-08-10")
status <- c("review", "review", "review", "negative", "positive", "review", "review", "review")
df <- data.frame(customerid, orderdate, status)

I am trying to group by customerid and orderdate. Then in each group of customerid/orderddate, if there is a "positive" or "negative" in the status column, I want to replace "review" with "done". The replacement can be done in a new column or the same one. The result would be:

customerid <- c("A1", "A1", "A2", "A2", "A3", "A3", "A3", "A4")
orderdate <- c("2018-09-14", "2018-09-14", "2018-09-15", "2018-09-15", "2020-08-21", "2020-08-21","2020-08-21", "2018-08-10")
status <- c("review", "review", "review", "negative", "positive", "review", "review", "review")
new <- c("review", "review", "done", "negative", "positive", "done", "done", "review")
df <- data.frame(customerid, orderdate, status, new)

I tried using the dplyr package and group_by and filter functions but am not sure how to proceed from here.

df <- df %>% 
       group_by(customerid, orderdate) %>% 
       dplyr::filter(df$new == 'review' & df$new == 'positive') %>%   

Thank you!


Solution

  • If we want to replace the review to done only when there is a positve or negative value in status for that group, then after grouping, create a logical vector that checks whether there are any 'positive' or 'negative' %in% the 'status' and (&) the value is 'review', then replace with 'done' or else return the column values

    library(dplyr)
    df <- df %>% 
          group_by(customerid, orderdate) %>%
          mutate(new = case_when(any(c('positive', 'negative') %in% 
                 status) & status == 'review' ~ 'done', TRUE ~ status)) %>% 
         ungroup
    

    -output

    df
    # A tibble: 8 × 4
      customerid orderdate  status   new     
      <chr>      <chr>      <chr>    <chr>   
    1 A1         2018-09-14 review   review  
    2 A1         2018-09-14 review   review  
    3 A2         2018-09-15 review   done    
    4 A2         2018-09-15 negative negative
    5 A3         2020-08-21 positive positive
    6 A3         2020-08-21 review   done    
    7 A3         2020-08-21 review   done    
    8 A4         2018-08-10 review   review