Search code examples
rgroup-byconcatenation

Concatenate strings based on value in another column


I am trying to concatenate strings for specific groups of customer ids/order dates. I have a dataframe:

customerid <- c("A1", "A1", "A2", "A2", "A3", "A3", "A3", "A4")
orderdate <- c("2018-09-14", "2018-09-14", "2018-09-15", "2018-09-15", "2020-08-21", "2020-08-21","2020-08-21", "2018-08-10")
orderid <- c("1", "2", "3", "4", "5", "6", "7", "8")
status <- c("review", "review", "review", "negative", "positive", "review", "review", "review")
df <- data.frame(customerid, orderdate, orderid, status)

I am trying to group by customer id and order date. Then for each group, I want all but 1 "review" changed to "duplicate" and all the orderids concatenated by customerid/orderdate. The result would be:

customerid <- c("A1", "A1", "A2", "A2", "A3", "A3", "A3", "A4")
orderdate <- c("2018-09-14", "2018-09-14", "2018-09-15", "2018-09-15", "2020-08-21", "2020-08-21","2020-08-21", "2018-08-10")
orderid <- c("1,2", "1,2", "3,4", "3,4", "5,6,7", "5,6,7", "5,6,7", "8")
status <- c("review", "duplicate", "review", "negative", "positive", "review", "duplicate", "review")
df <- data.frame(customerid, orderdate, orderid, status)

Ty!


Solution

  • You can try

    library(dplyr)
    
    df %>%
      group_by(customerid, orderdate) %>%
      mutate(orderid = toString(orderid),
             status = ifelse(status == "review" & duplicated(status), "duplicate", status)) %>%
      ungroup()
    
    # A tibble: 8 × 4
      customerid orderdate  orderid status
      <chr>      <chr>      <chr>   <chr>    
    1 A1         2018-09-14 1, 2    review
    2 A1         2018-09-14 1, 2    duplicate
    3 A2         2018-09-15 3, 4    review
    4 A2         2018-09-15 3, 4    negative
    5 A3         2020-08-21 5, 6, 7 positive
    6 A3         2020-08-21 5, 6, 7 review
    7 A3         2020-08-21 5, 6, 7 duplicate
    8 A4         2018-08-10 8       review