Search code examples
rdplyrstringr

Concatenate while dropping specific value strings


I have a dataframe in R called df:

A = c("ok","WA","WA","ok","WA")
B = c("WB","ok","ok","ok","WB")
C = c("WC","ok","WC","ok","WC")
df = tibble(A,B,C)
df
  A     B     C    
  <chr> <chr> <chr>
1 ok    WB    WC   
2 WA    ok    ok   
3 WA    ok    WC   
4 ok    ok    ok   
5 WA    WB    WC  

I want to create (mutate) a new column that will concatenate all the values that are not OK like this:

  A     B     C     D 
  <chr> <chr> <chr> <chr>
1 ok    WB    WC     WB,WC
2 WA    ok    ok     WA
3 WA    ok    WC     WA,WC
4 ok    ok    ok     NO W
5 WA    WB    WC     WA,WB,WC

Solution

  • You could use a paste approach with the help of gsub:

    A <- c("ok","WA","WA","ok","WA")
    B <- c("WB","ok","ok","ok","WB")
    C <- c("WC","ok","WC","ok","WC")
    df <- data.frame(A=A, B=B, C=C, stringsAsFactors=FALSE)
    
    df$D <- paste(df$A, df$B, df$C, sep=",")
    df$D <- gsub("^,|,$", "", gsub(",{2,}", ",", gsub("\\bok\\b", ",", df$D)))
    df
    
       A  B  C        D
    1 ok WB WC    WB,WC
    2 WA ok ok       WA
    3 WA ok WC    WA,WC
    4 ok ok ok         
    5 WA WB WC WA,WB,WC
    

    The basic strategy here is to strip out ok entries, then cleanup the possible dangling commas left from the paste call.