Search code examples
rdplyrgroup-byany

Match values in two columns within group in R


I have a big data frame in which I would like to flag the rows as follows:

within each group, I want to find whether any col1 value is present in col2 column. If so, mark those lines where same values appear.

this is my dataset with the desired flag values:

this is my dataset with the desired flag values

my try was the following:

df %>%
 group_by(attr) %>%
 mutate(flag=ifelse(any(col1)==any(col2), 1, 0))

but it does not work.

thanks in advance!


Solution

  • Please note: I am not sure if you made a typo in your dataframe for row ABC-LKJ 210-31.

    You can use the or | so that the 1 returns for both of the rows where the values match per group_by. You can use the following code:

    df <- data.frame(col1 = c("ABC", "FGC", "ZHU", "IIN", "OIL", "ABC", "LKJ", "SNM", "QQA"),
                     col2 = c("DLK", "CBN", "ABC", "ZHU", "HSC", "LJK", "QQA", "KDL", "CBV"),
                     attr = c("100-30", "100-30", "100-30", "21-667", "21-667", "210-31", "201-31", "201-31", "201-31"))
    
    library(dplyr)
    df %>%
      group_by(attr) %>%
      mutate(flag = ifelse(col1 %in% col2 | col2 %in% col1, 1, 0))
    #> # A tibble: 9 × 4
    #> # Groups:   attr [4]
    #>   col1  col2  attr    flag
    #>   <chr> <chr> <chr>  <dbl>
    #> 1 ABC   DLK   100-30     1
    #> 2 FGC   CBN   100-30     0
    #> 3 ZHU   ABC   100-30     1
    #> 4 IIN   ZHU   21-667     0
    #> 5 OIL   HSC   21-667     0
    #> 6 ABC   LJK   210-31     0
    #> 7 LKJ   QQA   201-31     1
    #> 8 SNM   KDL   201-31     0
    #> 9 QQA   CBV   201-31     1
    

    Created on 2022-07-08 by the reprex package (v2.0.1)