
Unique case of finding duplicate values flexibly across columns in R


I have a dataset similar to the following:

df <- data.frame(animal_1 = c("cat", "dog", "mouse", "squirrel"),
                 predation_type = c("eats", "eats", "eaten by", "eats"),
                 animal_2 = c("mouse", "squirrel", "cat", "nuts"))

> df
  animal_1 predation_type animal_2
1      cat           eats    mouse
2      dog           eats squirrel
3    mouse       eaten by      cat
4 squirrel           eats     nuts

I am looking for code that identifies rows 1 and 3 as duplicates, since they describe the same phenomenon (a cat eating a mouse, or a mouse being eaten by a cat). I'm not even sure what to call this kind of duplicate, so I'm hoping someone can help. I tried combining the two animal columns into one (e.g., "catmouse", "dogsquirrel") and then reversing the letters, but that quickly became too complex.

Thanks so much for any help you can provide.
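The "combine into one column" idea works if you sort the two animal names within each row instead of reversing letters. Below is a minimal base-R sketch that needs no packages. `pmin()` and `pmax()` compare the two columns element-wise, so the alphabetically smaller name always comes first, and rows 1 and 3 both get the key `cat_mouse`. Like the approach described in the question, this key ignores `predation_type`.

```r
df <- data.frame(animal_1 = c("cat", "dog", "mouse", "squirrel"),
                 predation_type = c("eats", "eats", "eaten by", "eats"),
                 animal_2 = c("mouse", "squirrel", "cat", "nuts"),
                 stringsAsFactors = FALSE)  # keep strings as character on any R version

# Order-independent key: alphabetically smaller name first, joined with "_".
# The separator prevents collisions such as "ab" + "c" vs "a" + "bc".
key <- paste(pmin(df$animal_1, df$animal_2),
             pmax(df$animal_1, df$animal_2),
             sep = "_")
# key is "cat_mouse" "dog_squirrel" "cat_mouse" "nuts_squirrel"

# Flag every row whose key occurs more than once, including the first copy.
df$duplicate <- duplicated(key) | duplicated(key, fromLast = TRUE)
df
```

Combining `duplicated()` with `duplicated(fromLast = TRUE)` flags both members of a pair. Plain `duplicated(key)` flags only the later one.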


Solution

  • tidyverse

    df <- data.frame(animal_1 = c("cat", "dog", "mouse", "squirrel"),
                     predation_type = c("eats", "eats", "eaten by", "eats"),
                     animal_2 = c("mouse", "squirrel", "cat", "nuts"))
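    library(tidyverse)
    
    df %>% 
      rowwise() %>% 
      # Build an order-independent key per row, e.g. "cat_mouse" for rows 1 and 3.
      # The "_" separator avoids collisions such as "ab" + "c" vs "a" + "bc".
      mutate(duplicates = str_c(sort(c_across(c(animal_1, animal_2))), collapse = "_")) %>% 
      # Rows that share a key are duplicates of one another.
      group_by(duplicates) %>% 
      mutate(duplicates = n() > 1) %>% 
      ungroup()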
    #> # A tibble: 4 x 4
    #>   animal_1 predation_type animal_2 duplicates
    #>   <chr>    <chr>          <chr>    <lgl>     
    #> 1 cat      eats           mouse    TRUE      
    #> 2 dog      eats           squirrel FALSE     
    #> 3 mouse    eaten by       cat      TRUE      
    #> 4 squirrel eats           nuts     FALSE
    

    Created on 2022-01-17 by the reprex package (v2.0.1)
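The key built above ignores `predation_type`. As a result, a row such as "mouse eats cat" would also be flagged as a duplicate of row 1, even though it describes the opposite event. If direction matters, one option is to normalise every row to a predator/prey pair before comparing. The base-R sketch below assumes `predation_type` only ever contains `"eats"` or `"eaten by"`. The fifth row is a hypothetical addition, included only to show the difference.

```r
df <- data.frame(animal_1 = c("cat", "dog", "mouse", "squirrel", "mouse"),
                 predation_type = c("eats", "eats", "eaten by", "eats", "eats"),
                 animal_2 = c("mouse", "squirrel", "cat", "nuts", "cat"),
                 stringsAsFactors = FALSE)  # row 5 is hypothetical

# Assumption: predation_type is either "eats" or "eaten by".
passive <- df$predation_type == "eaten by"

# Swap the animals in passive rows so every row reads "predator eats prey".
pair <- data.frame(predator = ifelse(passive, df$animal_2, df$animal_1),
                   prey     = ifelse(passive, df$animal_1, df$animal_2),
                   stringsAsFactors = FALSE)

# duplicated() on a data frame compares whole rows (predator AND prey).
df$duplicate <- duplicated(pair) | duplicated(pair, fromLast = TRUE)
df
# Rows 1 and 3 are TRUE; row 5 ("mouse eats cat") stays FALSE.
```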

    Removing duplicates

    
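    library(tidyverse)
    df %>% 
      # Keep the first row of each unordered animal pair; map2_chr() returns a character key per row.
      filter(!duplicated(map2_chr(animal_1, animal_2, ~ str_c(sort(c(.x, .y)), collapse = "_"))))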
    #>   animal_1 predation_type animal_2
    #> 1      cat           eats    mouse
    #> 2      dog           eats squirrel
    #> 3 squirrel           eats     nuts
    

    Created on 2022-01-17 by the reprex package (v2.0.1)
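The removal code above also ignores the direction of predation. Here is a base-R sketch of a direction-aware alternative. It assumes `predation_type` only takes the values `"eats"` or `"eaten by"`. Every row is rewritten in the active voice ("predator eats prey"), and `unique()` then drops the repeats. A side effect is that all remaining rows share one consistent wording.

```r
df <- data.frame(animal_1 = c("cat", "dog", "mouse", "squirrel"),
                 predation_type = c("eats", "eats", "eaten by", "eats"),
                 animal_2 = c("mouse", "squirrel", "cat", "nuts"),
                 stringsAsFactors = FALSE)

# Assumption: predation_type is either "eats" or "eaten by".
passive <- df$predation_type == "eaten by"

# Rewrite every row as "predator eats prey" by swapping the animals in passive rows.
active <- data.frame(predator = ifelse(passive, df$animal_2, df$animal_1),
                     predation_type = "eats",
                     prey = ifelse(passive, df$animal_1, df$animal_2),
                     stringsAsFactors = FALSE)

# unique() keeps the first occurrence of each distinct row; then reset row names.
deduped <- unique(active)
rownames(deduped) <- NULL
deduped
#>   predator predation_type     prey
#> 1      cat           eats    mouse
#> 2      dog           eats squirrel
#> 3 squirrel           eats     nuts
```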