
How to find the common elements among different sized columns in R?


I have a data frame called animals whose columns have different lengths (the shorter columns are padded with NA). The columns share some elements but not others, as shown below:

v1      v2       v3       v4
Dog     Cat      Lion     Dog
Cat     Lion     Dog      Shark
Lion    Dog      Shark    Cat
Shark   Shark    Cat      Lion
        Whale    Seal     Moose
        Seal              Whale
                          Deer

I want to identify the elements that appear in every column, drop the ones that don't, and combine the common elements into a single column like this:

Dog
Cat
Lion
Shark

So far I've tried identifying the duplicated elements with duplicated(animals) and then extracting them with animals[duplicated(animals)], but this returns nothing. Is there a better method?
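On a data frame, duplicated() checks for repeated rows rather than repeated values, and no row in this data repeats, so the attempt above finds nothing. The minimal base R sketch below, assuming the data shown above (rebuilt with data.frame()), confirms this and shows a value-level alternative: keep each value once per column, count in how many columns it occurs, and keep the values whose count equals the number of columns. The names per_col, counts and common are illustrative only.

```r
# Data from the question; shorter columns are padded with NA
animals <- data.frame(
  v1 = c("Dog", "Cat", "Lion", "Shark", NA, NA, NA),
  v2 = c("Cat", "Lion", "Dog", "Shark", "Whale", "Seal", NA),
  v3 = c("Lion", "Dog", "Shark", "Cat", "Seal", NA, NA),
  v4 = c("Dog", "Shark", "Cat", "Lion", "Moose", "Whale", "Deer"),
  stringsAsFactors = FALSE
)

# duplicated() on a data frame compares whole rows; none repeat here
row_dups <- duplicated(animals)
row_dups
#[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE

# Value-level alternative: each value once per column, NA padding removed
per_col <- lapply(animals, function(x) unique(x[!is.na(x)]))

# Count in how many columns each value occurs (table() sorts the names)
counts <- table(unlist(per_col))

# Keep the values present in every column
common <- names(counts)[counts == length(per_col)]
common
#[1] "Cat"   "Dog"   "Lion"  "Shark"
```

Because table() sorts its names, this version returns the common values alphabetically rather than in the order of the first column.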


Solution

  • We can use Reduce() with intersect() to keep only the values found in every column:

    Reduce(intersect, animals)
    #[1] "Dog"   "Cat"   "Lion"  "Shark"
    

    Or, with the tidyverse, reshape to long format and keep the values that appear in every column:

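    library(dplyr)
    library(tidyr)
    # Stack all columns into name/value pairs, dropping the NA padding
    pivot_longer(animals, cols = everything(), values_drop_na = TRUE) %>% 
         group_by(value) %>% 
         # Keep values that occur in as many distinct columns as the data has
         filter(n_distinct(name) == ncol(animals)) %>% 
         ungroup() %>% 
         distinct(value)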
    # A tibble: 4 x 1
    #  value
    #  <chr>
    #1 Dog  
    #2 Cat  
    #3 Lion 
    #4 Shark
    

    Data:

    animals <- structure(list(
        v1 = c("Dog", "Cat", "Lion", "Shark", NA, NA, NA),
        v2 = c("Cat", "Lion", "Dog", "Shark", "Whale", "Seal", NA),
        v3 = c("Lion", "Dog", "Shark", "Cat", "Seal", NA, NA),
        v4 = c("Dog", "Shark", "Cat", "Lion", "Moose", "Whale", "Deer")),
        class = "data.frame", row.names = c(NA, -7L))
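One caveat with Reduce(intersect, animals): intersect() treats NA as an ordinary value, so if every column were NA-padded, NA would appear in the result. It only drops out here because v4 has no NA. The sketch below, assuming the same data, demonstrates this on the first three columns, removes NA from each column before intersecting, and then puts the result into a single column as the question asks. The column name value and the variable names are hypothetical choices.

```r
# Same data as above; shorter columns are padded with NA
animals <- data.frame(
  v1 = c("Dog", "Cat", "Lion", "Shark", NA, NA, NA),
  v2 = c("Cat", "Lion", "Dog", "Shark", "Whale", "Seal", NA),
  v3 = c("Lion", "Dog", "Shark", "Cat", "Seal", NA, NA),
  v4 = c("Dog", "Shark", "Cat", "Lion", "Moose", "Whale", "Deer"),
  stringsAsFactors = FALSE
)

# v1..v3 all contain NA, so NA survives the intersection as a fifth element
with_na <- Reduce(intersect, animals[1:3])

# Drop NA from each column first, then intersect
no_na <- lapply(animals, function(x) x[!is.na(x)])
common <- Reduce(intersect, no_na)
common
#[1] "Dog"   "Cat"   "Lion"  "Shark"

# Collect the common values into one column
result <- data.frame(value = common, stringsAsFactors = FALSE)
```

Filtering with x[!is.na(x)] rather than na.omit() keeps plain character vectors without the extra na.action attribute.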