Search code examples
rdataframedplyrtidyversedata-manipulation

New column to identify duplicate rows in R


Seeking some expertise/guidance on creating a new column to indicate possible duplicates based on a few selected columns.

I have the following dataframe

ID Animal Age Delivery Cost Country
1 dog 5 Air 120 Nigeria
2 cat 3 Air 110 Kenya
3 fish 1 Air 20 Kenya
4 dog 5 Air 150 Nigeria
5 cat 3 Air 100 Kenya
6 dog 6 Air 180 Egypt
7 cat 3 Air 135 Kenya
8 turtle 10 Air 90 Nigeria
df = structure(list(ID = 1:8, Animals = c("dog", "cat", "fish", "dog", 
                                          "cat", "dog", "cat", "turtle"), Age = c(5L, 3L, 1L, 5L, 3L, 6L, 
                                           3L, 10L), Delivery = c("Air", "Air", "Air", "Air", "Air", "Air", 
                                           "Air", "Air"), Cost = c(120L, 110L, 20L, 150L, 100L, 180L, 135L, 
                                           90L), Country = c("Nigeria", "Kenya", "Kenya", "Nigeria", "Kenya", 
                                           "Egypt", "Kenya", "Nigeria")), class = "data.frame", row.names = c(NA, 
                                           -8L))

I would like to create a new column that highlights if the 3 columns - Animal, Age & Country repeats more than once to tag as duplicates.

The output would be the following

ID Animal Age Delivery Cost Country New Column
1 dog 5 Air 120 Nigeria Y
2 cat 3 Air 110 Kenya Y
3 fish 1 Air 20 Kenya N
4 dog 5 Air 150 Nigeria Y
5 cat 3 Air 100 Kenya Y
6 dog 6 Air 180 Egypt N
7 cat 3 Air 135 Kenya Y
8 turtle 10 Air 90 Nigeria N

Thanks in advance!


Solution

  • You can group_by your target columns, then use n() to see if there's more than one record per group.

    library(dplyr)
    
    df %>% group_by(Animals, Age, Country) %>% mutate(dup = n() > 1) %>% ungroup()
    
    # A tibble: 8 × 7
         ID Animals   Age Delivery  Cost Country dup  
      <int> <chr>   <int> <chr>    <int> <chr>   <lgl>
    1     1 dog         5 Air        120 Nigeria TRUE 
    2     2 cat         3 Air        110 Kenya   TRUE 
    3     3 fish        1 Air         20 Kenya   FALSE
    4     4 dog         5 Air        150 Nigeria TRUE 
    5     5 cat         3 Air        100 Kenya   TRUE 
    6     6 dog         6 Air        180 Egypt   FALSE
    7     7 cat         3 Air        135 Kenya   TRUE 
    8     8 turtle     10 Air         90 Nigeria FALSE