Seeking some expertise/guidance on creating a new column to indicate possible duplicates based on a few selected columns.
I have the following dataframe
ID | Animal | Age | Delivery | Cost | Country |
---|---|---|---|---|---|
1 | dog | 5 | Air | 120 | Nigeria |
2 | cat | 3 | Air | 110 | Kenya |
3 | fish | 1 | Air | 20 | Kenya |
4 | dog | 5 | Air | 150 | Nigeria |
5 | cat | 3 | Air | 100 | Kenya |
6 | dog | 6 | Air | 180 | Egypt |
7 | cat | 3 | Air | 135 | Kenya |
8 | turtle | 10 | Air | 90 | Nigeria |
df = structure(list(ID = 1:8, Animals = c("dog", "cat", "fish", "dog",
"cat", "dog", "cat", "turtle"), Age = c(5L, 3L, 1L, 5L, 3L, 6L,
3L, 10L), Delivery = c("Air", "Air", "Air", "Air", "Air", "Air",
"Air", "Air"), Cost = c(120L, 110L, 20L, 150L, 100L, 180L, 135L,
90L), Country = c("Nigeria", "Kenya", "Kenya", "Nigeria", "Kenya",
"Egypt", "Kenya", "Nigeria")), class = "data.frame", row.names = c(NA,
-8L))
I would like to create a new column that highlights if the 3 columns - Animal, Age & Country repeats more than once to tag as duplicates.
The output would be the following
ID | Animal | Age | Delivery | Cost | Country | New Column |
---|---|---|---|---|---|---|
1 | dog | 5 | Air | 120 | Nigeria | Y |
2 | cat | 3 | Air | 110 | Kenya | Y |
3 | fish | 1 | Air | 20 | Kenya | N |
4 | dog | 5 | Air | 150 | Nigeria | Y |
5 | cat | 3 | Air | 100 | Kenya | Y |
6 | dog | 6 | Air | 180 | Egypt | N |
7 | cat | 3 | Air | 135 | Kenya | Y |
8 | turtle | 10 | Air | 90 | Nigeria | N |
Thanks in advance!
You can group_by
your target columns, then use n()
to see if there's more than one record per group.
library(dplyr)
df %>% group_by(Animals, Age, Country) %>% mutate(dup = n() > 1) %>% ungroup()
# A tibble: 8 × 7
ID Animals Age Delivery Cost Country dup
<int> <chr> <int> <chr> <int> <chr> <lgl>
1 1 dog 5 Air 120 Nigeria TRUE
2 2 cat 3 Air 110 Kenya TRUE
3 3 fish 1 Air 20 Kenya FALSE
4 4 dog 5 Air 150 Nigeria TRUE
5 5 cat 3 Air 100 Kenya TRUE
6 6 dog 6 Air 180 Egypt FALSE
7 7 cat 3 Air 135 Kenya TRUE
8 8 turtle 10 Air 90 Nigeria FALSE