
Remove duplicate rows except one, and add a column with the count of duplicates in R


In R I have a data frame, and I would like a new data frame that keeps only one copy of each distinct row and adds a last column with the number of times that row appears in the original data frame. I may not be explaining this well, so the dummy example below should make it clearer.

Here is what I would like to do in R. Say I have this data frame:

dat <- data.frame(x1 = c(1, 1, 2, 1, 4),
                  x2 = c(1, 1, 2, 1, 6),
                  x3 = c(2, 2, 3, 2, 6),
                  x4 = c(1, 1, 2, 2, 4),
                  x5 = c(1, 1, 4, 4, 3))
print(dat, row.names = FALSE)
 x1 x2 x3 x4 x5
  1  1  2  1  1
  1  1  2  1  1
  2  2  3  2  4
  1  1  2  2  4
  4  6  6  4  3

What I would like to achieve is a new data frame with:

x1 x2 x3 x4 x5 count
1  1  2  1  1   2
2  2  3  2  4   1
1  1  2  2  4   1
4  6  6  4  3   1

I searched the web and SO but could not find a solution. Can you please help? Thank you in advance.

ltdm


Solution

  • library(dplyr)
    count(dat, across(x1:x5))
    
      x1 x2 x3 x4 x5 n
    1  1  1  2  1  1 2
    2  1  1  2  2  4 1
    3  2  2  3  2  4 1
    4  4  6  6  4  3 1
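Note that count() returns one row per distinct combination sorted by x1 to x5, so the row order above differs from the desired table in the question. If you want the rows in the order they first appear, or want to avoid the dplyr dependency, here is a minimal base R sketch. It assumes that pasting a row's values into one string (joined with "\r", an arbitrary separator chosen here) gives a unique key per distinct row, which holds for numeric data like this; `key` and `res` are just illustrative names.

```r
# Example data from the question
dat <- data.frame(x1 = c(1, 1, 2, 1, 4),
                  x2 = c(1, 1, 2, 1, 6),
                  x3 = c(2, 2, 3, 2, 6),
                  x4 = c(1, 1, 2, 2, 4),
                  x5 = c(1, 1, 4, 4, 3))

# One string key per row (assumption: distinct rows give distinct strings,
# which is true for this small numeric data)
key <- do.call(paste, c(dat, sep = "\r"))

# Keep the first occurrence of each distinct row, in original order
res <- dat[!duplicated(key), , drop = FALSE]

# match() maps every row to the index of its group (groups numbered in order
# of first appearance); tabulate() counts the rows in each group
res$count <- tabulate(match(key, unique(key)))
rownames(res) <- NULL

print(res)
```

For the example data this keeps rows 1, 3, 4 and 5 of `dat`, which is the same order as the desired output in the question.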