r duplicates row conditional-statements

How can I remove rows that are duplicates on certain columns and keep the one row with a specific value in a non-duplicated column?


Some of my rows are duplicates on certain columns but differ in another column. How can I choose which of those rows to keep, based on a specific value in the non-duplicated column, and remove the others?

This is for an R data.frame.

I already tried:

The data.frame is called X.

 deduped.data <- unique( X[ , 1:5 ] )

Problem: this drops the column whose values differ between the duplicate rows, so I cannot keep it.

X <- X %>% distinct()

Problem: I cannot specify which of the rows that are duplicates on some columns should be removed.

I could not find an answer among earlier questions for the case where I already know which row to keep and which to remove.

An example

Data.frame X:

Row 1: Gender: Male, Age: 20, Country: Italy

Row 2: Gender: Male, Age: 20, Country: France

Row 3 etc

I want to remove the rows that are duplicates on columns 1 and 2 and keep the one with Country Italy, so I expect row 2 to be removed. As far as I can see, neither unique() nor distinct() can do this.


Solution

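  • duplicated() flags rows that repeat an earlier row on the columns you pass to it, so negating it with ! keeps the first occurrence of each combination and removes the later ones. Without the ! you would keep only the repeated rows instead. Note that the index goes before the comma to select rows; indexing after the comma with duplicated(colnames(X)) would drop columns with duplicated names, not duplicate rows.

    X <- X[!duplicated(X[, 1:2]), ]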
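To control which of the duplicate rows survives, one option is to sort the data so that the preferred row comes first within each group, and only then drop the rows that repeat on the key columns, since duplicated() marks everything after the first occurrence. The following is a minimal sketch, not a definitive solution: the column names Gender, Age and Country are taken from the example in the question, and the third row and the object names X_sorted and deduped are made up for illustration.

```r
# Toy data modelled on the question's example; the "Female" row is invented.
X <- data.frame(
  Gender  = c("Male", "Male", "Female"),
  Age     = c(20, 20, 31),
  Country = c("France", "Italy", "Spain"),
  stringsAsFactors = FALSE
)

# Sort so that, within each Gender/Age group, the row to keep comes first.
# `X$Country != "Italy"` is FALSE for Italy, and FALSE sorts before TRUE.
X_sorted <- X[order(X$Gender, X$Age, X$Country != "Italy"), ]

# duplicated() flags every repeat after the first occurrence, so negating it
# keeps exactly one row per Gender/Age combination.
deduped <- X_sorted[!duplicated(X_sorted[, c("Gender", "Age")]), ]
rownames(deduped) <- NULL

print(deduped)
```

If dplyr is already in use, the same idea can probably be written as arrange(Country != "Italy") followed by distinct(Gender, Age, .keep_all = TRUE), where .keep_all = TRUE retains the columns that are not part of the key.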