How can I remove rows that are duplicates in certain columns, while keeping the one row that has a specific value in a non-duplicated column?
In other words: among rows that are duplicated in some columns, how can I choose which row to keep and which to remove?
This is for an R data.frame.
I already tried the following (the data.frame is X):
deduped.data <- unique( X[ , 1:5 ] )
Problem: this drops the column that holds the non-duplicated values.
X <- X %>% distinct()
Problem: I cannot specify which of the rows that are duplicated in some columns should be removed.
I could not find an answer among earlier questions. In my case, the information about which row to keep or remove is available.
An example
Data.frame X:
Row 1: Gender: Male, Age: 20, Country: Italy
Row 2: Gender: Male, Age: 20, Country: France
Row 3 etc
I want to remove the duplicates in columns 1 and 2 (Gender and Age) and keep the row with Country Italy, so I expect row 2 to be removed. As far as I can see, neither unique() nor distinct() can do this.
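This removes duplicate rows; you specify the columns to compare in the argument to duplicated(). It keeps the first row of each duplicate group, so sort X first if a particular row should be kept. To keep only the rows flagged as duplicates instead, remove the !
X <- X[!duplicated(X[, 1:2]), ]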
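Since duplicated() only flags the second and later occurrences, one way to control which row survives is to sort the preferred rows to the front before filtering. Below is a minimal base R sketch of that idea, not a definitive solution. The first two rows come from the question; the two Female rows and the names `preferred`, `X_sorted` and `deduped` are assumptions made up for illustration.

```r
# Example data: rows 1-2 are from the question, rows 3-4 are hypothetical.
X <- data.frame(
  Gender  = c("Male", "Male", "Female", "Female"),
  Age     = c(20, 20, 30, 30),
  Country = c("Italy", "France", "France", "Italy"),
  stringsAsFactors = FALSE
)

# Value of the non-duplicated column that should win within each group.
preferred <- "Italy"

# Sort so rows with the preferred value come first:
# (Country != preferred) is FALSE for preferred rows, and FALSE sorts before TRUE.
X_sorted <- X[order(X$Country != preferred), ]

# duplicated() flags every repeat of a Gender/Age combination after the first,
# so dropping the flagged rows keeps the first (preferred) row of each group.
deduped <- X_sorted[!duplicated(X_sorted[, c("Gender", "Age")]), ]

print(deduped)
```

If a group contains no row with the preferred value, this sketch still keeps one row from that group (whichever comes first after sorting), which may or may not be what you want.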