
Remove duplicate rows whose values match the column headers


My data looks something like this:

+--------+--------+--------+
| region |  name  | salary |
+--------+--------+--------+
| west   | raj    | 100    |
| north  | simran | 150    |
| region | name   | salary |
| east   | prem   | 250    |
| region | name   | salary |
| south  | preeti | 200    |
+--------+--------+--------+

The column headers are repeated as data in rows 3 and 5. How can I delete rows 3 and 5 using R while keeping the column header as it is, so that my output looks like this:

+--------+--------+--------+
| region |  name  | salary |
+--------+--------+--------+
| west   | raj    |    100 |
| north  | simran |    150 |
| east   | prem   |    250 |
| south  | preeti |    200 |
+--------+--------+--------+

My original data has too many rows for me to pick out the row numbers by hand and delete them with the command Data[-c(3, 5), ].


Solution

  • Here is a simple solution:

    df <- data.frame(x = c("a", "b", "c", "x"), z = c("a", "b", "c", "z"))
    ## identify rows whose values match the column names, position by position
    matched <- apply(df, 1, function(row) all(row == colnames(df)))
    
    ## keep only the rows that did not match
    df[!matched, ]
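
    Applied to the data from the question, the same idea might look like the sketch below. It assumes the data frame is called Data (the name used in the question) and that every column was read as character because the repeated header rows are mixed in with the values. The names is_header and clean, and the final conversion of salary to numeric, are illustrative additions rather than part of the original answer.

```r
# Rebuild the example from the question. Every column is character
# because the repeated header rows sit among the real values.
Data <- data.frame(
  region = c("west", "north", "region", "east", "region", "south"),
  name   = c("raj", "simran", "name", "prem", "name", "preeti"),
  salary = c("100", "150", "salary", "250", "salary", "200"),
  stringsAsFactors = FALSE
)

# TRUE for rows whose values equal the column names, position by position.
is_header <- apply(Data, 1, function(row) all(row == colnames(Data)))

# Drop the repeated header rows and reset the row names.
clean <- Data[!is_header, ]
rownames(clean) <- NULL

# Assumption: with the header rows gone, salary holds only numbers.
clean$salary <- as.numeric(clean$salary)
```

    Because the check compares each row with colnames(Data), it needs no row numbers and should scale to any number of repeated header rows. If the data contains missing values, the comparison can return NA for a row, so that case would need separate handling.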