Delete rows with more than one occurrence of an element in a column


I'm kind of stuck here. I want to delete all rows except one for each value in a column (that is, keep just one row per value instead of all of them).

My data looks like this:

row     Name     Nr     V   Gmd     Kt
1   Aadorf     8355     0   Aadorf  TG
2   Aarau      5004     0   Aarau   AG
3   Aarau      5000     0   Aarau   AG
4   Aarau      5032     0   Aarau   AG
5   Aetigkofen 4583     2   Buchegg SO
6   Aetingen   4587     0   Buchegg SO
...

I only want to keep the first "Aarau" and the first "Buchegg" etc. It should look like this:

row     Name     Nr     V   Gmd     Kt
1   Aadorf     8355     0   Aadorf  TG
4   Aarau      5032     0   Aarau   AG
6   Aetingen   4587     0   Buchegg SO
...

Thank you very much for your help!


Solution

  • You can just use duplicated:

    ## duplicated creates a logical vector
    duplicated(mydf$Gmd, fromLast=TRUE) 
    # [1] FALSE  TRUE  TRUE FALSE  TRUE FALSE
    
    ## You can use that vector to subset the rows you want
    mydf[!duplicated(mydf$Gmd, fromLast=TRUE), ]
    #   row     Name   Nr V     Gmd Kt
    # 1   1   Aadorf 8355 0  Aadorf TG
    # 4   4    Aarau 5032 0   Aarau AG
    # 6   6 Aetingen 4587 0 Buchegg SO
    

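    Set the fromLast argument to match what you're actually looking for: fromLast=TRUE keeps the last row of each Gmd group (which matches your desired output), while the default fromLast=FALSE keeps the first row (which matches your description). Your description and your desired output don't agree.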
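
To make the difference between the two settings concrete, here is a minimal, self-contained sketch. It rebuilds only the six rows shown in the question (the real data has more rows behind the "..."), and the names first_kept and last_kept are just illustrative:

```r
# Rebuild the six sample rows from the question (the full data set is longer)
mydf <- data.frame(
  row  = 1:6,
  Name = c("Aadorf", "Aarau", "Aarau", "Aarau", "Aetigkofen", "Aetingen"),
  Nr   = c(8355, 5004, 5000, 5032, 4583, 4587),
  V    = c(0, 0, 0, 0, 2, 0),
  Gmd  = c("Aadorf", "Aarau", "Aarau", "Aarau", "Buchegg", "Buchegg"),
  Kt   = c("TG", "AG", "AG", "AG", "SO", "SO"),
  stringsAsFactors = FALSE
)

# Default (fromLast = FALSE): keep the FIRST row of each Gmd group
first_kept <- mydf[!duplicated(mydf$Gmd), ]
first_kept$row
# [1] 1 2 5

# fromLast = TRUE: keep the LAST row of each Gmd group
last_kept <- mydf[!duplicated(mydf$Gmd, fromLast = TRUE), ]
last_kept$row
# [1] 1 4 6
```

With the default, rows 1, 2 and 5 survive (the first "Aarau" and the first "Buchegg"); with fromLast = TRUE, rows 1, 4 and 6 survive, which is the output shown in the question.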