
Delete the rows with duplicated ids


I want to delete the rows with duplicated ids.

Data:

id    V1    V2   
1     a      1
1     b      2
2     a      2
2     c      3
3     a      4

The problem is that some people took the test several times, which produced multiple scores in V2. I want to remove the duplicated ids so that each id keeps exactly one row, with the V2 score chosen at random from that person's scores.

Desired output (for example; the row kept for each id should be chosen at random):

id    V1    V2   
1     a      1
2     a      2
3     a      4

I tried this:

neu <- unique(neu$userid)

but it didn't work.
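`unique(neu$userid)` returns only the distinct values of the `userid` column as a vector, and assigning the result to `neu` replaces the whole data frame with that vector. A minimal base R sketch that keeps whole rows instead: shuffle the rows, then drop every row whose id has already appeared, using `duplicated()`. The data frame name `df` and the inline example data are assumptions for illustration.

```r
# Example data from the question (V1 kept as character for simplicity)
df <- data.frame(
  id = c(1, 1, 2, 2, 3),
  V1 = c("a", "b", "a", "c", "a"),
  V2 = c(1, 2, 2, 3, 4)
)

# unique() on a single column returns the distinct ids, not the rows
unique(df$id)
#> [1] 1 2 3

set.seed(1)
shuffled <- df[sample(nrow(df)), ]            # put the rows in random order
kept <- shuffled[!duplicated(shuffled$id), ]  # keep the first (random) row per id
kept <- kept[order(kept$id), ]                # sort by id for readability
kept
```

Because `duplicated()` flags every occurrence after the first, shuffling beforehand is what makes the retained row random rather than always the first one in the original data.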


Solution

  • Using dplyr:

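    library(dplyr)
    set.seed(1)
    df %>%
      sample_frac(1) %>%               # shuffle all rows into random order
      arrange(id) %>%                  # sort by id so the result is ordered
      distinct(id, .keep_all = TRUE)   # keep the first (random) row per id, all columns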
    

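    Output (which row is kept for each id depends on the seed and on your R version's sampling algorithm, so your result may differ):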

      id V1 V2
    1  1  b  2
    2  2  c  3
    3  3  a  4
    

    Data:

    df <- data.frame(
      id = c(1L, 1L, 2L, 2L, 3L),
      V1 = factor(c("a", "b", "a", "c", "a")),
      V2 = c(1L, 2L, 2L, 3L, 4L)
    )
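In dplyr 1.0.0 and later, `sample_frac()` is superseded by `slice_sample()`, so the same idea can also be written by sampling one row inside each id group. This is a sketch assuming dplyr >= 1.0.0 is installed; the result name `one_per_id` is hypothetical.

```r
library(dplyr)

df <- data.frame(
  id = c(1L, 1L, 2L, 2L, 3L),
  V1 = factor(c("a", "b", "a", "c", "a")),
  V2 = c(1L, 2L, 2L, 3L, 4L)
)

set.seed(1)
one_per_id <- df %>%
  group_by(id) %>%
  slice_sample(n = 1) %>%  # pick one random row within each id group
  ungroup()                # drop the grouping from the result
one_per_id
```

Grouping makes the "one row per id" intent explicit, and there is no need for a separate `distinct()` step.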