
Replacing values with 'NA' by ID in R


I have data that looks like this:

ID    v1    v2
1     1     0
2     0     1
3     1     0
3     0     1
4     0     1

I want to replace all values with NA if the ID occurs more than once in the data frame. The final result should look like this:

ID    v1    v2
1     1     0
2     0     1
3     NA    NA
3     NA    NA
4     0     1

I could do this by hand, but I want R to detect all duplicated IDs (here, the two rows with ID 3) and replace their values with NA.

Thanks for your help!


Solution

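  • You could call duplicated() from both ends, so that every row whose ID occurs more than once is flagged (not just the later occurrences), and then replace the values in those rows.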

    idx <- duplicated(df$ID) | duplicated(df$ID, fromLast = TRUE)
    df[idx, -1] <- NA
    

    which gives

      ID v1 v2
    1  1  1  0
    2  2  0  1
    3  3 NA NA
    4  3 NA NA
    5  4  0  1
    

    This will also work if the duplicated IDs are not next to each other.

    Data:

    df <- structure(list(ID = c(1L, 2L, 3L, 3L, 4L), v1 = c(1L, 0L, 1L, 
    0L, 0L), v2 = c(0L, 1L, 0L, 1L, 1L)), .Names = c("ID", "v1", 
    "v2"), class = "data.frame", row.names = c(NA, -5L))
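
To see why both scans are needed, and that adjacency does not matter, here is a minimal sketch on made-up data. The data frame `df2` and the intermediate names `fwd` and `bwd` are hypothetical and not part of the original answer. Selecting the columns by name instead of `-1` is a small variation that assumes the ID column is called `ID`.

```r
# Hypothetical data: ID 3 is repeated, but its rows are not adjacent
df2 <- data.frame(ID = c(3L, 1L, 2L, 3L, 4L),
                  v1 = c(1L, 1L, 0L, 0L, 0L),
                  v2 = c(0L, 0L, 1L, 1L, 1L))

# duplicated() alone marks only the second and later occurrences of a value
fwd <- duplicated(df2$ID)

# Scanning from the end marks every occurrence except the last one
bwd <- duplicated(df2$ID, fromLast = TRUE)

# A row is flagged if either scan marks it, i.e. its ID occurs more than once
idx <- fwd | bwd

# Set every column except ID to NA in the flagged rows
df2[idx, names(df2) != "ID"] <- NA
print(df2)
```

With only `fwd`, the first row with ID 3 would keep its values, which is why the two results are combined with `|`.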