R: Recode columns based on the values of two other columns that have different levels


I have a data frame with 10000 rows and 1000 columns that looks like this:

ID    a0    a1    V1    V2    V3
rs1   G     A     0     0     1
rs2   C     T     1     0     0
rs3   T     C     0     1     1    

a0 and a1 can each be A, T, C or G, and they define what the 0/1 values in the other columns stand for: 0 means the value of a0 and 1 means the value of a1. For instance, in the first row, a0 = G and a1 = A, so V1 = 0 (G), V2 = 0 (G) and V3 = 1 (A). I expect an output data frame like this:

ID    a0    a1    V1    V2    V3
rs1   G     A     G     G     A
rs2   C     T     T     C     C
rs3   T     C     T     C     C

Many thanks


Solution

  • We can use lapply to loop over the 0/1 columns and ifelse to replace each 0 with the row's a0 value and each 1 with the row's a1 value.

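    # Columns 1-3 are ID, a0 and a1; every remaining column holds 0/1 codes.
    dat[, -(1:3)] <- lapply(dat[, -(1:3)], function(x) {
      # 0 becomes the row's a0 value, anything else (here 1) the row's a1 value
      ifelse(x == 0, dat$a0, dat$a1)
    })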
    
    dat
    #    ID a0 a1 V1 V2 V3
    # 1 rs1  G  A  G  G  A
    # 2 rs2  C  T  T  C  C
    # 3 rs3  T  C  T  C  C
    

    DATA

    dat <- read.table(text = "ID    a0    a1    V1    V2    V3
    rs1   G     A     0     0     1
    rs2   C     T     1     0     0
    rs3   T     C     0     1     1",
                      header = TRUE, stringsAsFactors = FALSE)
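
  • As a possible alternative for a wide data frame like the 10000 x 1000 one in the question, the same lookup can be sketched with matrix indexing instead of a per-column lapply. This is only a sketch under assumptions: the names code_cols, codes, lookup and recoded are made up for illustration, every column other than ID, a0 and a1 is assumed to contain only 0 or 1 (no NA), and no speed comparison has been made here.

```r
# Self-contained copy of the example data from the question
dat <- read.table(text = "ID a0 a1 V1 V2 V3
rs1 G A 0 0 1
rs2 C T 1 0 0
rs3 T C 0 1 1", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper names; assumes all non-ID/a0/a1 columns hold 0/1 codes
code_cols <- setdiff(names(dat), c("ID", "a0", "a1"))

# The 0/1 codes as a single integer matrix
codes <- as.matrix(dat[code_cols])

# Two-column lookup table: column 1 is used for code 0, column 2 for code 1
lookup <- cbind(dat$a0, dat$a1)

# Matrix indexing: for every cell, pick lookup[row, code + 1]
idx <- cbind(as.vector(row(codes)), as.vector(codes) + 1L)
recoded <- matrix(lookup[idx], nrow = nrow(codes), dimnames = dimnames(codes))

# Write the recoded letters back as character columns
dat[code_cols] <- as.data.frame(recoded, stringsAsFactors = FALSE)

dat
```

    On the three example rows this should give the same data frame as the lapply/ifelse version above. Selecting columns by name rather than by position (-(1:3)) is a design choice that keeps the code working if the column order changes.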