r, dataframe, recode

How to recode many data frame columns with the same function


I have a data frame like this:

CriterionVar Var1 Var2 Var3
3            0    0    0
1            0    0    0
2            0    0    0
5            0    0    0 

I want to recode the values of Var1, Var2, and Var3 based on the value of CriterionVar: VarN should become 1 whenever CriterionVar is at least N. In pseudocode, it would be something like this:

for each row
   if (CriterionVar.value >= Var1.index) Var1 = 1
   if (CriterionVar.value >= Var2.index) Var2 = 1
   if (CriterionVar.value >= Var3.index) Var3 = 1

The recoded data frame would look like this:

CriterionVar Var1 Var2 Var3
3            1    1    1
1            1    0    0
2            1    1    0
5            1    1    1

Obviously, that is not the way to get it done because (1) the number of VarN columns is determined by a data value, and (2) it's just ugly.

Any help is appreciated.


Solution

  • For general values of CriterionVar, you can use outer to construct a logical matrix and then use that matrix for indexing, like this:

    dat[2:4][outer(dat$CriterionVar, seq_along(names(dat)[-1]), ">=")] <- 1
    

    For the example data, dat then looks like this:

    dat
      CriterionVar Var1 Var2 Var3
    1            3    1    1    1
    2            1    1    0    0
    3            2    1    1    0
    4            5    1    1    1
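
    To see why the outer approach works, it may help to look at the logical matrix on its own. The following is a minimal sketch that rebuilds the example data so it runs standalone; the name mask is only for illustration and is not part of the original answer.

    ```r
    # Same data as in the example, rebuilt so this snippet is self-contained.
    dat <- data.frame(CriterionVar = c(3L, 1L, 2L, 5L),
                      Var1 = 0L, Var2 = 0L, Var3 = 0L)

    # mask[i, j] is TRUE when CriterionVar[i] >= j, where j is the position
    # of the VarN column (1 for Var1, 2 for Var2, ...).
    mask <- outer(dat$CriterionVar, seq_along(names(dat)[-1]), ">=")
    print(mask)

    # Cells where mask is TRUE are set to 1; all other cells keep their value.
    dat[2:4][mask] <- 1
    print(dat)
    ```

    The matrix has one row per data row and one column per VarN column, so it lines up cell for cell with dat[2:4].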
    

    A second method uses col, which returns a matrix of column indices, and is a bit more direct:

    dat[2:4][dat$CriterionVar >= col(dat[-1])] <- 1
    

    and returns the desired result.
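
    The col version relies on R recycling the CriterionVar vector down each column of the index matrix. A small sketch of the intermediate values, with idx and mask as illustrative names only:

    ```r
    # Same data as in the example, rebuilt so this snippet is self-contained.
    dat <- data.frame(CriterionVar = c(3L, 1L, 2L, 5L),
                      Var1 = 0L, Var2 = 0L, Var3 = 0L)

    # col() labels every cell with its column number, so each row reads 1 2 3.
    idx <- col(dat[-1])
    print(idx)

    # The length-4 vector is recycled column by column, so row i of idx
    # is compared against CriterionVar[i].
    mask <- dat$CriterionVar >= idx
    print(mask)
    ```

    The resulting mask should match the one built with outer, which is why both assignments give the same data frame.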


    data

    dat <-
    structure(list(CriterionVar = c(3L, 1L, 2L, 5L), Var1 = c(0L, 
    0L, 0L, 0L), Var2 = c(0L, 0L, 0L, 0L), Var3 = c(0L, 0L, 0L, 0L
    )), .Names = c("CriterionVar", "Var1", "Var2", "Var3"), class = "data.frame",
    row.names = c(NA, -4L))
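
    Since the question mentions that the number of VarN columns can vary, the hard-coded 2:4 could be replaced by the column names. This is only a sketch: recode_by_criterion is a hypothetical helper, not part of the original answer, and it assumes that every column other than the criterion is a VarN column and that those columns appear in index order.

    ```r
    # Hypothetical helper (illustrative only): sets VarN to 1 wherever the
    # criterion column is >= N, for however many VarN columns there are.
    recode_by_criterion <- function(df, criterion = "CriterionVar") {
      vars <- setdiff(names(df), criterion)   # every column except the criterion
      mask <- outer(df[[criterion]], seq_along(vars), ">=")
      df[vars][mask] <- 1
      df
    }

    # Example with five VarN columns instead of three.
    wide <- data.frame(CriterionVar = c(3L, 1L, 2L, 5L),
                       Var1 = 0L, Var2 = 0L, Var3 = 0L, Var4 = 0L, Var5 = 0L)
    res <- recode_by_criterion(wide)
    print(res)
    ```

    Selecting the columns by name keeps the assignment independent of where the criterion column sits in the data frame.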