
R - Function to make a binary variable


I have some variables that take values between 1 and 5. I would like to recode them as 0 if the value is between 1 and 3 (inclusive) and as 1 if the value is 4 or 5.

My dataset looks like this:

var1    var2        var3
1       1            NA
4       3            4
3       4            5
2       5            3

So I would like it to be like this:

var1    var2        var3
0       0            NA
1       0            1
0       1            1
0       1            0

I tried to write a function and apply it to every column:

making_binary <- function (var){
  var <- factor(var >= 4, labels = c(0, 1))
  return(var)
}


df <- lapply(df, making_binary)

But I got an error: incorrect labels : length 2 must be 1 or 1

Where did I go wrong? Thank you very much for your answers!


Solution

  • You can use:

    df[] <- +(df == 4 | df == 5)
    df
    #  var1 var2 var3
    #1    0    0   NA
    #2    1    0    1
    #3    0    1    1
    #4    0    1    0
    

    The comparison df == 4 | df == 5 returns logical values (TRUE/FALSE); the unary + then converts them to integers (1/0). NA values stay NA.
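
    A minimal sketch of that conversion on a plain vector (x, is_high and as_int are illustrative names, not part of the question's data):

    ```r
    x <- c(1L, 4L, NA, 5L)

    # Comparison gives a logical vector; NA stays NA
    is_high <- x == 4 | x == 5   # FALSE TRUE NA TRUE

    # Unary plus coerces the logical vector to integer
    as_int <- +is_high           # 0 1 NA 1
    as_int
    ```

    The same coercion applies element-wise to the logical matrix that the comparison returns for a whole data frame.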

    If you want to apply this to selected columns only, you can subset the columns by position or by name.

    cols <- 1:3 #Position
    #cols <- grep('var', names(df)) #Name
    df[cols] <- +(df[cols] == 4 | df[cols] == 5)
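
    Selecting columns matters when the data frame also holds columns that should not be recoded. A small sketch, assuming a hypothetical data frame df2 with an id column next to the var columns (neither df2 nor id appears in the question):

    ```r
    # Hypothetical data: one identifier column plus two columns to recode
    df2 <- data.frame(
      id   = c("a", "b", "c"),
      var1 = c(1L, 4L, 5L),
      var2 = c(3L, NA, 4L),
      stringsAsFactors = FALSE
    )

    # Pick only the columns whose names start with "var"
    cols <- grep("^var", names(df2))

    # Recode the selected columns; the id column is left untouched
    df2[cols] <- +(df2[cols] == 4 | df2[cols] == 5)
    df2
    ```

    Anchoring the pattern with ^ avoids matching other column names that merely contain "var" somewhere in the middle.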
    

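    As for your function, you can use as.integer() instead of factor(), and assign the result to df[] so that df stays a data frame rather than becoming a list: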

    making_binary <- function (var){
      var <- as.integer(var >= 4)
      # a faster alternative to
      # var <- ifelse(var >= 4, 1, 0)
      return(var)
    }
    
    df[] <- lapply(df, making_binary)
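
    As to why the original factor() call failed, a likely cause is a column in which var >= 4 takes only one distinct value, so factor() finds one level but receives two labels. This is an assumption about the full dataset, since the sample data in the question has both values in every column. A sketch with a hypothetical column only_low:

    ```r
    # Hypothetical column with no value of 4 or 5
    only_low <- c(1L, 2L, 3L, NA)

    # One level (FALSE) but two labels: factor() stops with an error
    res <- tryCatch(
      factor(only_low >= 4, labels = c(0, 1)),
      error = function(e) conditionMessage(e)
    )
    res  # the error message as a character string

    # Stating both levels explicitly makes the factor() version work
    f <- factor(only_low >= 4, levels = c(FALSE, TRUE), labels = c(0, 1))
    levels(f)  # "0" "1"

    # as.integer() needs no levels at all
    as.integer(only_low >= 4)  # 0 0 0 NA
    ```

    Note that the factor() version returns a factor with levels "0" and "1", not numbers, so as.integer(var >= 4) is usually the simpler choice for a 0/1 variable.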
    

    Data:

    df <- structure(list(var1 = c(1L, 4L, 3L, 2L),
                         var2 = c(1L, 3L, 4L, 5L),
                         var3 = c(NA, 4L, 5L, 3L)),
                    class = "data.frame", row.names = c(NA, -4L))