I have some variables that take values between 1 and 5. I would like to recode them as 0 if the value is between 1 and 3 (inclusive) and as 1 if the value is 4 or 5.
My dataset looks like this:
var1 var2 var3
1 1 NA
4 3 4
3 4 5
2 5 3
So I would like it to be like this:
var1 var2 var3
0 0 NA
1 0 1
0 1 1
0 1 0
I tried writing a function and applying it to every column:
making_binary <- function (var){
var <- factor(var >= 4, labels = c(0, 1))
return(var)
}
df <- lapply(df, making_binary)
But I got an error: invalid 'labels'; length 2 should be 1 or 1
Where did I go wrong? Thank you very much for your answers!
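A minimal way to reproduce this error, assuming a column (the hypothetical x below) in which every non-missing value is 4 or 5:

```r
# Hypothetical column: no value below 4
x <- c(4, 5, NA, 5)

# x >= 4 is c(TRUE, TRUE, NA, TRUE); factor() only builds levels from the
# values that occur (NA is excluded), so there is a single level, "TRUE"
print(levels(factor(x >= 4)))

# Supplying two labels for that one level raises the error from the question
err <- tryCatch(factor(x >= 4, labels = c(0, 1)),
                error = function(e) conditionMessage(e))
print(err)
```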
You can use:
df[] <- +(df == 4 | df == 5)
df
# var1 var2 var3
#1 0 0 NA
#2 1 0 1
#3 0 1 1
#4 0 1 0
The comparison df == 4 | df == 5 returns logical values (TRUE/FALSE), and the unary + turns them into integer values (1/0) respectively. Missing values stay NA.
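Since the values are assumed to be whole numbers from 1 to 5, "4 or 5" is the same as ">= 4", so a single comparison gives the same result. A small sketch checking that the two forms agree on the example data:

```r
df <- data.frame(var1 = c(1L, 4L, 3L, 2L),
                 var2 = c(1L, 3L, 4L, 5L),
                 var3 = c(NA, 4L, 5L, 3L))

# Both produce an integer matrix of 0/1, with NA where the input is NA
by_equality  <- +(df == 4 | df == 5)
by_threshold <- +(df >= 4)   # assumes every value lies in 1..5

print(identical(by_equality, by_threshold))

df[] <- by_threshold
df
```

Be careful with +(x %in% 4:5) as another alternative: %in% never returns NA, so missing values would silently become 0.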
If you want to apply this only to selected columns, you can subset the columns by position or by name.
cols <- 1:3                       # select by position
#cols <- grep('var', names(df))   # or select by name (names containing "var")
df[cols] <- +(df[cols] == 4 | df[cols] == 5)
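For example, with a data frame that also has an identifier column (the id column below is hypothetical), only the selected columns are recoded. The anchored pattern "^var" is an assumed tweak so that only names starting with "var" are picked up:

```r
# Hypothetical data: an id column plus the 1-5 items
df <- data.frame(id   = c(101L, 102L, 103L, 104L),
                 var1 = c(1L, 4L, 3L, 2L),
                 var2 = c(1L, 3L, 4L, 5L),
                 var3 = c(NA, 4L, 5L, 3L))

# Select by name: columns whose name starts with "var"
cols <- grep("^var", names(df))

# Recode only those columns; id is left untouched
df[cols] <- +(df[cols] == 4 | df[cols] == 5)
df
```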
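As for your function: the error comes from factor(). Without an explicit levels argument, factor() only creates levels for the values that actually occur, so in a column where var >= 4 is always TRUE (or always FALSE) there is a single level, and the two labels c(0, 1) cannot be matched to it. Also note that df <- lapply(df, ...) returns a list, whereas df[] <- lapply(df, ...) keeps df a data frame. You can drop factor() altogether: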
making_binary <- function (var){
var <- as.integer(var >= 4)
# faster than the equivalent ifelse(var >= 4, 1, 0); NA stays NA
return(var)
}
df[] <- lapply(df, making_binary)
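If you do want factors, a sketch of a fix for the original attempt is to pass levels explicitly, so both levels exist even when one of them never occurs in a column. The function name making_binary_factor and the column x are hypothetical:

```r
making_binary_factor <- function(var) {
  # levels = c(FALSE, TRUE) always defines two levels,
  # so the two labels 0 and 1 always match
  factor(var >= 4, levels = c(FALSE, TRUE), labels = c(0, 1))
}

x <- c(4, 5, NA, 5)   # hypothetical column with no value below 4
res <- making_binary_factor(x)
print(res)
```

The result is a factor with levels "0" and "1"; use the integer version above if you need numbers for arithmetic.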
Data:
df <- structure(list(var1 = c(1L, 4L, 3L, 2L), var2 = c(1L, 3L, 4L,
5L), var3 = c(NA, 4L, 5L, 3L)), class = "data.frame", row.names = c(NA, -4L))
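A quick self-contained check, assuming you want to compare the function's output against the expected table typed in from the question:

```r
df <- structure(list(var1 = c(1L, 4L, 3L, 2L), var2 = c(1L, 3L, 4L, 5L),
                     var3 = c(NA, 4L, 5L, 3L)),
                class = "data.frame", row.names = c(NA, -4L))

making_binary <- function(var) as.integer(var >= 4)

# Expected result, copied from the desired output in the question
expected <- data.frame(var1 = c(0L, 1L, 0L, 0L),
                       var2 = c(0L, 0L, 1L, 1L),
                       var3 = c(NA, 1L, 1L, 0L))

result <- df
result[] <- lapply(result, making_binary)   # keeps result a data frame
print(all.equal(result, expected))
```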