
R: categorize all numeric variables (1:0) according to a cut-off


I have the following data frame:

structure(list(test1 = c(0.12, 0.2, 0.55, 0.22, 0.19, 0.17, 0.15, 
0.12, 0.32, 0.23, 0.32, 0.23), test2 = c(0.15, 0.12, 0.32, 0.23, 
0.12, 0.2, 0.55, 0.22, 0.12, 0.2, 0.55, 0.22), test3 = c(0.07, 
0.01, 0, 0.13, 0.16, 0.78, 0.98, 0.1, 0.5, 0.3, 0.4, 0.5), test4 = c(0.23, 
0.12, 0.2, 0.2, 0.55, 0.22, 0.12, 0.2, 0.55, 0.22, 0.55, 0.42
)), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"
))

I am trying to write a script that, for each variable (test1, test2, test3, ...), creates a dichotomous variable (named out_testX) and adds it to the data frame. The new variable should be 1 if the value is greater than or equal to .20, and 0 otherwise.

The result should look something like this:

structure(list(test1 = c(0.12, 0.2, 0.55, 0.22, 0.19, 0.17, 0.15, 
0.12, 0.32, 0.23, 0.32, 0.23), test2 = c(0.15, 0.12, 0.32, 0.23, 
0.12, 0.2, 0.55, 0.22, 0.12, 0.2, 0.55, 0.22), test3 = c(0.07, 
0.01, 0, 0.13, 0.16, 0.78, 0.98, 0.1, 0.5, 0.3, 0.4, 0.5), test4 = c(0.23, 
0.12, 0.2, 0.2, 0.55, 0.22, 0.12, 0.2, 0.55, 0.22, 0.55, 0.42
), out_test1 = c(0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1), out_test2 = c(0, 
0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1), out_test3 = c(0, 0, 0, 0, 0, 
1, 1, 0, 1, 1, 1, 1), out_test4 = c(1, 0, 1, 1, 1, 1, 0, 1, 1, 
1, 1, 1)), row.names = c(NA, -12L), class = c("tbl_df", "tbl", 
"data.frame"))

Can anyone help me? Thank you


Solution

  • Assuming the data frame is stored as df1: with mutate_all, we pass the function in a named list, and the list name is appended to each column name as a suffix. Because a prefix is needed here, we then rename the new columns with rename_at

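    library(dplyr)
    library(stringr)
    df1 %>% 
         # add a 0/1 indicator per column; the list name 'out' becomes a suffix (test1_out, ...)
         mutate_all( list(out = ~+( . >= .2))) %>%
         # move 'out' from suffix to prefix: test1_out -> out_test1
         rename_at(vars(ends_with('out')), ~ str_replace(., '(.*)_(out)', '\\2_\\1'))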
    

    Or, using base R, where the unary + converts the logical comparison to 0/1:

    df1[paste0("out_", names(df1))] <- +(df1 >= .2)
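
The same idea can also be sketched with across(), assuming dplyr 1.0.0 or later is installed. The .names argument builds the out_ prefix directly, so no renaming step is needed. The small df1 below is only a hypothetical subset of the question's data (first four rows of test1 and test2), used to keep the example short.

```r
library(dplyr)

# Hypothetical subset of the question's data, for illustration only
df1 <- tibble(
  test1 = c(0.12, 0.2, 0.55, 0.22),
  test2 = c(0.15, 0.12, 0.32, 0.23)
)

# For every column, add a 0/1 indicator named out_<column>;
# unary + turns the logical comparison into an integer
res <- df1 %>%
  mutate(across(everything(), ~ +(.x >= 0.2), .names = "out_{.col}"))

print(res)
```

If only some columns should be categorized, everything() could be replaced by a selection such as starts_with("test").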