
mutate_at in R with lambda function?


I have a data frame with 100 columns. Each column represents a probability value.

I want to min-max scale these columns, so I am using the following transformation:

df <- df %>%
      mutate_at(vars(specific_columns), 
                funs(function(x) {((x - min(x)) / (max(x) - min(x)))}))

But it does not produce the scaled output I want; instead, it throws an error (see below).

For example, here is some sample data:

col1        col2        col3        col4        col5        
0.014492754 0.014492754 0.014492754 0.014492754 0.014492754 
0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 
0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 
0.028985507 0.028985507 0.028985507 0.028985507 0.028985507 
0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 
0.014492754 0.014492754 0.014492754 0.014492754 0.014492754 
0.014492754 0.014492754 0.014492754 0.014492754 0.014492754 
0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 
0.010989011 0.010989011 0.010989011 0.010989011 0.010989011 
0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 
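To make the example reproducible, the sample above can be entered as a data frame. This is a minimal sketch that copies the printed values and assumes all five columns are identical, as they are in the sample; the name `df` matches the question's code, while `vals` is a hypothetical helper name.

```r
# Values copied from the sample table; all five columns are identical.
vals <- c(0.014492754, 0, 0, 0.028985507, 0,
          0.014492754, 0.014492754, 0, 0.010989011, 0)

# Build the data frame with the same column names as the sample
df <- data.frame(col1 = vals, col2 = vals, col3 = vals,
                 col4 = vals, col5 = vals)
str(df)
```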

Error:

Error in mutate_impl(.data, dots) : Column col1 is of unsupported type function


Solution

  • Try this syntax instead:

    library(dplyr)
    # . stands for each selected column in turn
    # (funs() is soft-deprecated since dplyr 0.8.0)
    df %>% mutate_at(vars(everything()), funs((. - min(.)) / (max(.) - min(.))))
    #>         col1      col2      col3      col4      col5
    #> 1  0.5000000 0.5000000 0.5000000 0.5000000 0.5000000
    #> 2  0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
    #> 3  0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
    #> 4  1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
    #> 5  0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
    #> 6  0.5000000 0.5000000 0.5000000 0.5000000 0.5000000
    #> 7  0.5000000 0.5000000 0.5000000 0.5000000 0.5000000
    #> 8  0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
    #> 9  0.3791209 0.3791209 0.3791209 0.3791209 0.3791209
    #> 10 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
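In dplyr 1.0.0 and later, mutate_at() and funs() are superseded by across() inside mutate(). The sketch below shows the same rescaling with that interface; it assumes dplyr >= 1.0.0 is installed and rebuilds the question's sample data so it runs on its own (`vals` and `scaled` are illustrative names).

```r
library(dplyr)  # assumes dplyr >= 1.0.0, which provides across()

# Same sample data as in the question (five identical columns)
vals <- c(0.014492754, 0, 0, 0.028985507, 0,
          0.014492754, 0.014492754, 0, 0.010989011, 0)
df <- data.frame(col1 = vals, col2 = vals, col3 = vals,
                 col4 = vals, col5 = vals)

# Min-max scale every column; in the ~ lambda, .x is the current column
scaled <- df %>%
  mutate(across(everything(), ~ (.x - min(.x)) / (max(.x) - min(.x))))

scaled
```

A named function such as my_func can be passed to across() in place of the ~ lambda.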
    

    funs() turns a pseudo-function into a real function for you. It handles two cases that would not otherwise work:

    1. The name of a function given as a character string (e.g. "mean")
    2. A call to a function that uses . as a placeholder for the column (as in my example above)
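To illustrate the two cases, and why wrapping function(x) {...} in funs() produced the "unsupported type function" error, here is a toy model in base R of "evaluate an expression with . bound to the column". The helper eval_with_dot() is hypothetical and far simpler than what dplyr actually does; it only shows the idea.

```r
# Hypothetical helper (NOT dplyr's implementation): evaluate a captured
# expression with `.` bound to one column's values.
eval_with_dot <- function(expr, column) {
  eval(expr, envir = list(. = column))
}

col <- c(0, 0.5, 1, 2)

# Case 1: a function *name* given as a string is turned into a call on `.`
name_expr <- call("mean", quote(.))  # builds the call mean(.)
print(eval_with_dot(name_expr, col))  # 0.875

# Case 2: a call that uses `.` as a placeholder evaluates to a numeric vector
scale_expr <- quote((. - min(.)) / (max(.) - min(.)))
scaled <- eval_with_dot(scale_expr, col)
print(scaled)  # 0.00 0.25 0.50 1.00

# An anonymous function definition is also just an expression, but
# evaluating it yields a function object, not a vector of results
lambda_expr <- quote(function(x) (x - min(x)) / (max(x) - min(x)))
result <- eval_with_dot(lambda_expr, col)
print(class(result))  # "function"
```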

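    Your original attempt failed because funs() treats function(x) {...} as an expression to evaluate for each column; evaluating it returns a function object rather than a numeric vector, hence "Column col1 is of unsupported type function". If you have already declared your own (anonymous) function, there is no need to use funs(), since mutate_at() accepts it as-is: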

    mutate_at(df, vars(everything()), function(x) {((x - min(x)) / (max(x) - min(x)))})
    

    or

    my_func <- function(x) {((x - min(x)) / (max(x) - min(x)))}
    mutate_at(df, vars(everything()), my_func)
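If you do not need dplyr for this step, the same named function can be applied column by column with base R's lapply(). A minimal sketch: my_func is repeated from above so the block runs on its own, and the data are two of the question's identical sample columns.

```r
# Same scaling function as in the answer
my_func <- function(x) (x - min(x)) / (max(x) - min(x))

# Sample values from the question (two of the identical columns)
vals <- c(0.014492754, 0, 0, 0.028985507, 0,
          0.014492754, 0.014492754, 0, 0.010989011, 0)
df <- data.frame(col1 = vals, col2 = vals)

# Assigning into df[] keeps the data.frame class and the column names
df[] <- lapply(df, my_func)
df
```

To scale only some columns, index df with a character vector of their names on both sides of the assignment.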