Tags: r, select, numeric, normalize

R function that selects for numeric vectors and normalizes x to mean(x) = 0 and sd(x) = 1


In R, I want to write a function normalize() that normalizes a numeric vector x to mean(x) = 0 and sd(x) = 1 and that handles NAs flexibly, using tidyverse functionality.

Using the starwars dataset as an example, I tried to write a function that drops all columns not consisting of numeric values:

normalize <- function(x){
  x_numeric <-select_if(x, is.numeric(unlist(x)))
   (x_numeric - mean(x_numeric, na.rm = TRUE) / sd(x_numeric, na.rm = TRUE))
}

print(normalize(starwars))

I am quite new to R and get several error messages, for example:

Error in select_if(x, is.numeric(unlist(x))) : ✖ .p should have the same size as the number of variables in the tibble.


Solution

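  • select_if() expects a predicate function such as is.numeric, not the result of calling it, which is why the attempt fails. The attempt also lacks parentheses around the subtraction, so the division would be evaluated first. We may use transmute with across and where(is.numeric) instead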

    library(dplyr)
    starwars %>%
      transmute(across(where(is.numeric),
        ~ (.x - mean(.x, na.rm = TRUE)) / sd(.x, na.rm = TRUE)))
    

    Or as a function

    normalize_dat <- function(data) {
      data %>%
        transmute(across(where(is.numeric),
          ~ (.x - mean(.x, na.rm = TRUE)) / sd(.x, na.rm = TRUE)))
    }
    

    Testing:

    > normalize_dat(starwars)
    # A tibble: 87 × 3
        height    mass birth_year
         <dbl>   <dbl>      <dbl>
     1 -0.0678 -0.120      -0.443
     2 -0.212  -0.132       0.158
     3 -2.25   -0.385      -0.353
     4  0.795   0.228      -0.295
     5 -0.701  -0.285      -0.443
     6  0.105   0.134      -0.230
     7 -0.269  -0.132      -0.262
     8 -2.22   -0.385      NA    
     9  0.249  -0.0786     -0.411
    10  0.220  -0.120      -0.198
    # … with 77 more rows
    

    Or use select and then scale (note that scale() returns a matrix rather than a tibble)

    starwars %>%
      select(where(is.numeric)) %>%
      scale()
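
The question also asks for a vector-level normalize() with flexible NA handling. The following is a minimal sketch of one way to do that, not the only one: the na.rm argument and the stopifnot() check are assumptions added for illustration, and normalize_dat() is rewritten here to pass na.rm through to the helper.

```r
library(dplyr)

# Hypothetical helper (not part of the original answer): standardize one
# numeric vector. na.rm controls whether NAs are ignored when computing
# the mean and standard deviation.
normalize <- function(x, na.rm = TRUE) {
  stopifnot(is.numeric(x))
  (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
}

# Apply the helper to every numeric column and drop all other columns.
normalize_dat <- function(data, na.rm = TRUE) {
  data %>%
    transmute(across(where(is.numeric), ~ normalize(.x, na.rm = na.rm)))
}

res <- normalize_dat(starwars)

# Sanity check: each column should have mean close to 0 and sd close to 1
# once NAs are ignored.
res %>%
  summarise(across(everything(),
    list(mean = ~ mean(.x, na.rm = TRUE), sd = ~ sd(.x, na.rm = TRUE))))
```

With na.rm = FALSE, any column that contains an NA comes back entirely NA, because mean() and sd() then return NA. This can be useful when missing values should be noticed rather than silently skipped.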