In R, I want to program a function normalize() that normalizes a numeric vector x to mean(x) = 0 and sd(x) = 1, and that provides flexibility in handling NAs using tidyverse functionality.
Using the starwars dataset as an example, I tried to write a function that drops all columns not consisting of numeric values:
normalize <- function(x){
  x_numeric <- select_if(x, is.numeric(unlist(x)))
  (x_numeric - mean(x_numeric, na.rm = TRUE) / sd(x_numeric, na.rm = TRUE))
}
print(normalize(starwars))
I am quite new to R and get several error messages, for example:
Error in select_if(x, is.numeric(unlist(x))) :
  ✖ .p should have the same size as the number of variables in the tibble.
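The error occurs because select_if() expects a predicate function such as is.numeric (or a logical vector with one element per column), whereas is.numeric(unlist(x)) evaluates to a single logical value. The parentheses are also misplaced, so only the mean is divided by the standard deviation, and mean() and sd() work on vectors rather than on a whole data frame. Instead, we may use transmute with across to apply the computation to each numeric column: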
library(dplyr)
starwars %>%
  transmute(across(where(is.numeric),
    ~ (.x - mean(.x, na.rm = TRUE)) / sd(.x, na.rm = TRUE)))
Or as a function:
normalize_dat <- function(data) {
  data %>%
    transmute(across(where(is.numeric),
      ~ (.x - mean(.x, na.rm = TRUE)) / sd(.x, na.rm = TRUE)))
}
Testing:
> normalize_dat(starwars)
# A tibble: 87 × 3
    height    mass birth_year
     <dbl>   <dbl>      <dbl>
 1 -0.0678 -0.120      -0.443
 2 -0.212  -0.132       0.158
 3 -2.25   -0.385      -0.353
 4  0.795   0.228      -0.295
 5 -0.701  -0.285      -0.443
 6  0.105   0.134      -0.230
 7 -0.269  -0.132      -0.262
 8 -2.22   -0.385      NA
 9  0.249  -0.0786     -0.411
10  0.220  -0.120      -0.198
# … with 77 more rows
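Since the question also asks for flexibility in handling NAs, here is a minimal sketch of a vector-level helper that exposes na.rm as an argument instead of hard-coding it as normalize_dat() does. The name normalize_vec and its default value are my own choices for illustration, not part of dplyr. The block uses base R only, so it should run without extra packages.

```r
# Hypothetical helper (name and default are illustrative assumptions):
# standardize one numeric vector to mean 0 and sd 1.
normalize_vec <- function(x, na.rm = TRUE) {
  stopifnot(is.numeric(x))
  # na.rm is forwarded to both mean() and sd()
  (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
}

x <- c(1, 2, 3, NA)
z_drop <- normalize_vec(x)                 # NAs ignored when computing mean and sd
z_keep <- normalize_vec(x, na.rm = FALSE)  # a single NA makes every result NA

print(z_drop)
print(z_keep)
```

With dplyr loaded, this should plug into the pipeline as mutate(across(where(is.numeric), normalize_vec)), or as ~ normalize_vec(.x, na.rm = FALSE) if NAs should propagate. Note that mutate() keeps the non-numeric columns, whereas transmute() drops them.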
Or use select and then scale:
starwars %>%
  select(where(is.numeric)) %>%
  scale()
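One caveat with the scale() route: it returns a numeric matrix rather than a tibble, and it stores the column means and standard deviations it used as attributes. The following is a small sketch on a made-up data frame (the toy data is an assumption for illustration) that shows this and one way to get back to a data frame:

```r
# Toy data, invented for illustration
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, NA))

m <- scale(df)  # centers and scales each column; NAs are skipped in the computation

print(is.matrix(m))              # scale() returns a matrix
print(attr(m, "scaled:center"))  # column means used for centering
print(attr(m, "scaled:scale"))   # column standard deviations used for scaling

out <- as.data.frame(m)          # convert back to a data frame
print(out)
```

If a tibble is needed downstream, the same conversion can be done with as_tibble() from the tidyverse instead of as.data.frame().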