r, dataframe, dplyr, standardized

Scale columns based on a vector of column name prefixes


set.seed(123)
  
dat <-   
  data.frame(year_ref = 2000:2004,
             www_val1 = sample(5),
             www_val2 = sample(5),
             www_val3 = sample(5),
             sat_val1 = sample(5),
             sat_val2 = sample(5),
             sat_val3 = sample(5),
             ds_val1 = sample(5),
             ds_val2 = sample(5),
             ds_val3 = sample(5))

I want to scale all columns whose names start with one of the prefixes given in another vector. For example, if the vector var_names contains ds and sat, I want to scale every column whose name starts with either of them.

var_names <- c("ds", "sat")
  
library(dplyr)
  
dat %>% 
 dplyr::select(contains(var_names)) %>%
 dplyr::mutate(scale(., center = T, scale = T))

However, this creates new columns. Can I implement something like the line below, which modifies the original data frame in place, but without hardcoding the column indices?

dat[, 5:10] <- apply(dat[, 5:10], 2, function(x) scale(x, center = T, scale = T))
        

Solution

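  • Use mutate_at() with starts_with(); it rewrites the matching columns in place instead of adding new ones:

    library(tidyverse)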
    set.seed(123)
    dat <-   
      data.frame(year_ref = 2000:2004,
                 www_val1 = sample(5),
                 www_val2 = sample(5),
                 www_val3 = sample(5),
                 sat_val1 = sample(5),
                 sat_val2 = sample(5),
                 sat_val3 = sample(5),
                 ds_val1 = sample(5),
                 ds_val2 = sample(5),
                 ds_val3 = sample(5))
    var_names <- c("ds", "sat")
    # scale only the columns whose names start with one of the prefixes
    dat %>% 
      dplyr::mutate_at(vars(starts_with(var_names)), ~scale(., center = TRUE, scale = TRUE))
    #   year_ref www_val1 www_val2 www_val3   sat_val1   sat_val2   sat_val3    ds_val1    ds_val2    ds_val3
    # 1     2000        3        3        1  0.0000000 -0.6324555 -1.2649111  0.6324555  0.6324555  0.0000000
    # 2     2001        5        5        3 -1.2649111  0.0000000  0.0000000 -0.6324555 -1.2649111  1.2649111
    # 3     2002        2        2        2  0.6324555  0.6324555  0.6324555  0.0000000  1.2649111 -0.6324555
    # 4     2003        4        4        5 -0.6324555 -1.2649111 -0.6324555  1.2649111 -0.6324555 -1.2649111
    # 5     2004        1        1        4  1.2649111  1.2649111  1.2649111 -1.2649111  0.0000000  0.6324555
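
Two possible variations on the answer above, offered as a sketch rather than a definitive implementation. The first assumes dplyr 1.0.0 or later, where across() supersedes mutate_at(). The second is a base R version of the assignment from the question, with the column positions derived from var_names instead of being hardcoded. The smaller three-column data frame and the names scaled_dplyr, scaled_base and is_target are only for illustration. Both variants wrap scale() in as.numeric(), because scale() returns a one-column matrix and that would otherwise be stored inside the data frame.

```r
library(dplyr)

set.seed(123)
# Reduced version of the example data (illustrative only)
dat <- data.frame(year_ref = 2000:2004,
                  www_val1 = sample(5),
                  sat_val1 = sample(5),
                  ds_val1  = sample(5))
var_names <- c("ds", "sat")

# Variant 1 (assumes dplyr >= 1.0.0): across() replaces the selected
# columns in place; starts_with() accepts a vector of prefixes
scaled_dplyr <- dat %>%
  mutate(across(starts_with(var_names), ~ as.numeric(scale(.x))))

# Variant 2 (base R): flag every column whose name starts with any prefix,
# then overwrite just those columns
is_target <- Reduce(`|`, lapply(var_names,
                                function(p) startsWith(names(dat), p)))
scaled_base <- dat
scaled_base[is_target] <- lapply(scaled_base[is_target],
                                 function(x) as.numeric(scale(x)))
```

Either result has to be assigned back (for example dat <- scaled_dplyr) if the original data frame itself should change. Using startsWith() rather than a regular expression means the prefixes need no escaping.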