Search code examples
rscaledplyrdata-wrangling

Standardizing columns by groups using r


I want to standardize some of my columns in my DF by year. I think I am grouping and selecting my variables correctly (in this example the variables that end in either ABC or XYZ), but am getting tied up using the mutate function. Here is my DF:

 DF:
 VARTY   VARDCE  Year    Var1ABC   VAr2XYZ  VAR3ABC  VAR4XYZ
  38      67     2015      78        34       68       98
  42      75     2013      85        56       87       75
  56      87     2017      67        72       98       76
  84      82     2018      78        34       91       89
  etc-----

I would like to have:

DF:
   VARTY   VARDCE  Year    Var1ABC   VAr2XYZ  VAR3ABC  VAR4XYZ Var1ABCZScore  VAr2XYZZScore...  
    38      67     2015      78        34       68       98       1.2            0.7
    42      75     2013      85        56       87       75       0.4            1.3
    56      87     2017      67        72       98       76       -0.2           0.8
    84      82     2018      78        34       91       89        0.6           0.1
    etc-----

This is the code I am using: 
DF.New<-DF%>%
  group_by(Year)%>%
  select(contains(c("ABC", "XYZ")))%>%
  mutate(funs(scale))

Any help is appreciated!


Solution

  • Could not use the example that you gave: So will use the iris dataset:

    library(tidyverse)
    iris %>%
      group_by(Species) %>%
      mutate(across(contains(c("Width","Length")),scale))
    # A tibble: 150 x 5
    # Groups:   Species [3]
       Sepal.Length[,1] Sepal.Width[,1] Petal.Length[,1] Petal.Width[,1] Species
                  <dbl>           <dbl>            <dbl>           <dbl> <fct>  
     1           0.267           0.190            -0.357          -0.436 setosa 
     2          -0.301          -1.13             -0.357          -0.436 setosa 
     3          -0.868          -0.601            -0.933          -0.436 setosa 
     4          -1.15           -0.865             0.219          -0.436 setosa 
     5          -0.0170          0.454            -0.357          -0.436 setosa 
     6           1.12            1.25              1.37            1.46  setosa 
     7          -1.15           -0.0739           -0.357           0.512 setosa 
     8          -0.0170         -0.0739            0.219          -0.436 setosa 
     9          -1.72           -1.39             -0.357          -0.436 setosa 
    10          -0.301          -0.865             0.219          -1.39  setosa 
    # ... with 140 more rows
    

    If you are still using the old format, you could do:

    iris %>%
       group_by(Species) %>%
       mutate_at(vars(contains(c("Width","Length"))),scale)
    # A tibble: 150 x 5
    # Groups:   Species [3]
       Sepal.Length[,1] Sepal.Width[,1] Petal.Length[,1] Petal.Width[,1] Species
                  <dbl>           <dbl>            <dbl>           <dbl> <fct>  
     1           0.267           0.190            -0.357          -0.436 setosa 
     2          -0.301          -1.13             -0.357          -0.436 setosa 
     3          -0.868          -0.601            -0.933          -0.436 setosa 
     4          -1.15           -0.865             0.219          -0.436 setosa 
     5          -0.0170          0.454            -0.357          -0.436 setosa 
     6           1.12            1.25              1.37            1.46  setosa 
     7          -1.15           -0.0739           -0.357           0.512 setosa 
     8          -0.0170         -0.0739            0.219          -0.436 setosa 
     9          -1.72           -1.39             -0.357          -0.436 setosa 
    10          -0.301          -0.865             0.219          -1.39  setosa 
    # ... with 140 more rows