r, dplyr, psych

Rescale certain columns to a specific mean and standard deviation in R


Given a dataframe as follows, how can I rescale v5 so that the mean is 100 and the standard deviation is 15?

head(df, n=6)

Out:

v1   v2  v3      v4  v5
65   1   121.12  4   27
98   1   89.36   4   25
85   1   115.44  4   27
83   1   99.45   3   25
115  1   92.75   4   27
98   0   107.90  1   18

I tried the psych package, but the rescaled column in the resulting data frame is not correct:

library(psych)
library(tidyverse)
v5.rescaled <- df %>% rescale(df$v5, mean = 100, sd = 15)
df$v5.rescaled

Out:

t.t.scale.x.....sd...mean.
121.11985               
89.35994                
115.43986               
99.44991                
92.74993                

But head(df, n=5) shows that the rescaled v5 column is not correct:

    v1  v2     v3   v4  v5        v5.rescaled
1   65  1   121.12  4   27  <data.frame [5 × 1]>
2   98  1   89.36   4   25  <data.frame [5 × 1]>
3   85  1   115.44  4   27  <data.frame [5 × 1]>
4   83  1   99.45   3   25  <data.frame [5 × 1]>
5   115 1   92.75   4   27  <data.frame [5 × 1]>

Solution

    1. Please try to post a valid reprex next time. This will save others the trouble of having to manually reproduce your input data. Also, it is not immediately clear how your first code chunk referring to a df with columns v1 - v5 relates to the subsequent code chunk referring to df$mother.iq.
    2. The help file for psych::rescale() specifically states that the input, x, should be a matrix or data frame. I suspect this is why the output you get is not what you were expecting.
    3. While you can use psych::rescale(), a more flexible alternative may be to forgo the extra dependency on the {psych} package altogether and simply rescale the required columns manually. The two approaches are illustrated in the reprex below:
    # load libraries
    library(tidyverse)
    
    # define data as per OP
    df <- data.frame(
              v1 = c(65L, 98L, 85L, 83L, 115L, 98L),
              v2 = c(1L, 1L, 1L, 1L, 1L, 0L),
              v3 = c(121.12, 89.36, 115.44, 99.45, 92.75, 107.9),
              v4 = c(4L, 4L, 4L, 3L, 4L, 1L),
              v5 = c(27L, 25L, 27L, 25L, 27L, 18L)
    )
    
    # rescale via psych::rescale using entire data frame
    df %>% psych::rescale(mean = 100, sd = 15)
    #>          v1        v2        v3        v4        v5
    #> 1  77.38682 106.12372 119.90143 108.25723 109.31746
    #> 2 106.46091 106.12372  82.24089 108.25723 100.71673
    #> 3  95.00748 106.12372 113.16617 108.25723 109.31746
    #> 4  93.24541 106.12372  94.20546  95.87139 100.71673
    #> 5 121.43847 106.12372  86.26070 108.25723 109.31746
    #> 6 106.46091  69.38138 104.22535  71.09970  70.61416
    
    # if you only want to do this for specific columns, do it manually by targeting
    # columns using dplyr::mutate() with across(), an anonymous function, and
    # scale() (from base R); as.numeric() drops the matrix that scale() returns:
    df %>% 
      mutate(across(c(v4, v5), function(x) as.numeric(scale(x)) * 15 + 100))
    #>    v1 v2     v3        v4        v5
    #> 1  65  1 121.12 108.25723 109.31746
    #> 2  98  1  89.36 108.25723 100.71673
    #> 3  85  1 115.44 108.25723 109.31746
    #> 4  83  1  99.45  95.87139 100.71673
    #> 5 115  1  92.75 108.25723 109.31746
    #> 6  98  0 107.90  71.09970  70.61416
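
Since the question asks about v5 only, the manual approach above can also be sketched in base R with no extra packages. The helper name rescale_to and its arguments target_mean and target_sd are made up for this illustration and are not part of {psych} or {dplyr}; the sketch also assumes the column has no missing values. It standardises the column to z-scores, multiplies by the target standard deviation, and adds the target mean.

```r
# data as defined in the reprex above
df <- data.frame(
  v1 = c(65L, 98L, 85L, 83L, 115L, 98L),
  v2 = c(1L, 1L, 1L, 1L, 1L, 0L),
  v3 = c(121.12, 89.36, 115.44, 99.45, 92.75, 107.9),
  v4 = c(4L, 4L, 4L, 3L, 4L, 1L),
  v5 = c(27L, 25L, 27L, 25L, 27L, 18L)
)

# hypothetical helper (not from any package): convert x to z-scores, then
# stretch to the target sd and shift to the target mean; assumes no NAs in x
rescale_to <- function(x, target_mean = 100, target_sd = 15) {
  (x - mean(x)) / sd(x) * target_sd + target_mean
}

# store the result as a plain numeric column next to the original v5
df$v5.rescaled <- rescale_to(df$v5)

# sanity check: both should match the targets up to floating-point error
mean(df$v5.rescaled)
sd(df$v5.rescaled)
```

Because the helper returns a plain numeric vector rather than a data frame, the new column prints as ordinary numbers instead of the nested <data.frame [5 × 1]> entries shown in the question.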