Tags: r, regression, standard-deviation, dummy-variable

Linear regression with independent variable plus 1 Standard Deviation


This is probably a really simple question, but I'm not sure whether I'm doing it correctly:

I want to run a multiple linear regression that includes the effect of a 1 standard deviation (SD) change in an independent variable (Indv3).

In other words: if 'Indv3' changes by 1 SD, how does the dependent variable ('Depv') change with it?

What I did: I calculated the SD of 'Indv3' and created a dummy variable (Indv3_plusSD) coded 1 for 'Indv3' + 1 SD and 0 for the rest.

I then added the 'Indv3_plusSD' dummy to the linear regression and ran it. However, the beta coefficient I get differs from the one reported in a published paper that analysed the same data... so I'm probably doing the SD analysis wrong :)

       Depv      Indv1 Indv2   Indv3    Indv3_plusSD
1   1.1555864       48    1  77.07593       0
2   1.0596864       61    2  69.51333       0
3   0.8380413       51    1  87.38040       0
4   1.5305489       53    2  67.43750       0
5   1.0619884       55    1 165.99977       1
6   0.8474507       56    2 229.14570       1
7   0.9579580       64    2 121.89550       0
8   0.7432210       58    1 211.17690       1
9   0.8374197       60    1 139.69577       0
10  0.7378349       65    1 277.03920       1
11  0.6971632       61    1 195.72100       1
12  0.5227076       64    2 194.63220       1
13  0.9900380       52    1 138.25417       0
14  0.8954233       52    2 237.39020       1
15  0.9058147       56    1 123.42930       0
16  0.9436135       55    2 152.75953       1
17  0.7123374       55    1 190.34547       1
18  1.1928167       58    1 166.50990       1
19  1.3342048       47    2  76.35120       0
20  1.0881865       49    1 135.71740       0
21  2.9028876       48    2  61.83147       0
22  0.6661121       61    1 139.68627       0

linregr <- lm(Depv ~ Indv1 + Indv2 + Indv3_plusSD, data = df)   
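A 0/1 dummy only records which side of a cut-off each observation falls on, so its regression coefficient is a difference in mean Depv between the two groups (adjusted for any other covariates), not the change in Depv per SD of Indv3. The sketch below checks this on R's built-in `mtcars` data, used purely as a stand-in for the question's data; the cut-off `mean + 1 SD` and the name `hp_high` are hypothetical choices for illustration.

```r
# Stand-in data: R's built-in mtcars (not the question's data set)
d <- mtcars

# Hypothetical dummy: 1 if hp is above mean(hp) + 1 SD, else 0
cutoff <- mean(d$hp) + sd(d$hp)
d$hp_high <- as.integer(d$hp > cutoff)

# Dummy-only model: the coefficient equals the difference in group means
fit_dummy <- lm(mpg ~ hp_high, data = d)
dummy_coef <- unname(coef(fit_dummy)["hp_high"])
group_diff <- mean(d$mpg[d$hp_high == 1]) - mean(d$mpg[d$hp_high == 0])
print(c(dummy_coef = dummy_coef, group_diff = group_diff))

# Continuous model: the coefficient is the change in mpg per unit of hp,
# so the change per 1 SD of hp is that coefficient times sd(hp)
fit_cont <- lm(mpg ~ hp, data = d)
per_sd <- unname(coef(fit_cont)["hp"]) * sd(d$hp)
print(per_sd)
```

The two printed quantities answer different questions, which is one reason a dummy-based estimate will not match a published per-SD coefficient.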

Solution

  • Regress Depv on Indv1, Indv2 and Indv3, without your SD dummy:
    linregr <- lm(Depv ~ Indv1 + Indv2 + Indv3, data = df)

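    The regression coefficient for Indv3 is the predicted change in Depv for a one-unit increase in Indv3, holding Indv1 and Indv2 constant. A 1 SD increase in Indv3 is therefore predicted to change Depv by SD * (coefficient of Indv3). Equivalently, you can standardise Indv3 with scale() and read the per-SD effect directly from its coefficient, as model2 below shows.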

    library(tidyverse)
    df = read_table2('Depv      Indv1 Indv2   Indv3
    1.1555864       48    1  77.07593
    1.0596864       61    2  69.51333
    0.8380413       51    1  87.38040
    1.5305489       53    2  67.43750
    1.0619884       55    1 165.99977
    0.8474507       56    2 229.14570
    0.9579580       64    2 121.89550
    0.7432210       58    1 211.17690
    0.8374197       60    1 139.69577
    0.7378349       65    1 277.03920
    0.6971632       61    1 195.72100
    0.5227076       64    2 194.63220
    0.9900380       52    1 138.25417
    0.8954233       52    2 237.39020
    0.9058147       56    1 123.42930
    0.9436135       55    2 152.75953
    0.7123374       55    1 190.34547
    1.1928167       58    1 166.50990
    1.3342048       47    2  76.35120
    1.0881865       49    1 135.71740
    2.9028876       48    2  61.83147
    0.6661121       61    1 139.68627') %>% 
      mutate(Indv3_scale = scale(Indv3))  # (Indv3 - mean) / SD, i.e. Indv3 in SD units
    
    (sd3 = sd(df$Indv3))
    #> [1] 60.84117
    
    model1 =  lm(Depv ~ Indv1 + Indv2 + Indv3, data = df)   
    model2 =  lm(Depv ~ Indv1 + Indv2 + Indv3_scale, data = df)   
    
    coef(model1)['Indv3'] * sd3
    #>      Indv3 
    #> -0.1609104
    coef(model2)['Indv3_scale']
    #> Indv3_scale 
    #>  -0.1609104
    

    Created on 2020-01-14 by the reprex package (v0.3.0)
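The same equivalence can be checked in base R without the tidyverse. The sketch below again uses the built-in `mtcars` data as a stand-in (assumption: `mpg`, `wt` and `hp` play the roles of Depv, a covariate such as Indv1, and Indv3). Because standardising a predictor is a linear transformation, the confidence interval for the per-SD effect is also just the ordinary interval multiplied by the SD, which is handy when comparing with per-SD estimates and intervals reported in a paper.

```r
# Stand-in data: mtcars, with mpg ~ wt + hp playing Depv ~ Indv1 + Indv3
d <- mtcars
sd_hp <- sd(d$hp)

# Standardise hp; as.numeric() drops the matrix attributes that scale() adds
d$hp_scale <- as.numeric(scale(d$hp))

model1 <- lm(mpg ~ wt + hp, data = d)        # hp in original units
model2 <- lm(mpg ~ wt + hp_scale, data = d)  # hp in SD units

# Effect of a 1 SD increase in hp, holding wt constant, computed two ways
effect_sd_a <- unname(coef(model1)["hp"]) * sd_hp
effect_sd_b <- unname(coef(model2)["hp_scale"])
print(c(effect_sd_a, effect_sd_b))

# 95% confidence interval for the per-SD effect, also computed two ways
ci_a <- confint(model1)["hp", ] * sd_hp
ci_b <- confint(model2)["hp_scale", ]
print(rbind(ci_a, ci_b))
```

Both rows of each printout should agree up to floating-point error; centring hp only shifts the intercept, and dividing by the SD scales the slope and its standard error by the same factor.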