How to use scale in logistic regression correctly


I tried to use scale() in a logistic regression model, but I don't see any change in the results compared with the original model. Is this my mistake? Here is some example code:

 dat <- read.table(text = " cats birds    wolfs     snakes
           0        3        9         7
           1        3        8         4
           1        1        2         8
           0        1        2         3
           0        1        8         3
           1        6        1         2
           0        6        7         1
           1        6        1         5
           0        5        9         7
           1        3        8         7
           1        4        2         7
           0        1        2         3
           0        7        6         3
           1        6        1         1
           0        6        3         9
           1        6        1         1   ",header = TRUE)

Original regression:

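 # Fit the logistic regression on the raw predictors
 dat_glm <- glm(cats ~ birds + wolfs + snakes, data = dat, family = binomial(link = "logit"))
 # Classify as 1 when the predicted probability exceeds 0.5
 dat$dat_glm_pred_response <- ifelse(predict(dat_glm, newdata = dat, type = "response") > 0.5, 1, 0)
 # Confusion table, then column proportions and row proportions
 m <- xtabs(~ cats + dat_glm_pred_response, data = dat); m; prop.table(m, 2); prop.table(m, 1)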

Original regression output:

   dat_glm_pred_response
cats 0 1
   0 5 3
   1 2 6
    dat_glm_pred_response
cats         0         1
   0 0.7142857 0.3333333
   1 0.2857143 0.6666667
    dat_glm_pred_response
cats     0     1
   0 0.625 0.375
   1 0.250 0.750

I used the scale function to see whether it would improve accuracy:

dat_glm_scale <- glm(cats ~ scale(birds) + scale(wolfs) + scale(snakes), data = dat, family = binomial(link = "logit"))

However, I got the same results:

 dat$dat_glm_pred_response1 <- ifelse(predict(dat_glm_scale, newdata = dat, type = "response") > 0.5, 1, 0)
 m <- xtabs(~ cats + dat_glm_pred_response1, data = dat); m; prop.table(m, 2); prop.table(m, 1)

Scaled data results:

   dat_glm_pred_response1
cats 0 1
   0 5 3
   1 2 6
    dat_glm_pred_response1
cats         0         1
   0 0.7142857 0.3333333
   1 0.2857143 0.6666667
    dat_glm_pred_response1
cats     0     1
   0 0.625 0.375
   1 0.250 0.750

Why are the two results the same? Any ideas?


Solution

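  • Scaling/centering in this manner changes the coefficients and standard errors of your model, which is indeed the case in your example. However, scale() only applies a linear transformation to each predictor (it subtracts the mean and divides by the standard deviation), so the model is merely reparameterized. As long as you don't have any interaction terms in your model, you would not expect changes in the predictions.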

    You can see this when you compare the full summary output of the models:

     summary(dat_glm)
     summary(dat_glm_scale)
    

    In answer to your main question: There is nothing wrong with your code and the scaling, but you should not expect to see changes in the predictions.

    Edit: The following questions on Stack Exchange give more details on the subject: https://stats.stackexchange.com/questions/65898/why-could-centering-independent-variables-change-the-main-effects-with-moderatio

    https://stats.stackexchange.com/questions/29781/when-should-you-center-your-data-when-should-you-standardize
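
The point made in the answer can also be checked numerically. The following is a minimal sketch, not part of the original answer: it rebuilds the question's data as a data frame, refits both models, and compares them. The names fit_raw, fit_scaled, same_pred, same_slopes and same_deviance are introduced here purely for illustration.

```r
# Data from the question, entered column by column
dat <- data.frame(
  cats   = c(0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1),
  birds  = c(3, 3, 1, 1, 1, 6, 6, 6, 5, 3, 4, 1, 7, 6, 6, 6),
  wolfs  = c(9, 8, 2, 2, 8, 1, 7, 1, 9, 8, 2, 2, 6, 1, 3, 1),
  snakes = c(7, 4, 8, 3, 3, 2, 1, 5, 7, 7, 7, 3, 3, 1, 9, 1)
)

# Same model twice: raw predictors vs. standardized predictors
fit_raw    <- glm(cats ~ birds + wolfs + snakes,
                  data = dat, family = binomial(link = "logit"))
fit_scaled <- glm(cats ~ scale(birds) + scale(wolfs) + scale(snakes),
                  data = dat, family = binomial(link = "logit"))

# Fitted probabilities agree up to numerical tolerance
p_raw    <- predict(fit_raw, type = "response")
p_scaled <- predict(fit_scaled, type = "response")
same_pred <- isTRUE(all.equal(p_raw, p_scaled, tolerance = 1e-5))

# The slopes differ, but only by the factor sd(x):
# slope on scale(x) = slope on x * sd(x)
sds <- sapply(dat[c("birds", "wolfs", "snakes")], sd)
same_slopes <- isTRUE(all.equal(unname(coef(fit_scaled)[-1]),
                                unname(coef(fit_raw)[-1] * sds),
                                tolerance = 1e-5))

# The overall fit (residual deviance) is unchanged as well
same_deviance <- isTRUE(all.equal(deviance(fit_raw), deviance(fit_scaled),
                                  tolerance = 1e-5))

print(c(same_pred = same_pred, same_slopes = same_slopes,
        same_deviance = same_deviance))
```

If all three checks come back TRUE, the two fits are the same model written in different units, which is why the confusion tables in the question match exactly.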