
Individual terms in prediction of linear regression


I ran a regression analysis in R on a dataset, and I am trying to compute the contribution of each individual independent variable to the dependent variable for each row in the dataset.

So something like this:

set.seed(123)                                              
y <- rnorm(10)                                           
m <- data.frame(v1=rnorm(10), v2=rnorm(10), v3=rnorm(10))
regr <- lm(formula=y~v1+v2+v3, data=m)  
summary(regr)
terms <- predict.lm(regr,m, type="terms")

In short: run a regression and use the predict function to calculate the terms for v1, v2 and v3 in dataset m. But I am having a hard time understanding what the predict function is calculating. I would expect it to multiply each regression coefficient by the corresponding variable's data. So something like this for v1:

coefficients(regr)[2]*m$v1

But that gives different results compared to the predict function.

Own calculation:

0.55293884  0.16253411  0.18103537  0.04999729 -0.25108302  0.80717945  0.22488764 -0.88835486  0.31681455 -0.21356803

And predict function calculation:

0.45870070  0.06829597  0.08679724 -0.04424084 -0.34532115  0.71294132  0.13064950 -0.98259299  0.22257641 -0.30780616

The prediction function is off by 0.1 or so. Also, if you add all the terms from the prediction function together with the constant, it doesn't add up to the total prediction (using type="response"). What does the prediction function calculate here, and how can I tell it to calculate what I did with coefficients(regr)[2]*m$v1?


Solution

  • All the following lines result in the same predictions:

    # our computed predictions
    coefficients(regr)[1] + coefficients(regr)[2]*m$v1 +
      coefficients(regr)[3]*m$v2 + coefficients(regr)[4]*m$v3
    
    # prediction using predict function
    predict.lm(regr,m)
    
    # prediction using terms matrix, note that we have to add the constant.
    terms_predict = predict.lm(regr,m, type="terms")
    terms_predict[,1]+terms_predict[,2]+terms_predict[,3]+attr(terms_predict,'constant')
    

    You can read more about using type="terms" here.

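    Your own calculation (coefficients(regr)[2]*m$v1) and the predict function calculation (terms_predict[,1]) differ because the columns in the terms matrix are centered around their mean, so each column has a mean of zero. The amounts subtracted by this centering are absorbed into the 'constant' attribute, which is why that attribute, rather than the intercept alone, has to be added to recover the predictions: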

    # this is equal to terms_predict[,1]
    coefficients(regr)[2]*m$v1-mean(coefficients(regr)[2]*m$v1)
    
    # indeed, all columns are centered; i.e. have a mean of 0.
    round(sapply(as.data.frame(terms_predict),mean),10)
    

    Hope this helps.
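
To get the uncentered contributions asked for in the question, as far as I know predict has no option that switches the centering off, so the simplest route is to compute them directly. The following is a minimal sketch that rebuilds the example from the question; the names b, X, raw_terms, centered_terms and shift are introduced here for illustration only. Each check stops with an error if the two sides differ.

```r
# Rebuild the example from the question
set.seed(123)
y <- rnorm(10)
m <- data.frame(v1 = rnorm(10), v2 = rnorm(10), v3 = rnorm(10))
regr <- lm(y ~ v1 + v2 + v3, data = m)

b <- coef(regr)                           # (Intercept), v1, v2, v3
X <- as.matrix(m[, c("v1", "v2", "v3")])  # predictor values, one column each

# Uncentered contributions: every column multiplied by its own coefficient
raw_terms <- sweep(X, 2, b[-1], FUN = "*")

# Centered contributions, as returned by type = "terms"
centered_terms <- predict(regr, m, type = "terms")

# Amount subtracted from each column: coefficient times the predictor's mean
shift <- b[-1] * colMeans(X)

# Raw terms minus the shift reproduce the centered terms
stopifnot(isTRUE(all.equal(sweep(raw_terms, 2, shift, FUN = "-"),
                           centered_terms, check.attributes = FALSE)))

# The 'constant' attribute is the intercept plus all the shifts
stopifnot(isTRUE(all.equal(attr(centered_terms, "constant"),
                           unname(b[1] + sum(shift)))))

# Raw terms plus the plain intercept give the usual predictions
stopifnot(isTRUE(all.equal(unname(rowSums(raw_terms) + b[1]),
                           unname(predict(regr, m)))))
```

Here raw_terms[, "v1"] holds the same numbers as coefficients(regr)[2]*m$v1, and the two decompositions differ only in where the column means end up: in the terms themselves or in the constant.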