Search code examples
rregressionlinear-regressionlatticeabline

How to calculate confidence bands for fitted values of a linear regression by hand and plot them?


I'm trying to plot error bands around a linear regression. I'm working with the builtin trees dataset in R. Here's my code. No lines are showing up on the plot. Please help!

xdev <- log(Volume) - mean(log(Volume))
xdev
ydev <- Girth - mean(Girth)
ydev
b1 <- sum((xdev)*(ydev))/sum(xdev^2)
b0 <- mean(Girth) - mean(log(Volume))*b1
plot(log(Volume) ~ Girth)
abline(coef(lm(log(Volume) ~ Girth)), col=2)
(paste(b0, b1))
y.hat <- b0 + b1*log(Volume)
ss.explained <- sum((y.hat - mean(Girth))^2)
ss.unexplained <- sum((y.hat - Girth)^2)
stderr.b1 <- sqrt((ss.unexplained/29)/sum(xdev^2))
stderr.b1

std.yhat <- sqrt((ss.unexplained/29)*((1/31) + (xdev^2/sum(xdev^2))))
std.yhat
upp <-  y.hat + qt(0.95, 29)*std.yhat
low <-  y.hat - qt(0.95, 29)*std.yhat
upp
low
library(lattice)
plot(log(Volume) ~ Girth, data = trees)
abline(c(b0, b1), lty=1)
points(upp ~ Girth, data=trees, type="l", lty=2)
points(low ~ Girth, data=trees, type="l", lty=2)  

the graph resulting from this code


Solution

  • Obviously you want to calculate OLS by hand as well as predictions with confidence bands. Since I'm not familiar with the approach you are using (but you definitely mixed up Y and X in the beginning of your code), I show you how to get the confidence bands with a similar manual approach I'm more familiar with.

    data(trees)  ## load trees dataset
    
    ## fit
    X <- cbind(1, trees$Girth)
    y <- log(trees$Volume)
    beta <- solve(t(X) %*% X) %*% t(X) %*% y
    y.hat <- as.vector(X %*% beta)
    
    ## std. err. y.hat
    res <- y - y.hat
    n <- nrow(trees); k <- ncol(X)
    VCV <- 1/(n - k)*as.vector(t(res) %*% res)*solve(t(X) %*% X)
    se.yhat <- sqrt(diag(X %*% VCV %*% t(X)))
    
    ## CIs
    tcrit <- qt(0.975, n - k)
    upp <- y.hat + tcrit*se.yhat
    low <- y.hat - tcrit*se.yhat
    
    ## plot
    with(trees, plot(x=Girth, y=log(Volume)))
    abline(a=beta, col=2)
    lines(x=trees$Girth, y=upp, lty=2)
    lines(x=trees$Girth, y=low, lty=2)
    

    enter image description here

    We can compare this with the results of respective R functions, which will give us the same values and, thus, the same plot as above.

    fit <- lm(log(Volume) ~ Girth, trees)
    pred <- stats::predict.lm(fit, se.fit=TRUE, df=fit$df.residual)
    
    with(trees, plot(x=Girth, y=log(Volume)))
    abline(fit, col=2)
    lines(x=trees$Girth, y=pred$fit + tcrit*pred$se.fit, lty=2)
    lines(x=trees$Girth, y=pred$fit - tcrit*pred$se.fit, lty=2)
    

    Since you have noted lattice, you can further crosscheck using lattice::xyplot:

    lattice::xyplot(fit$fitted.values ~ log(trees$Volume), 
                    panel=mosaic::panel.lmbands)
    

    enter image description here

    Note: It seems you are using attach, which is discouraged in the community, see: Why is it not advisable to use attach() in R, and what should I use instead?. I therefore used with() and `$`() instead.