Tags: r, prediction, glm, confidence-interval

GLM prediction in R


I split the data set into train and test as follows:

# length.out = nrow(sb) keeps the grouping vector the same length as the data, even for an odd row count
splitdata <- split(sb, sample(rep(1:2, length.out = nrow(sb))))
test <- splitdata[[1]]
train <- splitdata[[2]]

sb is the name of the original data set, so the split is 50/50 between train and test.

Then I fitted a glm using the training set.

fitglm<-  glm(num_claims~year+vt+va+public+pri_bil+persist+penalty_pts+num_veh+num_drivers+married+gender+driver_age+credit+col_ded+car_den, family=poisson, train)

Now I want to predict with this GLM for, say, the next 10 observations.

I am having trouble specifying newdata in predict().

I tried:

pred<-predict(fitglm,newdata=data.frame(train),type="response", se.fit=T)

This returns as many predictions as there are rows in the training set.

Finally, how can I plot these predictions with confidence intervals?

Thank you for the help


Solution

  • If you are asking how to construct predictions for the next 10 observations, taken from the test set, then:

    pred10 <- predict(fitglm, newdata=data.frame(test)[1:10, ], type="response", se.fit=TRUE)
    

    Edit 9 years later:

    @carsten's comment is correct regarding how to construct a confidence interval. If a glm object, fitglm, has a non-linear link function, then the following is a reasonably general method: compute the interval on the link scale, then apply the inverse of the link function to get a two-sided 95% CI on the response scale:

    pred.fit <- predict(fitglm, newdata=newdata, se.fit=TRUE)  # default type="link"
    CI.pred.upper <- family(fitglm)$linkinv(  # that information is in the model
                            pred.fit$fit + 1.96*pred.fit$se.fit )

    CI.pred.lower <- family(fitglm)$linkinv(  # that information is in the model
                            pred.fit$fit - 1.96*pred.fit$se.fit )
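
The question also asks how to plot the predictions with their intervals, which the answer does not cover. The following is a minimal sketch of one way to do it in base R. It does not use the asker's data: `sb` is replaced by simulated data with a single predictor, and the names `idx`, `new10`, `p`, and `ci` are illustrative assumptions. The intervals are built on the link scale and back-transformed, as in the edit above.

```r
# Simulated stand-in for the question's `sb` (assumption: one predictor only).
set.seed(1)
n  <- 200
sb <- data.frame(driver_age = runif(n, 18, 80))
sb$num_claims <- rpois(n, lambda = exp(0.5 - 0.02 * sb$driver_age))

# 50/50 split into train and test.
idx   <- sample(rep(1:2, length.out = n))
train <- sb[idx == 2, ]
test  <- sb[idx == 1, ]

fit <- glm(num_claims ~ driver_age, family = poisson, data = train)

# Predict on the link scale for the first 10 test rows, then back-transform.
new10   <- test[1:10, ]
p       <- predict(fit, newdata = new10, type = "link", se.fit = TRUE)
linkinv <- family(fit)$linkinv
ci <- data.frame(
  fit   = linkinv(p$fit),
  lower = linkinv(p$fit - 1.96 * p$se.fit),
  upper = linkinv(p$fit + 1.96 * p$se.fit)
)

# Plot point predictions with vertical bars for the 95% intervals.
x <- seq_len(nrow(ci))
plot(x, ci$fit, ylim = range(ci$lower, ci$upper), pch = 19,
     xlab = "Test observation", ylab = "Predicted number of claims")
segments(x, ci$lower, x, ci$upper)
```

Note that these are confidence intervals for the expected number of claims, not prediction intervals for individual counts, so observed values in the test set can fall well outside the bars.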