I split the data set into training and test sets as follows:
splitdata<-split(sb[1:nrow(sb),], sample(rep(1:2, as.integer(nrow(sb)/2))))
test<-splitdata[[1]]
train<-rbind(splitdata[[2]])
sb is the name of the original data set, so this gives a 50/50 split between train and test.
Then I fitted a glm using the training set.
fitglm<- glm(num_claims~year+vt+va+public+pri_bil+persist+penalty_pts+num_veh+num_drivers+married+gender+driver_age+credit+col_ded+car_den, family=poisson, train)
Now I want to predict with this glm, say for the next 10 observations.
I am having trouble specifying newdata in predict().
I tried:
pred<-predict(fitglm,newdata=data.frame(train),type="response", se.fit=T)
This returns as many predictions as there are rows in the training set, not 10.
Finally, how can I plot these predictions with confidence intervals?
Thank you for the help
If you are asking how to construct predictions for the next 10 observations in the test set, then:
pred10<-predict(fitglm,newdata=data.frame(test)[1:10, ], type="response", se.fit=T)
Edit 9 years later:
@carsten's comment is correct regarding how to construct a confidence interval. If a glm object such as fitglm has a non-linear link function,
then a reasonably general method is to predict on the link scale (the default for predict), add and subtract 1.96 standard errors there, and apply the inverse link function recovered from the model to get a two-sided 95% CI on the response scale:
pred.fit <- predict(fitglm, newdata=newdata, se.fit=TRUE)
CI.pred.upper <- family(fitglm)$linkinv( # that information is in the model
pred.fit$fit + 1.96*pred.fit$se.fit )
CI.pred.lower <- family(fitglm)$linkinv( # that information is in the model
pred.fit$fit - 1.96*pred.fit$se.fit )
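The question also asked how to plot the predictions with their intervals. Here is a minimal sketch in base R graphics. It uses simulated stand-in data, because sb is not available here: the variables x and y and the one-predictor model are assumptions made only for illustration. The intervals are Wald intervals for the predicted mean count, not prediction intervals for individual observations.

```r
# Simulated stand-in data; `x` and `y` are hypothetical, not columns of `sb`
set.seed(1)
sim <- data.frame(x = runif(200, 0, 2))
sim$y <- rpois(200, lambda = exp(0.3 + 0.8 * sim$x))

# 50/50 split into training and test sets
idx   <- sample(nrow(sim), nrow(sim) / 2)
train <- sim[idx, ]
test  <- sim[-idx, ]

fit <- glm(y ~ x, family = poisson, data = train)

# Predict the first 10 test rows on the link (log) scale
new10 <- test[1:10, ]
p     <- predict(fit, newdata = new10, type = "link", se.fit = TRUE)

# Back-transform the fit and the Wald limits to the response scale
inv   <- family(fit)$linkinv
est   <- inv(p$fit)
lower <- inv(p$fit - 1.96 * p$se.fit)
upper <- inv(p$fit + 1.96 * p$se.fit)

# Point predictions with vertical interval bars
pos <- seq_along(est)
plot(pos, est, pch = 19, ylim = range(lower, upper),
     xlab = "Test observation", ylab = "Predicted mean count")
arrows(pos, lower, pos, upper, angle = 90, code = 3, length = 0.05)
```

With the real data, the same steps should apply with fitglm in place of fit and data.frame(test)[1:10, ] in place of new10. Because the limits are computed on the link scale and then back-transformed, they stay positive, and the bars are generally asymmetric around the point prediction.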