Why does predict.glm() not create predicted values in the expected manner?


I am trying to get my head around what the predict.glm() function does for a project at work which uses it.

To do this, I first looked at the example code in the documentation for ?predict.glm. This gave me the sense that it can take a glm and predict response values for a given input vector. However, I found it very difficult to customise that "budworm" example, so I created an exceptionally simple model of my own to try to see how it works. Spoiler: I'm still failing to get it to work.

a<-c(1,2,3,4,5)
b<-c(2,3,4,5,6)
result<-glm(b~a,family=gaussian)
summary(result)
plot(c(0,10), c(0,10), type = "n", xlab = "dose",
     ylab = "response")
xvals<-seq(0,10,0.1)
data.frame(xinputs=xvals)
predict.glm(object=result,newdata= data.frame(xinputs=xvals),type='terms')
#lines(xvals, predict.glm(object=result,newdata = xvals, type="response" ))

When I run predict.glm(object=result,newdata= data.frame(xinputs=xvals),type='terms') I get the warning message:

Warning message:
'newdata' had 101 rows but variables found have 5 rows

From what I understand, it shouldn't matter that the input GLM only used 5 rows: it should use the statistics of that GLM to predict response values for each of the 101 entries of the new data?


Solution

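  • The column names in the newdata data frame must match the names of the variables used to fit the model. Your model's predictor is a, but your newdata only has a column called xinputs, so predict.glm() cannot find a there and falls back to the original 5-element a vector in your workspace. That is why the warning says the variables found have 5 rows. Thus,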

    predict.glm(object=result,newdata= data.frame(a=xvals),type='terms')
    

    will resolve your issue.

    a <- c(1, 2, 3, 4, 5)
    b <- c(2, 3, 4, 5, 6)
    result <- glm(b ~ a, family = gaussian)
    summary(result)
    #> 
    #> Call:
    #> glm(formula = b ~ a, family = gaussian)
    #> 
    #> Deviance Residuals: 
    #>          1           2           3           4           5  
    #> -1.776e-15  -8.882e-16  -8.882e-16   0.000e+00   0.000e+00  
    #> 
    #> Coefficients:
    #>              Estimate Std. Error   t value Pr(>|t|)    
    #> (Intercept) 1.000e+00  1.317e-15 7.591e+14   <2e-16 ***
    #> a           1.000e+00  3.972e-16 2.518e+15   <2e-16 ***
    #> ---
    #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    #> 
    #> (Dispersion parameter for gaussian family taken to be 1.577722e-30)
    #> 
    #>     Null deviance: 1.0000e+01  on 4  degrees of freedom
    #> Residual deviance: 4.7332e-30  on 3  degrees of freedom
    #> AIC: -325.47
    #> 
    #> Number of Fisher Scoring iterations: 1
    plot(c(0, 10),
         c(0, 10),
         type = "n",
         xlab = "dose",
         ylab = "response")
    

    xvals <- seq(0, 10, 0.1)
    head(data.frame(xinputs = xvals))
    #>   xinputs
    #> 1     0.0
    #> 2     0.1
    #> 3     0.2
    #> 4     0.3
    #> 5     0.4
    #> 6     0.5
    head(predict.glm(object = result,
                newdata = data.frame(a = xvals),
                type = 'terms'))
    #>      a
    #> 1 -3.0
    #> 2 -2.9
    #> 3 -2.8
    #> 4 -2.7
    #> 5 -2.6
    #> 6 -2.5
    

    Created on 2020-09-15 by the reprex package (v0.3.0)
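
The values printed above with type = 'terms' are each term's centered contribution (here a minus mean(a), times the slope), not predicted values of b. If the goal is the fitted line from the commented-out lines() call in the question, type = "response" is probably what is wanted. Below is a minimal sketch under that assumption; the names train, fit, new_points, pred_response, pred_terms and rebuilt are made up for illustration and are not part of the original code.

```r
# Keep the variables in a data frame so that predict() does not silently
# fall back to same-named vectors in the global environment.
train <- data.frame(a = c(1, 2, 3, 4, 5), b = c(2, 3, 4, 5, 6))
fit <- glm(b ~ a, family = gaussian, data = train)

xvals <- seq(0, 10, 0.1)
new_points <- data.frame(a = xvals)  # column name matches the predictor `a`

# type = "response" returns predictions on the scale of b
pred_response <- predict(fit, newdata = new_points, type = "response")

# type = "terms" returns each term's centered contribution, without the
# intercept; the overall constant is stored in the "constant" attribute
pred_terms <- predict(fit, newdata = new_points, type = "terms")

# Adding the constant back recovers the response-scale predictions.
# This holds here because the gaussian family uses the identity link.
rebuilt <- rowSums(pred_terms) + attr(pred_terms, "constant")

length(pred_response)  # one prediction per row of new_points
all.equal(unname(pred_response), unname(rebuilt))

# Draw the fitted line on the empty plot from the question
plot(c(0, 10), c(0, 10), type = "n", xlab = "dose", ylab = "response")
lines(xvals, pred_response)
```

Fitting with data = train is a design choice rather than a requirement: with it, a misnamed column in newdata tends to fail with an "object not found" error instead of quietly returning predictions for the original 5 rows.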