
R: Incorrect prediction using survey and splines when newdata has only one value of the independent variable


When I try to get a prediction for just one value of the independent variable using this syntax:

library(survey)
library(splines)
data(api)
dclus <- svydesign(id=~dnum,data=apiclus2)
log<-svyglm(api99 ~  bs(ell,degree=1, knots =c(14,23)) , dclus)
data <- data.frame(ell = 0)
data <- cbind(data, predict(log, newdata=data))
data <- data.frame(ell = 15)
data <- cbind(data, predict(log, newdata=data))

I always get the same prediction:

#link=591.0929

This does not happen if I use only survey or only splines, or if I create a data frame with a range of values for the independent variable:

data<-data.frame(ell = rep(seq(from = 0, to = 66)))
data <- cbind(data, predict(log, newdata=data))

Curiously enough, in this last data frame link=591.0929 corresponds to ell=23.


Solution

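  • The issue here is that the bs() term doesn't fully specify the basis: it also uses the range of the predictor to work out the boundary knots. When that range is recomputed from a newdata with only one point, it collapses to a single value, so the boundary knots no longer match the ones used when fitting and the prediction is wrong.

    A workaround is to specify the boundary knots explicitly, e.g.,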

    > log<-svyglm(api99 ~  bs(ell,degree=1, knots =c(14,23), Boundary.knots=c(0,100)) , dclus)
    > data <- data.frame(ell = 0)
    >  predict(log, newdata=data)
        link     SE
    1 787.64 27.162
    > data2 <- data.frame(ell = 15)
    >  predict(log, newdata=data2)
        link     SE
    1 627.76 34.108
    

    It looks as though predict.lm has some extra machinery to stop this happening that wasn't there when predict.svyglm was written.

    I'll pass this on to the package maintainer ;-)
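
To make the mechanism in the answer concrete, here is a minimal sketch using only the splines package and lm() (no survey design). It assumes a stand-in predictor x covering 0 to 66 and a made-up response y; both are hypothetical and only there for illustration. The first part shows that bs() takes its boundary knots from whatever data it is handed, so the same x value can get a different basis row. The second part shows what I believe is the "machinery" in predict.lm: the boundary knots are recorded in the "predvars" attribute of the model terms, so a one-row newdata is evaluated on the original basis.

```r
library(splines)

# bs() defaults Boundary.knots to range(x), so the basis depends on the data passed in.
x_fit <- 0:66   # stand-in for the range of ell seen when fitting
x_new <- 5:30   # a narrower, hypothetical "newdata" range

b_fit <- bs(x_fit, degree = 1, knots = c(14, 23))
b_new <- bs(x_new, degree = 1, knots = c(14, 23))

attr(b_fit, "Boundary.knots")   # range of x_fit
attr(b_new, "Boundary.knots")   # range of x_new

# Same x value (5), but different basis rows because the boundary knots differ.
row_fit <- as.numeric(b_fit[x_fit == 5, ])
row_new <- as.numeric(b_new[x_new == 5, ])

# predict() on the stored basis reuses the original knots, so it agrees with row_fit.
row_safe <- as.numeric(predict(b_fit, newx = 5))

# lm() records the boundary knots in the "predvars" attribute of its terms,
# so a single-row newdata is evaluated on the original basis.
toy <- data.frame(x = x_fit, y = 800 - 3 * x_fit + 10 * sin(x_fit))  # made-up response
fit <- lm(y ~ bs(x, degree = 1, knots = c(14, 23)), data = toy)
predvars_txt <- paste(deparse(attr(terms(fit), "predvars")), collapse = " ")
pred_one <- unname(predict(fit, newdata = data.frame(x = 5)))
```

Printing predvars_txt should show the bs() call with Boundary.knots filled in, which is the same information the Boundary.knots= workaround supplies by hand.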