r regression mixed-models confidence-interval nlme

Confidence intervals on predictions using predictor variables for a non-linear mixed effects model (nlme)

I am trying to produce 95% confidence intervals for a non-linear mixed effects model using the bootstrap method described by @BenBolker here. I may be misunderstanding some of the functions used. My goal is to simulate the 95% confidence intervals for each level of the predictor (in this example case, year).

Below is reproducible code using the FlexParamCurve package's dataset penguin.data. To be clear, the code provided is a modified version of Dr. Bolker's code he provided in his answer to the linked question above.

library(FlexParamCurve) #also loads package 'nlme' which is needed.
library(ggplot2)

set.seed(1234)

##creating model
fm2 <- nlme(weight ~ SSlogis(ckage, Asym, R0, lrc),
            data = penguin.data,
            fixed= list(Asym ~ year,
                        R0 ~ year,
                        lrc ~ year),
            random = Asym ~ 1,
            start = c(Asym = 1000, 0,
                      R0 = 21, 0,
                      lrc = 1, 0),
            control = list(maxIter = 100),
            na.action = na.pass)

#created simulated x ('ckage') values to use in prediction below
xvals.peng <- with(penguin.data,seq(min(ckage),max(ckage),length.out=100))

nresamp <- 100

## utility function
get_CI <- function(y,pref = "") {
  r1 <- t(apply(y , 1 , quantile , c(0.025 , 0.975)))
  setNames(as.data.frame(r1) , paste0(pref , c("lwr" , "upr"))) #function to get CI 
}

##creating the data frame to use for predictions
pengframe <- with(penguin.data, data.frame(ckage = xvals.peng))

##Tried to use for weight predictions and it did not work
pengframe$weight <- predict(fm2,newdata=pengframe,level=0)


Error in eval(predvars, data, env) : object 'year' not found

This did not work because the fixed effect 'year' that is in the model was missing from pengframe and thus I was not able to use the predict() function. This makes total sense, so I tried a workaround using the rbind() function:

###this is where I created the column (year) and its values separately before rbinding the data frames. There are only two levels in year.
pengframe1 <- with(penguin.data,data.frame(ckage=xvals.peng))
pengframe1$year <- as.factor('2000')
pengframe2 <- with(penguin.data,data.frame(ckage=xvals.peng))
pengframe2$year <- as.factor('2002')

pengframe <- rbind(pengframe1, pengframe2)

#predicting weight for each year now works
pengframe$weight <- predict(fm2,newdata=pengframe,level=0)

head(pengframe)


sampfun <- function(fitted,data,idvar="bandid") {
  pp <- predict(fitted,levels=1)
  rr <- residuals(fitted)
  dd <- data.frame(data,pred=pp,res=rr)
  ## sample groups with replacement
  iv <- levels(data[[idvar]])
  bsamp1 <- sample(iv,size=length(iv),replace=TRUE)
  bsamp2 <- lapply(bsamp1,
                   function(x) {
                     ## within groups, sample *residuals* with replacement
                     ddb <- dd[dd[[idvar]]==x,]
                     ## bootstrapped response = pred + bootstrapped residual
                     ddb$height <- ddb$pred +
                       sample(ddb$res,size=nrow(ddb),replace=TRUE)
                     return(ddb)
                   })
  res <- do.call(rbind,bsamp2)  ## collect results
  if (is(data,"groupedData"))
    res <- groupedData(res,formula=formula(data))
  return(res)
}

pfun <- function(fm) {
  predict(fm,newdata=pengframe,level=0)
}

yvals2 <- replicate(nresamp, 
                    pfun(update(fm2, 
                                data = sampfun(fm2, 
                                               penguin.data, 
                                               "bandid"))))

peng2 <- get_CI(yvals2,"boot_")
head(peng2)
pengframe <- data.frame(pengframe,peng2)
head(pengframe)

ggplot(pengframe, aes(ckage, weight, color = year)) + 
  geom_smooth() + #this is for simplicity purposes only. I use geom_func() in my real dataset
  geom_ribbon(pengframe, mapping = aes(x = ckage, ymin = boot_lwr, ymax =boot_upr, group=year, fill = year), alpha = 0.3)

This method provided me with 95% confidence intervals for each year by using the same estimated 'ckage' as the other and is my expected result.

I wanted to confirm if this way is statistically sound? I suspect this method is okay, but I'm only beginning to get the hang of non-linear mixed models. I also wanted to ask if there is a more direct approach [i.e. adding it to the with() function where the 'ckage' is initially simulated in when the xvals.peng is first created to simplify the process]. I will be using sex instead of years, and I will have nested random factors (1 | group/id) which is probably a different question altogether.

Solution

Looks basically fine to me. But you can have your newdata cheaper using expand.grid(). Also, you can keep your workspace cleaner: actually you only need one new data.frame newdata.peng that you can expand.

> ## create newdata
> newdata.peng <- with(penguin.data, 
+                      expand.grid(
+                        ckage=seq(min(ckage), max(ckage), length.out=100), 
+                        year=unique(year)
+                      ))
> 
> 
> ## add predicted weight to newdata
> newdata.peng$weight <- predict(fm2, newdata=newdata.peng, level=0)
> 
> 
> ## bootstrap
> nresamp <- 100
> set.seed(1234)
> yvals2 <- 
+   replicate(nresamp, {
+     pfun(update(fm2, data=sampfun(fm2, penguin.data, "bandid")))
+   })

Perhaps you need an order of magnitude more replications. Also it runs forever, try parallel::parSapply or better parallel::mclapply if you're on Linux.

> ## add CI weight to newdata
> newdata.peng <- cbind(newdata.peng, get_CI(yvals2, "boot_"))

Not sure why you want to smooth something or similar when plotting. I think you have clear data that no longer needs to be post-processed. Here a way w/o ggplot.

> ## plot
> ci_cols <- c('boot_lwr', 'weight', 'boot_upr')
> years <- sort(as.integer(as.character(unique(newdata.peng$year))))
> 
> par(mar=c(4, 4, 2, 2) + .1)
> plot.new(); plot.window(xlim=range(newdata.peng$ckage), 
+                         ylim=range(newdata.peng[ci_cols]))
> for (i in 1:2) {
+   axis(i, axTicks(i))
+   mtext(c('chick age (days)', 'chick mass (g)')[i], i, 3) 
+   }
> for (i in (seq_along(years))) {
+   matlines(
+     subset(newdata.peng, year == years[i], select=ci_cols), 
+     col=i + 1L, lty=2:1)
+ }
> box()
> legend('bottomright', legend=c(years, '95% CI'), col=c(2:3, 8), lty=c(1, 1, 2))

Data:

> data('penguin.data', package='FlexParamCurve')
> library(nlme)
> fm2 <- nlme(weight ~ SSlogis(ckage, Asym, R0, lrc), data=penguin.data, 
+             fixed=list(Asym ~ year, R0 ~ year, lrc ~ year), 
+             random=Asym ~ 1, start=c(Asym=1000, 0, R0=21, 0, lrc=1, 0), 
+             control=list(maxIter=100), na.action=na.pass)

Functions taken from @BenBolker's post.