predict.lme is unable to interpret a formula defined from a variable


I have been stymied by an error that traces back to predict.lme, running inside a function, failing to interpret a formula based on a variable that has been passed from outside the function. I know the issue has to do with variable scope and different environments, but I've been unable to fully understand it or find a workaround. Your help would be much appreciated.

Here's a reproducible example:

# This will be the nested function.
train_test_perf <- function(train_data, test_data, model, termLabels) {
  fixForm <- reformulate(termlabels=termLabels, response="Y")
  fit <- nlme::lme(fixForm, data=train_data, random=~ 1|ID)
  train_pred <- predict(fit, newdata=train_data, level=0, na.action=na.exclude)
  rtrain <- cor.test(train_data$Y, train_pred)
  test_pred <- predict(fit, newdata=test_data, level=0, na.action=na.exclude)
  rtest <- cor.test(test_data$Y, test_pred)
  tmp <- data.frame(Model=model, 
                    R_train=rtrain$estimate, 
                    R_test=rtest$estimate)
  return(tmp)
}

# And here is the function that calls it.
myfunc <- function(df, newdf, varList) {
  for (v in varList) {
    perf <- train_test_perf(train_data=df, test_data=newdf, model=v, termLabels=v)
    print(perf)
  }
}

# The outer function call.
myfunc(df=dat, newdf=newdat, varList=list("W", "X"))
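
The snippet above refers to dat and newdat, which are not defined in the post. The following is a minimal sketch of stand-in data, assuming only what the code requires: a numeric response Y, predictors W and X, and a grouping column ID. The helper name make_data, the sample sizes, and the distributions are all hypothetical, so any correlations computed from it will differ from the numbers shown later.

```r
# Hypothetical stand-in data: the column names Y, W, X and ID come from the
# code above, but the sizes and distributions are assumptions.
set.seed(42)

make_data <- function(n_id = 20, n_per_id = 5) {
  n <- n_id * n_per_id
  id_effect <- rnorm(n_id)  # one random intercept per subject
  out <- data.frame(
    ID = factor(rep(seq_len(n_id), each = n_per_id)),
    W  = rnorm(n),
    X  = rnorm(n)
  )
  # Y depends on X, on the subject-level intercept, and on noise
  out$Y <- 1 + 0.5 * out$X + rep(id_effect, each = n_per_id) + rnorm(n)
  out
}

dat    <- make_data()  # training data
newdat <- make_data()  # test data with the same structure
```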

Running this gives the following error and traceback:

Error in eval(mCall$fixed) : object 'fixForm' not found
7. eval(mCall$fixed)
6. eval(mCall$fixed)
5. eval(eval(mCall$fixed)[-2])
4. predict.lme(fit, newdata = train_data, level = 0, na.action = na.exclude)
3. predict(fit, newdata = train_data, level = 0, na.action = na.exclude)
2. train_test_perf(train_data = df, test_data = newdf, model = v, termLabels = v)
1. myfunc(df = dat, newdf = newdat, varList = list("W", "X"))

It seems clear that predict.lme does not have access to the fixForm variable, but I haven't been able to work out a way to both define a formula based on a variable and have the value accessible to predict.lme. I'm not sure whether the nested function structure is part of the problem here--if it is, I would prefer to find a workaround that would maintain this structure, as my real-life code includes some other things inside myfunc that occur before and after the call to train_test_perf.

Thanks,

Jeff Phillips


Solution

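  • Passing a variable as the formula stores the variable's name (fixForm) in the fitted model's call, not the formula itself, which is likely the issue: predict.lme later evaluates fit$call$fixed, and fixForm does not exist where that evaluation happens. We can use do.call, which evaluates the arguments first so that the formula itself ends up in the stored call. Passing data=quote(train_data) keeps the data as a symbol, so the whole data frame is not embedded in the call.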

    train_test_perf <- function(train_data, test_data, model, termLabels) {
      fixForm <- reformulate(termlabels=termLabels, response="Y")
      fit <- do.call(nlme::lme, list(fixForm, data=quote(train_data), random=~ 1|ID))
      train_pred <- predict(fit, newdata=train_data, level=0, na.action=na.exclude)
      rtrain <- cor.test(train_data$Y, train_pred)
      test_pred <- predict(fit, newdata=test_data, level=0, na.action=na.exclude)
      rtest <- cor.test(test_data$Y, test_pred)
      tmp <- data.frame(Model=model, R_train=rtrain$estimate, 
                        R_test=rtest$estimate)
      return(tmp)
    }
    

    Finally, put it in an sapply to avoid tedious for loops (the \(x) shorthand requires R 4.1 or later).

    t(sapply(c("W", "X"), \(x) train_test_perf(train_data=dat, test_data=newdat, model=x, termLabels=x)))
    #      Model R_train   R_test      
    # [1,] "W"   0.1686495 -0.001738604
    # [2,] "X"   0.4138526 0.2992374
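
To illustrate why do.call helps, here is a minimal base-R sketch that uses lm() as a stand-in for nlme::lme, so it runs without extra packages. That substitution is an assumption made for illustration; both functions record their call with match.call(), so the same mechanism should apply to the fixed element of an lme call. The helper names fit_direct and fit_docall and the toy data are hypothetical.

```r
# Toy data, made up for this illustration only
d <- data.frame(Y = c(1.2, 2.3, 2.9, 4.1, 5.0), X = c(1, 2, 3, 4, 5))

# Direct call: the stored call records the *symbol* fixForm
fit_direct <- function(data) {
  fixForm <- reformulate("X", response = "Y")
  lm(fixForm, data = data)
}

# do.call: arguments are evaluated first, so the stored call records
# the formula itself; quote(data) keeps the data argument as a symbol
fit_docall <- function(data) {
  fixForm <- reformulate("X", response = "Y")
  do.call(lm, list(fixForm, data = quote(data)))
}

call_direct <- fit_direct(d)$call
call_docall <- fit_docall(d)$call

is.name(call_direct$formula)   # TRUE: only the name fixForm was stored
is.call(call_docall$formula)   # TRUE: the formula (a call to `~`) was stored
all.vars(call_docall$formula)  # the variables of the stored formula

# Evaluating the stored symbol outside fit_direct() fails, because fixForm
# only existed inside that function. This mirrors eval(mCall$fixed) failing.
lookup <- tryCatch(eval(call_direct$formula),
                   error = function(e) conditionMessage(e))
print(lookup)
```

Any code that later re-evaluates the stored call from a different environment, as predict.lme does with mCall$fixed, can only succeed in the second case.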