I have been stymied by an error that traces back to predict.lme, running inside a function, failing to interpret a formula based on a variable that has been passed from outside the function. I know the issue has to do with variable scope and different environments, but I've been unable to fully understand it or find a workaround. Your help would be much appreciated.
Here's a reproducible example:
# This will be the nested function.
train_test_perf <- function(train_data, test_data, model, termLabels) {
  fixForm <- reformulate(termlabels=termLabels, response="Y")
  fit <- nlme::lme(fixForm, data=train_data, random=~ 1|ID)
  train_pred <- predict(fit, newdata=train_data, level=0, na.action=na.exclude)
  rtrain <- cor.test(train_data$Y, train_pred)
  test_pred <- predict(fit, newdata=test_data, level=0, na.action=na.exclude)
  rtest <- cor.test(test_data$Y, test_pred)
  tmp <- data.frame(Model=model,
                    R_train=rtrain$estimate,
                    R_test=rtest$estimate)
  return(tmp)
}
# And here is the function that calls it.
myfunc <- function(df, newdf, varList) {
  for (v in varList) {
    perf <- train_test_perf(train_data=df, test_data=newdf, model=v, termLabels=v)
    print(perf)
  }
}
# The outer function call.
myfunc(df=dat, newdf=newdat, varList=list("W", "X"))
Running this gives the following error and traceback:
Error in eval(mCall$fixed) : object 'fixForm' not found
7. eval(mCall$fixed)
6. eval(mCall$fixed)
5. eval(eval(mCall$fixed)[-2])
4. predict.lme(fit, newdata = train_data, level = 0, na.action = na.exclude)
3. predict(fit, newdata = train_data, level = 0, na.action = na.exclude)
2. train_test_perf(train_data = df, test_data = newdf, model = v, termLabels = v)
1. myfunc(df = dat, newdf = newdat, varList = list("W", "X"))
It seems clear that predict.lme does not have access to the fixForm variable, but I haven't been able to work out a way to both define a formula based on a variable and have the value accessible to predict.lme. I'm not sure whether the nested function structure is part of the problem here--if it is, I would prefer to find a workaround that would maintain this structure, as my real-life code includes some other things inside myfunc that occur before and after the call to train_test_perf.
Thanks,
Jeff Phillips
Passing a variable as the formula stores the variable's name, not the formula itself, in the fitted model's call, which might be the issue. We can use do.call instead, so that lme receives the already-evaluated formula.
train_test_perf <- function(train_data, test_data, model, termLabels) {
  fixForm <- reformulate(termlabels=termLabels, response="Y")
  fit <- do.call(nlme::lme, list(fixForm, data=quote(train_data), random=~ 1|ID))
  train_pred <- predict(fit, newdata=train_data, level=0, na.action=na.exclude)
  rtrain <- cor.test(train_data$Y, train_pred)
  test_pred <- predict(fit, newdata=test_data, level=0, na.action=na.exclude)
  rtest <- cor.test(test_data$Y, test_pred)
  tmp <- data.frame(Model=model, R_train=rtrain$estimate,
                    R_test=rtest$estimate)
  return(tmp)
}
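To illustrate why do.call can make a difference, here is a rough base-R sketch that does not use nlme at all. toy_fit and toy_predict are hypothetical stand-ins (they are not nlme functions): the first only records its call, and the second re-evaluates the stored fixed argument, which is what the traceback suggests predict.lme does. The sketch assumes no object named fixForm exists in the global environment.

```r
# Hypothetical stand-in for a fitting function that records how it was called.
toy_fit <- function(fixed) {
  list(call = match.call())
}

# Hypothetical stand-in for a predict method that re-evaluates the stored
# 'fixed' argument from its own frame, where local variables of the caller
# of toy_fit() are not visible.
toy_predict <- function(object) {
  eval(object$call$fixed)
}

# Direct call: the stored call holds the symbol `fixForm`, not the formula.
wrapper_symbol <- function() {
  fixForm <- Y ~ W
  fit <- toy_fit(fixForm)
  tryCatch(toy_predict(fit), error = function(e) conditionMessage(e))
}

# do.call: the argument list is evaluated first, so the stored call holds
# the formula object itself and no later lookup of `fixForm` is needed.
wrapper_docall <- function() {
  fixForm <- Y ~ W
  fit <- do.call(toy_fit, list(fixForm))
  toy_predict(fit)
}

msg  <- wrapper_symbol()   # an error message mentioning 'fixForm'
form <- wrapper_docall()   # the formula Y ~ W
print(msg)
print(form)
```

In the real answer, data=quote(train_data) keeps the data argument as a symbol so that the whole data frame is not embedded in the stored call.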
Finally, put it in an sapply to avoid tedious for loops.
t(sapply(c("W", "X"), \(x) train_test_perf(train_data=dat, test_data=newdat, model=x, termLabels=x)))
# Model R_train R_test
# [1,] "W" 0.1686495 -0.001738604
# [2,] "X" 0.4138526 0.2992374
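If a regular data frame is preferred over the matrix that sapply returns here, one possible variation is to collect the one-row data frames with lapply and bind them with rbind. The sketch below uses toy_perf, a hypothetical stand-in for train_test_perf with placeholder NA values, so it runs without nlme or the original data.

```r
# Hypothetical stand-in for train_test_perf(); it returns a one-row data
# frame with placeholder values instead of real correlations.
toy_perf <- function(model) {
  data.frame(Model = model, R_train = NA_real_, R_test = NA_real_)
}

# One data frame per model, then row-bind them into a single data frame.
perf_list <- lapply(c("W", "X"), toy_perf)
perf_df <- do.call(rbind, perf_list)
print(perf_df)
```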