Search code examples
rnlme

Using formula to predict inside of R function generates `object not found` error


Let me preface this by saying that there are similar questions to mine on Stackoverflow, but I have not seen them answered to my satisfaction, and the answers that were given don't help me with the problem I'm having. Also this is a long question but I have tried to make each part simple and easy to understand.

Here is a proof of concept that you can assign formulas to variables in the global environment, and pass the formula variable to the lm function and use predict to make predictions. I do it in several ways to be thorough:

fake_data_1 <- data.frame(
  ecks = c(-19:20,-19:20,-19:20), 
  why = c((-19:20)^2, (-19:20)^3/40, abs(-19:20))
)

fake_data_2 <- data.frame(
  ecks =runif(22) 
)

#using basic formula
formula_used <- why ~ ecks 
lm_model <- lm(formula = formula_used, data = fake_data_1)
predict(lm_model, newdata = fake_data_2)


#converting string to formula
formula_used <- as.formula("why ~ ecks")
lm_model <- lm(formula = formula_used, data = fake_data_1)
predict(lm_model, newdata = fake_data_2)


#can use a basic string as well
formula_used <- "why ~ ecks"
lm_model <- lm(formula = formula_used, data = fake_data_1)
predict(lm_model, newdata = fake_data_2)

Here is proof of concept that it is possible to perform these processes inside of functions:

#can run this as a function
make_prediction <- function(data_in,y_var,x_var,new_data){
  formula_used <- as.formula(paste(y_var, x_var, sep = " ~ "))
  lm_model <- lm(formula = formula_used,data = data_in)
  predict(lm_model, newdata = data_in)
}
make_prediction(data_in = fake_data_1, y_var = "why", x_var = "ecks", new_data = fake_data_2)


#can explicitly set the environment of the formula: will make sense why I show this later
make_prediction_2 <- function(data_in,y_var,x_var,new_data){
  local_env = environment()
  formula_used <- as.formula(paste(y_var, x_var, sep = " ~ "),env = local_env)
  lm_model <- lm(formula = formula_used,data = data_in)
  predict(lm_model, newdata = new_data)
}

make_prediction_2(data_in = fake_data_1, y_var = "why", x_var = "ecks",new_data = fake_data_2)

As I say in the comment, it will make sense why I try the explicit assignment of environment later.

Now I am trying to use the lme function from the nlme package to make predictions. As an aside I don't understand the statistics of this function, am just using it based off of code that someone else in my lab wrote.

Here is proof of concept that you can use this function to make predictions with a formula assigned to a variable (not dealing with formula called "random" for now:

library(nlme)
#fake data for making model
fake_data_complicated_1 <- data.frame(ecks = c(-19:20,-19:20,-19:20), 
                                    why = c((-19:20)^3, (-19:20)^4/40, abs(-19:20)*100), 
                                    treatment = c(rep("a",times = 40),
                                                  rep("b", times = 40),
                                                  rep("control", times = 40)),
                                    ID = c(rep(c("q","w","e","r"),times = 10),
                                           rep(c("t","y","u","i"),times = 10),
                                           rep(c("h","j","k","l"),times = 10))
)

#fake data for making prediction
fake_data_complicated_2 <- data.frame(ecks = runif(120), 
                                      treatment = c(rep("a",times = 40),
                                                    rep("b", times = 40),
                                                    rep("control", times = 40)),
                                      ID = c(rep(c("q","w","e","r"),times = 10),
                                             rep(c("t","y","u","i"),times = 10),
                                             rep(c("h","j","k","l"),times = 10))
)

Can do it with a basic formula:

#can use basic formula as before
fixed_formula <- why ~ ecks * treatment
random_formula <- ~1|ID #not sure what this does in the model but that's not importante


lme_model <- lme(fixed = fixed_formula,
                 random = random_formula,
                 data = fake_data_complicated_1)


predict(lme_model, newdata = fake_data_complicated_2)

Can convert string to formula:

#can use a pasted/converted formula as before
fixed_formula <- as.formula(
  paste("why", paste("ecks", "treatment", sep = " * "), sep = " ~ ")
)

lme_model <- lme(fixed = fixed_formula,
                 random = random_formula,
                 data = fake_data_complicated_1)


predict(lme_model, newdata = fake_data_complicated_2)

As another aside the lme function won't take a raw string, but that's not my main problem:

#can't use a raw string, this code generates an error
# fixed_formula <-  paste("why", paste("ecks", "treatment", sep = " * "), sep = " ~ ")
# 
# 
# lme_model <- lme(fixed = fixed_formula,
#                  random = random_formula,
#                  data = fake_data_complicated_1)
# 
# 
# predict(lme_model, newdata = fake_data_complicated_2)

Here is the problem: when I try to put this lme code into a function I get an object 'xxxxx' not found error:


#this function does not work!
make_prediction_nlm <- function(data_in,y_var,x_var,treatment_var ,id_var,new_data){
  
  formula_used_nlm <- as.formula(paste(y_var, paste(x_var, treatment_var, sep = " * "), sep = " ~ "))
  random_used <-  as.formula(paste("~1|",id_var,sep = ""))
  
  lme_model <- lme(fixed = formula_used_nlm,
                   random = random_used,
                   data = data_in)
  
  predict(lme_model, newdata = new_data)
}

make_prediction_nlm(data_in = fake_data_complicated_1, 
                y_var = "why", 
                x_var = "ecks", 
                treatment_var = "treatment",
                id_var = "ID",
                new_data = fake_data_complicated_1)

Specifically the error is Error in eval(mCall$fixed) : object 'formula_used_nlm' not found

An answer here: Object not found error when passing model formula to another function suggests that as I did above, I explicitly set the environment of the formula in the function. I tried that and it did not work, generating the same error:

#neither does this one!
make_prediction_2 <- function(data_in,y_var,x_var,treatment_var ,id_var){
  local_env = environment()
  formula_used_nlm <- as.formula(paste(y_var, paste(x_var, treatment_var, sep = " * "), sep = " ~ "),
   env = local_env)
  
random_used <- as.formula(paste("~1|",id_var,sep = ""), env = local_env)
  
  lme_model <- lme(fixed = formula_used_nlm,
                   random = random_used,
                   data = data_in)
  
  predict(lme_model, newdata = data_in)
}

make_prediction_2(data_in = fake_data_complicated_1,
 y_var = "why", 
x_var = "ecks", 
treatment_var = "treatment",
id_var = "ID")

I could perhaps getting around this by using a macro instead of a function, but that's not something I want to wade into if I can help it, if it would even work. For now I will just be copying and pasting code rather than writing a function. Thanks to those of you who read through this.


Solution

  • For some reason the lme function expects a literal formula to be in the call. It does not expect to see a variable there. It uses nonstandard evaluation to try to separate the response from the fixed effect terms. In this case, it really doesn't have to do with the environment of the formula.

    The easiest way around this would be to inject the formulas into the call with do.call. This should work

    make_prediction_nlm <- function(data_in,y_var,x_var,treatment_var ,id_var,new_data){
      
      formula_used_nlm <- as.formula(paste(y_var, paste(x_var, treatment_var, sep = " * "), sep = " ~ "))
      random_used <-  as.formula(paste("~1|",id_var,sep = ""))
      
      lme_model <- do.call("lme", list(fixed = formula_used_nlm,
                       random = random_used,
                       data = quote(data_in)))
      
      predict(lme_model, newdata = new_data)
    }
    

    This only really affects the predict function when you pass newdata= because it goes back to see what the original call was.