Search code examples
revaltidyeval

Evaluating a maximum likelihood expression using datamasks in R


I am trying to evaluate a maximum likelihood expression using data masks. The idea is to allow parameters and variables to be called by name within the function while avoiding multiple calls to attach() and detach(). This is a small very simplified example and the real functions are quite large and complex.

set.seed(1)

# Data
db <- data.frame(
  x = runif(10),
  y = runif(10),
  z = sample(c(0, 1), 10, replace = TRUE)
)

# Log likelihood function
ll_lik <- function(param) {
  pr_1 <- 1 / (1 + exp(-(param[1]*x - param[2]*y)))
  pr_2 <- 1 - pr_1
  lik <- z * pr_1 + (1 - z) * pr_2
  log(lik)
}

# Parameters
param <- c(p1 = 0.1, p2 = 0.2)

# Run the model with attach()/detach()
attach(db)
model <- maxLik::maxLik(ll_lik, start = param)
detach(db)
summary(model)

This works fine, however, I have to make calls to attach() and detach(). To access the parameters by name, I would need to transform the param to a list inside the log-likelihood function and then make calls to attach()/detach(). Not only is this messy, but with large functions and data, it creates unneccessary overhead. One possibility that I have been looking at is to use the rlang package and the wrapper functions mostly written for tidy evaluation of expressions.

Now, just creating the datamask and trying to evaluate the log-likelihood function does not work:

mask <- as_data_mask(db)
eval_tidy(quo(maxLik::maxLik(ll_lik, start = param)), mask)

It cannot access the objects in the datamask (Error in fnOrig(theta, ...) : object 'x' not found). Maybe the issue was with maxLik, but I couldn't even evaluate ll_lik() and this gave the same error:

eval_tidy(quo(ll_lik(param)), mask)

But this works:

eval_tidy(quo(x*3), mask)

So, I began to suspect that ll_lik() has the "wrong" parent, which is why my datamask might not be in the search path for the function, hence it cannot find the variables. Now, the help section for as_data_mask() does provide some examples of how to "nest" environments by creating top, middle and bottom level environments. Ok, so let's see if I can create my function as part of the data mask structure:

call_stack <- function() {lobstr::cst()}

# Create a new environment (child of empty) that takes a list of objects to populate it
top <- new_environment(list(ll_lik = ll_lik, call_stack = call_stack))

# Create a child of the "top" environment"
middle <- env(top)

# Create a child of the "middle environment and add the data object to it
bottom <- env(middle, db=db)

# Create a data_mask where the bottom contains the masking elements and the top
# the last element of the data_mask.
new_mask <- new_data_mask(bottom, top = top)

Unfortunately, I am still unable to access x. I didn't even attemtp the maxLik function here. So to try and dig deeper, I started messing around with the call stack.

eval_tidy(call_stack(), data = new_mask)

And indeed, if I read this correctly, the parent of the function is the global environment.

    █
 1. ├─rlang::eval_tidy(call_stack(), data = new_mask)
 2. └─global::call_stack()
 3.   └─lobstr::cst()

I am, however, at a loss for how to make this work. Any help is much appreciated.

BONOUS: If I am able to call the parameters by name inside maxLik without calls to attach()/detach(), that would be awesome.


Solution

  • One option is to create a wrapper that evaluates the body of ll_lik as an expression, with db as the context:

    llwrap <- function(param) {
      eval( body(ll_lik), db )
    }
    
    model <- maxLik::maxLik(llwrap, start=param)      # Works
    

    EDIT to address your question: yes, body() returns an expression, so you can use whatever names you want inside that expression, as long as you provide the appropriate context at evaluation. However, if you are completely decoupling the body of your function from its argument list, why not just define it as an expression from the beginning?

    ll_expr <- rlang::expr({                       # An expression, not a function
      pr_1 <- 1 / (1 + exp(-(p1*x - p2*y)))        # <-- now using p1, p2
      pr_2 <- 1 - pr_1
      lik <- z * pr_1 + (1 - z) * pr_2
      log(lik)
    })
    
    llwrap2 <- function(param) {
      ctx <- c( as.list(db), as.list(param) )      # Combine param and db into one context
      eval( ll_expr, ctx )                         # No longer need body()
    }
    
    model <- maxLik::maxLik(llwrap2, start=param)  # Works