
paste formula in function (c constant?)


I'm writing a function that pastes a formula together and then returns a feols result. However, the formula ends up with a c at the beginning. How can I fix this?

library(dplyr)
library(fixest)

data(base_did)
base_did = base_did %>% mutate(D = 5*rnorm(1080),
                               x2 = 10*rnorm(1080),
                               rand_wei = abs(rnorm(1080)))

f <- function(data, arg=NULL){
  
  arg = enexpr(arg)
  
  if (length(arg) == 0) {
    formula = "D ~ 1"
  } 
  else {
    formula = paste(arg, collapse = " + ")
    formula = paste("D ~ ", formula, sep = "")
  }
  
  formula = paste(formula, " | id + period", sep = "")
  denom.lm <- feols(as.formula(formula), data = data, 
                    weights = abs(data$rand_wei))
  
  return(denom.lm)
}

f(base_did, arg = c(x1,x2))

#Error in feols(as.formula(formula), data = data, weights = abs(data$rand_wei)) : 
#  Evaluation of the right-hand-side of the formula raises an error: 
#  In NULL: Evaluation of .Primitive("c") returns an object of length 1
#while the data set has 1080 rows.

If I return(formula) at the end instead, I get [1] "D ~ c + x1 + x2 | id + period".

But I need only D ~ x1 + x2 | id + period.


Solution

  • One option is to pass the variables via ... instead of wrapping them in c(). No c call is captured that way, so no c ends up in your formula. For this to work you also have to switch from enexpr to enexprs inside your function.

    Note: I slightly adjusted your function for the reprex to return just the formula.

    library(dplyr, warn.conflicts = FALSE)
    library(fixest)
    
    data(base_did)
    
    base_did = base_did %>% mutate(D = 5*rnorm(1080),
                                   x2 = 10*rnorm(1080),
                                   rand_wei = abs(rnorm(1080)))
    
    f <- function(data, ...){
      arg = enexprs(...)
      
      if (length(arg) == 0) {
        formula = "D ~ 1"
      } 
      else {
        formula = paste(arg, collapse = " + ")
        formula = paste("D ~ ", formula, sep = "")
      }
      
      formula = paste(formula, " | id + period", sep = "")
      
      as.formula(formula)
    }
    
    
    f(base_did, x1, x2)
    #> D ~ x1 + x2 | id + period
    #> <environment: 0x7fe8f3567618>
    
    f(base_did)
    #> D ~ 1 | id + period
    #> <environment: 0x7fe8f366f848>
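
    To get the model back rather than just the formula, the same idea can be wired into feols again. This is only a sketch under a few assumptions of mine: the set.seed call is arbitrary and only there for reproducibility, and I pass the weights as the one-sided formula ~rand_wei instead of abs(data$rand_wei) as in your question.

    ```r
    library(rlang)   # enexprs()
    library(fixest)  # feols() and the base_did data set

    data(base_did, package = "fixest")

    set.seed(1)  # arbitrary seed, not in the original question
    n <- nrow(base_did)
    base_did$D        <- 5 * rnorm(n)
    base_did$x2       <- 10 * rnorm(n)
    base_did$rand_wei <- abs(rnorm(n))

    f <- function(data, ...) {
      # Capture the unevaluated arguments as a list of symbols
      arg <- enexprs(...)

      # Build the right-hand side; fall back to an intercept-only model
      rhs <- if (length(arg) == 0) "1" else paste(arg, collapse = " + ")
      fml <- as.formula(paste0("D ~ ", rhs, " | id + period"))

      # Weights given as a one-sided formula, looked up in `data`
      feols(fml, data = data, weights = ~rand_wei)
    }

    fit <- f(base_did, x1, x2)
    names(coef(fit))
    ```

    The function is then called as f(base_did, x1, x2), without c().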
    

    UPDATE: There is probably a better approach, but if you want to keep the c(x1, x2) interface, one possible option after some research is the following.

    Note: When you pass multiple variables via c(), enexpr returns a call object. A call behaves like a list whose first element is the function name, here c, followed by the arguments. paste coerces every element to a string, which is why c ends up in your formula.

    f <- function(data, arg = NULL) {
      arg <- enexpr(arg)
      
      if (length(arg) == 0) {
        formula <- "D ~ 1"
      } else {
        if (length(arg) > 1) arg <- vapply(as.list(arg[-1]), rlang::as_string, FUN.VALUE = character(1))
        
        formula <- paste(arg, collapse = " + ")
        formula <- paste("D ~ ", formula, sep = "")
      }
    
      formula <- paste(formula, " | id + period", sep = "")
    
      as.formula(formula)
    }
    
    
    f(base_did, c(x1, x2))
    #> D ~ x1 + x2 | id + period
    #> <environment: 0x7fa763431388>
    
    f(base_did, x1)
    #> D ~ x1 | id + period
    #> <environment: 0x7fa763538c40>
    
    f(base_did)
    #> D ~ 1 | id + period
    #> <environment: 0x7fa765e22028>
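
    The note on call objects can be checked directly in base R. This is a minimal sketch that uses quote() as a stand-in for what enexpr captures when the argument is c(x1, x2); the names e, with_c and without_c are only for illustration.

    ```r
    # quote() captures the expression unevaluated, as enexpr() does for an argument
    e <- quote(c(x1, x2))

    is.call(e)   # c(x1, x2) is stored as a call, not as a vector of names
    as.list(e)   # first element is the function name `c`, then x1 and x2

    # paste() coerces every element of the call, including the function name
    with_c <- paste(e, collapse = " + ")
    with_c
    #> [1] "c + x1 + x2"

    # Dropping the first element keeps only the arguments
    without_c <- paste(as.list(e)[-1], collapse = " + ")
    without_c
    #> [1] "x1 + x2"
    ```

    Testing with is.call(arg) rather than length(arg) > 1 might be a slightly more explicit way to decide whether the first element needs to be dropped.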