Search code examples
rstringformulaevalpaste

using reformulate to merge non-standard formula


Given:

vars <- c("var1", "var2", "var3", "var4")
mm_exp <- expression(
  f(testm, testmodel = 'test', testgraph = g, truetest1 = TRUE, truetest2 = TRUE), 
  f(testm2, testmodel = 'fg'),
  f(testm3, testmodel = 'fg3')
)

I want to produce formulas using mm_exp and all combinations (combn) of vars to enter in a model:

#y ~ var1 + var2 + var3 + var4 + f(testm, testmodel = "test", testgraph = g, truetest1 = TRUE, 
#  truetest2 = TRUE) + f(testm2, testmodel = "fg") + f(testm3, testmodel = 'fg3')

#y ~ var1 + var2 + var3 + f(testm, testmodel = "test", testgraph = g, truetest1 = TRUE, 
#  truetest2 = TRUE) + f(testm2, testmodel = "fg") + f(testm3, testmodel = 'fg3')

#y ~ var1 + var2 + f(testm, testmodel = "test", testgraph = g, truetest1 = TRUE, 
#  truetest2 = TRUE) + f(testm2, testmodel = "fg") + f(testm3, testmodel = 'fg3')

#y ~ var1 + var4 + f(testm, testmodel = "test", testgraph = g, truetest1 = TRUE, 
#  truetest2 = TRUE) + f(testm2, testmodel = "fg") + f(testm3, testmodel = 'fg3')


#etc.....

If I simplify mm_exp, I can get something similar to what I want using reformulate (ignoring combn for now):

mm_exp_simplify <- expression(
  f(testm, testmodel = 'test', testgraph = g), 
  f(testm2, testmodel = 'fg'),
  f(testm3, testmodel = 'fg3')
)
reformulate(c(vars, sapply(mm_exp_simplify, deparse)), "y")
# y ~ var1 + var2 + var3 + var4 + f(testm, testmodel = "test", 
#     testgraph = g) + f(testm2, testmodel = "fg") + f(testm3, 
#     testmodel = "fg3")

but if I add back in truetest1 = TRUE, truetest2 = TRUE it causes issues:

mm_exp <- expression(
  f(testm, testmodel = 'test', testgraph = g, truetest1 = TRUE, truetest2 = TRUE), 
  f(testm2, testmodel = 'fg'),
  f(testm3, testmodel = 'fg3')
)
reformulate(c(vars, sapply(mm_exp, deparse)), "y")
# Error in reformulate(c(vars, sapply(mm_exp, deparse)), "y") : 
#   'termlabels' must be a character vector of length at least one

I also tried using quote but had a similar problem:

mm_quote <- quote(
  f(testm, testmodel = 'test', testgraph = g, truetest1 = TRUE, truetest2 = TRUE) + 
    f(testm2, testmodel = 'fg') + f(testm3, testmodel = 'fg3')
)
as.formula(paste0("y ~ ", paste(paste(vars, collapse = "+"), deparse(mm_quote), sep = "+")))
# Error in parse(text = x, keep.source = FALSE) : 
#   <text>:2:39: unexpected '='
# 1: y ~ var1+var2+var3+var4+f(testm, testmodel = "test", testgraph = g, truetest1 = TRUE, 
# 2: y ~ var1+var2+var3+var4+    truetest2 =
#                                          ^

Does anyone have suggestions on how to include the truetest1 = TRUE, truetest2 = TRUE and also how to get the combn versions of the formulas?

thanks


Solution

  • Solution

    To solve the first problem, you need to use deparse1 instead of deparse. Like this:

    reformulate(c(vars, sapply(mm_exp, deparse1)), "y")
    #> y ~ var1 + var2 + var3 + var4 + f(testm, testmodel = "test", 
    #>     testgraph = g, truetest1 = TRUE, truetest2 = TRUE) + f(testm2, 
    #>     testmodel = "fg") + f(testm3, testmodel = "fg3")
    

    About your second problem, first you can create all the combinations with all possible lengths, then you can create a list of all formulas in this way:

    # all vars combinations 
    vars_comb <- lapply(seq_along(vars), function(n) combn(vars, n, simplify = FALSE))
    vars_comb <- unlist(vars_comb, recursive = FALSE)
    
    # all formulas
    lapply(vars_comb, function(v) reformulate(c(v, sapply(mm_exp, deparse1)), "y"))
    

    WHY

    The reason behind it is related to the default value of the argument width.cutoff which is width.cutoff = 60L in deparse and width.cutoff = 500L in deparse1.

    Just look at this:

    # output with deparse
    deparse(expression(f(testm, testmodel = 'test', testgraph = g, truetest1 = TRUE, truetest2 = TRUE)))
    #> [1] "expression(f(testm, testmodel = \"test\", testgraph = g, truetest1 = TRUE, "
    #> [2] "    truetest2 = TRUE))"
    
    # output with deparse and width.cutoff forced to 500
    deparse(expression(f(testm, testmodel = 'test', testgraph = g, truetest1 = TRUE, truetest2 = TRUE)), 
            width.cutoff = 500)
    #> [1] "expression(f(testm, testmodel = \"test\", testgraph = g, truetest1 = TRUE, truetest2 = TRUE))"
    
    # output with deparse1
    deparse1(expression(f(testm, testmodel = 'test', testgraph = g, truetest1 = TRUE, truetest2 = TRUE)))
    #> [1] "expression(f(testm, testmodel = \"test\", testgraph = g, truetest1 = TRUE, truetest2 = TRUE))"
    

    The first deparse creates a vector of length 2 which interferes with reformulate because it creates components of the formula that are not complaint.


    FOR R < 4.0

    If you have R 3.6 like you said in the comments, deparse1 is not available. Therefore you need to set width.cutoff = 500L inside deparse.

    The solution will look like this:

    # first issue
    reformulate(c(vars, sapply(mm_exp, deparse, width.cutoff = 500L)), "y")
    
    # second issue
    vars_comb <- lapply(seq_along(vars), function(n) combn(vars, n, simplify = FALSE))
    vars_comb <- unlist(vars_comb, recursive = FALSE)
    lapply(vars_comb, function(v) reformulate(c(v, sapply(mm_exp, deparse, width.cutoff = 500L)), "y"))