Search code examples
rrlang

How to create paired formula for t-test using {rlang}?


I'm trying to create a unified interface for running a t-test using {rlang}, and I'd like to use the formula method for t.test(). While doing so, I can construct a formula using {rlang} for unpaired t-test, but can't figure out how to create the formula for paired test.

library(rlang)

two_sample_t_test <- function(data, x, y, paired, ...) {
  x <- ensym(x)
  y <- ensym(y)

  exec(
    t.test,
    formula = if (paired) {
      new_formula(quote(Pair(x, y)), 1)
    } else {
      new_formula(y, x)
    },
    data = data,
    ...
  )
}

# unpaired t-test -------------------------

two_sample_t_test(mtcars, am, wt, paired = FALSE, var.equal = TRUE)
#> 
#>  Two Sample t-test
#> 
#> data:  wt by am
#> t = 5.2576, df = 30, p-value = 1.125e-05
#> alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
#> 95 percent confidence interval:
#>  0.8304317 1.8853577
#> sample estimates:
#> mean in group 0 mean in group 1 
#>        3.768895        2.411000

# paired t-test -------------------------

two_sample_t_test(mtcars, am, wt, paired = TRUE)

#> Error in model.frame.default(formula = .Primitive("quote")(Pair(x, y) ~ : invalid type (list) for variable 'Pair(x, y)'

Created on 2023-09-11 with reprex v2.0.2


Solution

  • In {rlang} we can use expr() to capture the expression inside new_formula and !! to evaluate x and y early, which should give us the desired result (the output is much nicer than my old answer using eval(bquote(...)), see below):

    library(rlang)
    
    two_sample_t_test <- function(data, x, y, paired, ...) {
    
      x <- ensym(x)
      y <- ensym(y)
      
      exec(
        t.test,
        formula = if (paired) {
          new_formula(expr(Pair(!! x, !! y)), 1)
        } else {
          new_formula(y, x)
        },
        data = data,
        ...
      )
    }
    
    two_sample_t_test(mtcars, am, wt, paired = TRUE)
    #> 
    #>  Paired t-test
    #> 
    #> data:  Pair(am, wt)
    #> t = -11.589, df = 31, p-value = 8.457e-13
    #> alternative hypothesis: true mean difference is not equal to 0
    #> 95 percent confidence interval:
    #>  -3.305685 -2.316315
    #> sample estimates:
    #> mean difference 
    #>          -2.811
    
    two_sample_t_test(mtcars, am, wt, paired = FALSE, var.equal = TRUE)
    #> 
    #>  Two Sample t-test
    #> 
    #> data:  wt by am
    #> t = 5.2576, df = 30, p-value = 1.125e-05
    #> alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
    #> 95 percent confidence interval:
    #>  0.8304317 1.8853577
    #> sample estimates:
    #> mean in group 0 mean in group 1 
    #>        3.768895        2.411000
    

    Old answer

    I'd try to use eval(bquote(...)) instead of quote() alone, then evaluate x and y early by wrapping them in .() and use the data.frame supplied in the data arg as envir for eval.

    library(rlang)
    
    two_sample_t_test <- function(data, x, y, paired, ...) {
    
      x <- ensym(x)
      y <- ensym(y)
      
      exec(
        t.test,
        formula = if (paired) {
          new_formula(eval(bquote(Pair(.(x), .(y))), envir = data), 1)
        } else {
          new_formula(y, x)
        },
        data = data,
        ...
      )
    }
    
    two_sample_t_test(mtcars, am, wt, paired = FALSE, var.equal = TRUE)
    #> 
    #>  Two Sample t-test
    #> 
    #> data:  wt by am
    #> t = 5.2576, df = 30, p-value = 1.125e-05
    #> alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
    #> 95 percent confidence interval:
    #>  0.8304317 1.8853577
    #> sample estimates:
    #> mean in group 0 mean in group 1 
    #>        3.768895        2.411000
    
    two_sample_t_test(mtcars, am, wt, paired = TRUE)
    #> 
    #>  Paired t-test
    #> 
    #> data:  structure(c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2.62, 2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44, 3.44, 4.07, 3.73, 3.78, 5.25, 5.424, 5.345, 2.2, 1.615, 1.835, 2.465, 3.52, 3.435, 3.84, 3.845, 1.935, 2.14, 1.513, 3.17, 2.77, 3.57, 2.78), dim = c(32L, 2L), dimnames = list(NULL, c("x", "y")), class = "Pair")
    #> t = -11.589, df = 31, p-value = 8.457e-13
    #> alternative hypothesis: true mean difference is not equal to 0
    #> 95 percent confidence interval:
    #>  -3.305685 -2.316315
    #> sample estimates:
    #> mean difference 
    #>          -2.811
    

    Created on 2023-09-11 with reprex v2.0.2