How to use a variable name in a formula instead of the column itself


I have data that I would like to summarize by group using the summary_by function (from the doBy package). I can't write the column names directly in the summary_by formula; instead, I have to use variables, created earlier, that hold the column names as strings.
Below is the result I would like to achieve:

library(data.table)
library(doBy)

mtcars = data.table(mtcars)

doBy::summary_by(data = mtcars, mpg ~ gear + am, FUN = "mean")

output:

gear  am   mpg."mean"
3     0    16.10667
4     0    21.05000
4     1    26.27500
5     1    21.38000

Here is what I want to do:

library(data.table)
library(doBy)

mtcars = data.table(mtcars)

variable1 = "gear" # which is a column name of mtcars
variable2 = "am" # which is a column name of mtcars
variable3 = "mpg" # which is a column name of mtcars

doBy::summary_by(data = mtcars, variable3 ~ variable1 + variable2 , FUN = "mean")

I tried the functions get, assign, eval, and mget, but I couldn't find a solution.


Solution

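  • Build the formula as a string and convert it with as.formula(). A formula is not evaluated, so in variable3 ~ variable1 + variable2 the names are taken literally rather than replaced by the strings they hold.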

    library(data.table)
    library(doBy)
    
    mtcars = data.table(mtcars)
    
    variable1 = "gear" # which is a column name of mtcars
    variable2 = "am" # which is a column name of mtcars
    variable3 = "mpg" # which is a column name of mtcars
    
    doBy::summary_by(data = mtcars, 
                     # as an alternative to sprintf(), use paste() or glue::glue()
                     as.formula(sprintf("%s ~ %s + %s", variable3, variable1, variable2)), 
                     FUN = "mean")
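
As a possible alternative to pasting a string together, base R's reformulate() builds the same formula from character vectors. The sketch below is only an illustration: it reuses the variable names from the question and passes the formula to base aggregate() instead of doBy::summary_by(), so that it runs without extra packages. The resulting formula object should be usable with summary_by in the same way as the as.formula() result above.

```r
# Column names held in variables (same names as in the question)
variable1 <- "gear"
variable2 <- "am"
variable3 <- "mpg"

# reformulate() builds `response ~ term1 + term2` from character vectors
f <- reformulate(termlabels = c(variable1, variable2), response = variable3)
print(f)

# Illustration only: base aggregate() stands in for doBy::summary_by() here
result <- aggregate(f, data = mtcars, FUN = mean)
print(result)
```

Because the term labels are passed as a vector, this form also extends to any number of grouping columns without changing a format string.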