Search code examples
rdplyrrlangtidyeval

Programmatic regression modelling with tidyeval


I am trying to get my head around programming using tidyeval.

I want to write a function to run logistic regression models for selected outcome variables:

library(tidyverse)
set.seed(1234)

df <- tibble(id = 1:1000,
             group = sample(c("Group 1", "Group 2", "Group 3"), 1000, replace = TRUE),
             died = sample(c(0,1), 1000, replace = TRUE))

myfunc <- function(data, outcome){

enquo_var <- enquo(outcome)

fit <- tidy(glm(!!enquo_var ~ group, data=data, 
                family = binomial(link = "logit")), 
                exponentiate = TRUE, conf.int=TRUE)

fit
}


myfunc(df, died)

But get:

Error in !enquo_outcome : invalid argument type

(note real scenario involves more complex function).

Is this possible?


Solution

  • We need to create a formula for glm to pick it up. One option is paste

    myfunc <- function(data, outcome){
      enquo_var <- enquo(outcome)
       fit <- tidy(glm(paste(quo_name(enquo_var), "group", sep="~"), data=data, 
                    family = binomial(link = "logit")), 
                    exponentiate = TRUE, conf.int=TRUE)
    
    fit
    }
    
    myfunc(df, died)
    #         term  estimate std.error  statistic    p.value  conf.low conf.high
    #1  (Intercept) 0.8715084 0.1095300 -1.2556359 0.20924801 0.7026185  1.079852
    #2 groupGroup 2 0.9253515 0.1550473 -0.5003736 0.61681204 0.6826512  1.253959
    #3 groupGroup 3 1.3692735 0.1557241  2.0181864 0.04357185 1.0095739  1.859403
    

    If we also need to use the tidyverse functions

    myfunc <- function(data, outcome){
    
      quo_var <- quo_name(enquo(outcome))
    
       fit <- tidy(glm(rlang::expr(!! rlang::sym(quo_var) ~ group), data=data, 
                family = binomial(link = "logit")), 
                exponentiate = TRUE, conf.int=TRUE)
    
     fit
    }
    
    myfunc(df, died)
    #           term  estimate std.error  statistic    p.value  conf.low conf.high
    #1  (Intercept) 0.8715084 0.1095300 -1.2556359 0.20924801 0.7026185  1.079852
    #2 groupGroup 2 0.9253515 0.1550473 -0.5003736 0.61681204 0.6826512  1.253959
    #3 groupGroup 3 1.3692735 0.1557241  2.0181864 0.04357185 1.0095739  1.859403
    

    Or as @lionel mentioned in the comments get_expr can be used

    myfunc <- function(data, outcome){
    
      quo_var <- enquo(outcome)
    
       fit <- tidy(glm(rlang::expr(!! rlang::get_expr(quo_var) ~ group), data=data, 
                family = binomial(link = "logit")), 
                exponentiate = TRUE, conf.int=TRUE)
    
     fit
    }
    
    myfunc(df, died)
    #         term  estimate std.error  statistic    p.value  conf.low conf.high
    #1  (Intercept) 0.8715084 0.1095300 -1.2556359 0.20924801 0.7026185  1.079852
    #2 groupGroup 2 0.9253515 0.1550473 -0.5003736 0.61681204 0.6826512  1.253959
    #3 groupGroup 3 1.3692735 0.1557241  2.0181864 0.04357185 1.0095739  1.859403
    

    Or a more compact approach suggested by @lionel which avoids the enquo/quo_name/sym conversion instead directly takes the argument in enexpr

     myfunc <- function(data, outcome){
    
    
    
       fit <- tidy(glm(rlang::expr(!! rlang::enexpr(outcome) ~ group), data=data, 
                family = binomial(link = "logit")), 
                exponentiate = TRUE, conf.int=TRUE)
    
     fit
    }
    
    myfunc(df, died)
    #         term  estimate std.error  statistic    p.value  conf.low conf.high
    #1  (Intercept) 0.8715084 0.1095300 -1.2556359 0.20924801 0.7026185  1.079852
    #2 groupGroup 2 0.9253515 0.1550473 -0.5003736 0.61681204 0.6826512  1.253959
    #3 groupGroup 3 1.3692735 0.1557241  2.0181864 0.04357185 1.0095739  1.859403