Tags: r, dplyr, evaluation

Passing an argument to a regression model inside an R function that uses dplyr


I wrote a function that runs a univariable regression on a filtered data set. It takes two arguments: a value used for filtering and the name of the predictor for the regression model. As you can see, I am struggling with data masking and evaluation: my workaround copies the predictor into a new column x and regresses on that. How can I use the .pred argument directly in the regression model? Thanks!

pacman::p_load(tidyverse, purrr, broom)
data("mtcars")

# my function
regr_func <- function(.cyl, .pred){
  
  mtcars %>% 
    filter(cyl == .cyl) %>%  # cars with .cyl cylinders
    mutate(x = .data[[.pred]]) %>%  # this is a bit of a hack :(
    lm(mpg ~ x, data = .) %>% 
    tidy() %>% 
    mutate(predictor = .pred,
           cylinders = .cyl)
}

regr_func(4, "hp")
#> # A tibble: 2 × 7
#>   term        estimate std.error statistic   p.value predictor cylinders
#>   <chr>          <dbl>     <dbl>     <dbl>     <dbl> <chr>         <dbl>
#> 1 (Intercept)   36.0      5.20        6.92 0.0000693 hp                4
#> 2 x             -0.113    0.0612     -1.84 0.0984    hp                4
Created on 2021-10-26 by the reprex package (v2.0.1)
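To see why a workaround is needed at all, here is a minimal base-R sketch (row indexing stands in for filter(), and the names d and res are hypothetical). Inside a formula, .pred is just a variable name: lm() looks up the object called .pred (the one-element string "hp") instead of the column that string names, so building the model fails.

```r
# Cars with 4 cylinders, filtered with base R instead of dplyr::filter()
d <- mtcars[mtcars$cyl == 4, ]
.pred <- "hp"

# The formula refers to a variable literally named `.pred`. It is not
# replaced by the column name stored in it, so lm() finds a length-1
# character vector next to 11 rows of mpg and stops with an error.
res <- tryCatch(
  lm(mpg ~ .pred, data = d),
  error = function(e) conditionMessage(e)
)
print(res)
```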

Update

Thanks to Jon's tip, I rewrote the function to pass the .pred argument directly to lm(). However, I can no longer pipe the data into lm(), so I have to create a temporary data set inside the function.

regr_func1 <- function(.cyl, .pred){
  
  tmp <- mtcars %>% filter(cyl == .cyl)  # cars with .cyl cylinders
  
  xsym <- rlang::ensym(.pred)  # turns "hp" (or bare hp) into the symbol hp
  rlang::inject( lm(mpg ~ !!xsym, data = tmp) ) %>%  # !! splices it: mpg ~ hp
    tidy() %>% 
    mutate(cylinders = .cyl)
}
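The same splicing idea also works without rlang. As a base-R sketch (an alternative technique, not what Jon suggested), bquote() substitutes .() the way inject() substitutes !!. The function name regr_bquote is hypothetical, and it returns the lm fit instead of a tidy() tibble so that it needs no packages:

```r
regr_bquote <- function(.cyl, .pred) {
  # Cars with .cyl cylinders (base-R stand-in for dplyr::filter())
  tmp <- mtcars[mtcars$cyl == .cyl, ]

  # .() splices the symbol into the call before evaluation, so the call
  # that actually runs is lm(mpg ~ hp, data = tmp) when .pred is "hp".
  # eval() defaults to the calling frame, where tmp lives.
  eval(bquote(lm(mpg ~ .(as.name(.pred)), data = tmp)))
}

fit <- regr_bquote(4, "hp")
coef(fit)  # the slope is named after the predictor, not "x"
```

Because the symbol is spliced before lm() sees the call, fit$call also records the real formula rather than a placeholder.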

Solution

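  • You can create the formula on the fly with as.formula() or reformulate() without breaking the pipe. Because the formula then contains the actual column name, the coefficient row is labelled hp instead of x.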

    library(dplyr)
    library(broom)
    
    regr_func <- function(.cyl, .pred){
      
      mtcars %>% 
        filter(cyl == .cyl) %>%  
        lm(reformulate(.pred, 'mpg'), data = .) %>% 
        tidy() %>% 
        mutate(predictor = .pred,
               cylinders = .cyl)
    }
    regr_func(4, "hp")
    
    #  term        estimate std.error statistic   p.value predictor cylinders
    #  <chr>          <dbl>     <dbl>     <dbl>     <dbl> <chr>         <dbl>
    #1 (Intercept)   36.0      5.20        6.92 0.0000693 hp                4
    #2 hp            -0.113    0.0612     -1.84 0.0984    hp                4
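The answer names as.formula() as an alternative to reformulate() but only shows the latter. A minimal base-R sketch comparing the two (the object names are illustrative): both build the same formula from character strings, so the column name ends up in the model and in the coefficient names.

```r
# Two ways to build `mpg ~ hp` from strings
.pred <- "hp"
f_reform <- reformulate(.pred, response = "mpg")
f_paste  <- as.formula(paste("mpg ~", .pred))

print(f_reform)
print(f_paste)

# Either formula can be passed straight to lm(); here on 4-cylinder cars
fit <- lm(f_paste, data = mtcars[mtcars$cyl == 4, ])
names(coef(fit))
```

Inside the answer's pipe, the as.formula() version would presumably be written as lm(as.formula(paste("mpg ~", .pred)), data = .).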