r, tidyr, purrr, rlang

How to pass data frame columns as an argument when curly-curly does not work? (e.g. map() within nest())


I'm very happy that it is relatively easy to pass data frame columns as function arguments with "curly curly" ({{ }}), e.g. to filter() or select(). But this does not work inside map() after nest() (see the code below, {{x}} in the lm() call), and I don't know why. How should I proceed in this case?

I have the following code, which I would like to run for several variables (not only for var1, as in this example):

library(tidyverse)
library(broom)

var1 <- runif(100)
var2 <- rnorm(100)
var3 <- rnorm(100,mean=3)
group <- c(rep(1,25),rep(2,25),rep(3,25),rep(4,25))

data_set <- data.frame(var1,var2,var3,group)

nest_fct <- function(x){
  new <- data_set %>%
    filter(is.na({{x}}) != 1) %>% 
    select({{x}}, var2, group ) %>%
    nest(data = -group) %>% 
    mutate(
      fit = map(data,   ~ lm({{x}} ~  var2, data = .)),
      tidied = map(fit, tidy)) %>% 
    unnest(tidied) %>% 
    select(-data, -fit)
}
new <- nest_fct(var1)

I get the following error (translated from German): Error: Problem with mutate() input fit. x variable lengths differ (found for 'var2') i Input fit is map(data, ~lm(var1 ~ var2, data = .)).

(I want to run a regression for each group and then save the regression coefficients.)

The code runs without errors when it is not wrapped in the function nest_fct:

new <- data_set %>%
  filter(is.na(var1) != 1) %>% 
  select(var1, var2, group ) %>%
  nest(data = -group) %>% 
  mutate(
    fit = map(data,   ~ lm(var1 ~  var2, data = .)),
    tidied = map(fit, tidy)) %>% 
  unnest(tidied) %>% 
  select(-data, -fit)

Solution

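  • Here is one method with enexpr/expr. lm() is a base R function and does not process {{ }}, so the column has to be injected into the formula explicitly: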

    nest_fct <- function(x){
      new <- data_set %>%
        filter(is.na({{x}}) != 1) %>% 
        select({{x}}, var2, group ) %>%
        nest(data = -group) %>% 
        mutate(
          # enexpr(x) captures the expression passed as x (here var1); !! injects it, giving var1 ~ var2
          fit = map(data,   ~ lm(rlang::expr(!! rlang::enexpr(x) ~ var2), data = .)),
          tidied = map(fit, tidy)) %>% 
        unnest(tidied) %>% 
        select(-data, -fit)
    }
    new <- nest_fct(var1)
    
    
    new
    # A tibble: 8 x 6
    #  group term        estimate std.error statistic       p.value
    #  <dbl> <chr>          <dbl>     <dbl>     <dbl>         <dbl>
    #1     1 (Intercept)  0.475      0.0543    8.75   0.00000000896
    #2     1 var2        -0.0250     0.0492   -0.507  0.617        
    #3     2 (Intercept)  0.468      0.0544    8.60   0.0000000122 
    #4     2 var2         0.0617     0.0495    1.25   0.225        
    #5     3 (Intercept)  0.572      0.0616    9.29   0.00000000301
    #6     3 var2         0.00304    0.0559    0.0544 0.957        
    #7     4 (Intercept)  0.476      0.0575    8.29   0.0000000234 
    #8     4 var2         0.180      0.0576    3.13   0.00473
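
Since the question also asks about running this for several variables, here is a minimal sketch of another way to do it: build the formula from the column name with base R's reformulate() and pass the finished formula to lm(). The helper names nest_fct_chr() and nest_fct2(), the extra data argument, and the set.seed() call are my own assumptions for illustration and are not part of the code above.

```r
library(tidyverse)
library(broom)

set.seed(1)  # assumption: added only so the sketch is reproducible
data_set <- data.frame(
  var1  = runif(100),
  var2  = rnorm(100),
  var3  = rnorm(100, mean = 3),
  group = rep(1:4, each = 25)
)

# Hypothetical helper: takes the response column name as a string
nest_fct_chr <- function(data, response) {
  # Build the formula once, e.g. var1 ~ var2
  model_formula <- reformulate("var2", response = response)

  data %>%
    filter(!is.na(.data[[response]])) %>%
    select(all_of(c(response, "var2", "group"))) %>%
    nest(data = -group) %>%
    mutate(
      # lm() looks up both variables in the nested data frame .x
      fit    = map(data, ~ lm(model_formula, data = .x)),
      tidied = map(fit, tidy)
    ) %>%
    unnest(tidied) %>%
    select(-data, -fit)
}

# Hypothetical wrapper: accepts a bare column name, like nest_fct(var1)
nest_fct2 <- function(data, x) {
  nest_fct_chr(data, rlang::as_name(rlang::ensym(x)))
}

res1 <- nest_fct2(data_set, var1)

# Same regression for several response columns, stacked into one tibble
responses <- c(var1 = "var1", var3 = "var3")
res_all <- responses %>%
  map(~ nest_fct_chr(data_set, .x)) %>%
  bind_rows(.id = "response")
```

Keeping a string-based helper next to the bare-name wrapper makes it easy to iterate over a vector of column names, while interactive calls can still use nest_fct2(data_set, var1).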