Search code examples
rtidyverselinear-regressionstep

Using step function in R part of the tidyverse workflow, in particular, using the nest and map functions


I am trying to fit nested models over which I would like to apply the step function and get a subset of potential predictors for each nested element.

I am currently running into the following error specifically when running the step function:

Error: Problem with mutate() input step_null_to_full. x cannot coerce class ‘"lm"’ to a data.frame i Input step_null_to_full is map2(...). i The error occurred in group 1: ID = 1.

Here is a generated code example:

library(tidyverse)

data <- data.frame(ID = rep(1:10,30),
                   Year = rep(1981:2010,10),
                   x1 = rnorm(300,0,1),
                   x2 = rnorm(300,0,1),
                   x3 = rnorm(300,0,1),
                   x4 = rnorm(300,0,1),
                   x5 = rnorm(300,0,1),
                   x6 = rnorm(300,0,1),
                   Y =  rnorm(300,0,1))


model_pipe <- data %>% 
  group_by(ID) %>% 
  nest() %>% 
  mutate(mod_intercept = map(data,~ lm(Y ~ 1, data=.)),
         mod_full = map(data,~ lm(Y ~ ., data=.)),
         mod_full_int = map(data,~ lm(Y ~ .*., data=.)),
         step_null_to_full = map2(mod_intercept,mod_full, 
                                  ~ step(.x, scope = formula(.y), direction = 'forward',trace = 0)))


Solution

  • To make this work you need to pass data in the step function with the same name as you used to fit model. While fitting the model we used data = . so rename data to . and apply the function.

    library(tidyverse)
    
    data %>% 
      group_by(ID) %>% 
      nest() %>% 
      mutate(mod_intercept = map(data,~ lm(Y ~ 1, data=.)),
             mod_full = map(data,~ lm(Y ~ ., data=.)),
             mod_full_int = map(data,~ lm(Y ~ .*., data=.)),
             step_null_to_full = pmap(list(mod_intercept,mod_full, data), 
                                      function(x, y, z) {
                   `.` <- z
                   step(x, scope = formula(y), direction = 'forward',trace = 0)
              })) -> result
    result
    
    #     ID data              mod_intercept mod_full mod_full_int step_null_to_full
    #   <int> <list>            <list>        <list>   <list>       <list>           
    # 1     1 <tibble [30 × 8]> <lm>          <lm>     <lm>         <lm>             
    # 2     2 <tibble [30 × 8]> <lm>          <lm>     <lm>         <lm>             
    # 3     3 <tibble [30 × 8]> <lm>          <lm>     <lm>         <lm>             
    # 4     4 <tibble [30 × 8]> <lm>          <lm>     <lm>         <lm>             
    # 5     5 <tibble [30 × 8]> <lm>          <lm>     <lm>         <lm>             
    # 6     6 <tibble [30 × 8]> <lm>          <lm>     <lm>         <lm>             
    # 7     7 <tibble [30 × 8]> <lm>          <lm>     <lm>         <lm>             
    # 8     8 <tibble [30 × 8]> <lm>          <lm>     <lm>         <lm>             
    # 9     9 <tibble [30 × 8]> <lm>          <lm>     <lm>         <lm>             
    #10    10 <tibble [30 × 8]> <lm>          <lm>     <lm>         <lm>