Search code examples
rtidyversepurrr

fit an `lm` model for every level of a factor


I am trying to write a function that iterates (or uses purrr::map()) through every level of a factor, and fits an lm() model for the subset of the data where the factor is equal to that level.

To make a simple reproducable example with mtcars, just say that I'd like a different lm model for each value of mtcars$gear. I'll start by making it a factor, because my real problem involves iteration through a factor:

library(tidyverse)

mtcars <- mtcars %>% 
  mutate(factor_gear = factor(gear))

I'd like the function to fit every level of factor_gear. The levels are given by:

levels(mtcars$factor_gear)

i.e.

  [1] "3" "4" "5"

So the output I would be looking for would be:

fit1 <- lm(mpg ~ cyl, data = mtcars %>% filter(factor_gear=="3"))
fit2 <- lm(mpg ~ cyl, data = mtcars %>% filter(factor_gear=="4"))
fit3 <- lm(mpg ~ cyl, data = mtcars %>% filter(factor_gear=="5"))

fits <- list(fit1, fit2, fit3)

I've made a start on the function, but wasn't able to get it to work.

I thought that a function should:

  1. get every level of of the factor into a vector
  2. run an lm model for each level.
fit_each_level <- function(factor_variable) {

  # trying to: 1. get every level of of the factor into a vector
  factor_levels <- levels(df_cars$factor_variable)

  # trying to: 2. run an lm model for each level.
  for i in factor_levels {
    fit <- mtcars %>% filter(factor_variable==i [# every value of segment_levels]) %>% 
    lm(mpg ~ cyl, data = . )
  }

}

fit_each_level(factor_gear)

If the function worked well, I'd ultimately be able to do do it on another factor, eg:

mtcars <- mtcars %>% 
  mutate(factor_carb = factor(carb))

fit_each_level(factor_carb)

Solution

  • For dplyr 1.1.0 and above the newer syntax would be -

    library(dplyr)
    
    mtcars %>%
      summarise(model = list(lm(mpg ~ cyl, data = pick(everything()))), 
               .by = factor_gear)
    

    You can nest the dataframe and use map to apply lm for each factor_gear.

    library(dplyr)
    
    mtcars %>%
      group_by(factor_gear) %>%
      tidyr::nest() %>%
      mutate(model = map(data, ~lm(mpg ~ cyl, data = .x)))
    
    #  factor_gear data               model 
    #  <fct>       <list>             <list>
    #1 4           <tibble [12 × 11]> <lm>  
    #2 3           <tibble [15 × 11]> <lm>  
    #3 5           <tibble [5 × 11]>  <lm>  
    

    In the new dplyr you can use cur_data to refer to current data in group which avoids the need of nest and map.

    mtcars %>%
      group_by(factor_gear) %>%
      summarise(model = list(lm(mpg ~ cyl, data = cur_data())))