Tags: r, tidyverse, purrr

Fitting multiple linear models to all observations in a dataframe using purrr::map


I'm trying to learn more about R's purrr package by working through these exercises.

The task (exercise 8) is the following:

  1. Split the mtcars dataset by cyl.
  2. Fit a linear model (qsec ~ hp) for each group of observations by cyl.
  3. Fit each model to all of mtcars.

My current code is:

library(tidyverse)
mtcars %>% 
    split(mtcars$cyl) %>% 
    map(~ lm(.x$qsec ~ .x$hp)) %>% 
    map(~ predict(.x, newdata = list(mtcars)))

However, this only applies each model to its own group of mtcars, so the output is:

$`4`
       1        2        3        4        5        6        7        8 
18.98872 19.43308 18.96005 19.37574 19.57642 19.39008 18.93138 19.37574 
       9       10       11 
19.01739 18.70203 18.75937 

$`6`
       1        2        3        4        5        6        7 
18.51998 18.51998 18.51998 18.74090 17.94558 17.94558 15.64799 

$`8`
       1        2        3        4        5        6        7        8 
17.37861 16.13783 17.28998 17.28998 17.28998 16.84684 16.66959 16.40371 
       9       10       11       12       13       14 
17.82174 17.82174 16.13783 17.37861 15.80104 14.54254 

The desired output, as I understand it, would be a list of three elements, each containing 32 predicted values (one per row of mtcars). How should I revise the code? Thank you.


Solution

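  • The issue is that you passed the vectors (.x$qsec and .x$hp) directly into the formula inside lm(). predict() then looks for .x rather than hp in newdata, does not find it there, and falls back to the group's own data, so each model only returns its fitted values. Instead, pass the group's data frame to the data= argument, and give predict() the data frame itself as newdata (mtcars, not list(mtcars)):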

    library(tidyverse)
    
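    pred <- mtcars %>%
      split(mtcars$cyl) %>%                  # one data frame per cylinder count
      map(~ lm(qsec ~ hp, data = .x)) %>%    # fit qsec ~ hp within each group
      map(~ predict(.x, newdata = mtcars))   # predict for all 32 rows of mtcars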
    
    lapply(pred, length)
    #> $`4`
    #> [1] 32
    #> 
    #> $`6`
    #> [1] 32
    #> 
    #> $`8`
    #> [1] 32
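A variant of the same pipeline, as a sketch that assumes purrr is installed and R 4.1 or later for the native pipe |>. It keeps the fitted models in a list (models and pred_mat are illustrative names), passes newdata through map()'s extra arguments instead of writing a lambda, and combines the three prediction vectors into one matrix:

```r
library(purrr)

# Fit one model per cylinder group and keep the models for reuse.
models <- mtcars |>
  split(mtcars$cyl) |>
  map(~ lm(qsec ~ hp, data = .x))

# Arguments after the function are forwarded to predict().
pred <- map(models, predict, newdata = mtcars)

# One integer per model; each should equal nrow(mtcars), i.e. 32.
map_int(pred, length)

# Combine into a matrix: one row per car, one column per model ("4", "6", "8").
pred_mat <- do.call(cbind, pred)
dim(pred_mat)  # 32 3

# Sanity check: on the rows it was fitted to, the 4-cylinder model's
# predictions equal its own fitted values.
all.equal(unname(pred[["4"]][mtcars$cyl == 4]), unname(fitted(models[["4"]])))
```

Forwarding newdata = mtcars through map() is equivalent to the ~ predict(.x, newdata = mtcars) lambda; which form to use is mostly a matter of taste.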