Tags: r, dataframe, dplyr, tidyverse, purrr

Get summary of the model using purrr::map within dplyr piping


Using mtcars data, I am testing map() to build some lm() models:

library(tidyverse)

 mtcars %>%
  group_by(cyl) %>%
  nest()%>%
  mutate(fit = map(.x=data,~lm(mpg ~ ., data = .x)))

#> # A tibble: 3 x 3
#>     cyl data               fit     
#>   <dbl> <list>             <list>  
#> 1     6 <tibble [7 x 10]>  <S3: lm>
#> 2     4 <tibble [11 x 10]> <S3: lm>
#> 3     8 <tibble [14 x 10]> <S3: lm>

The output shows that I have a new column, fit.

Now I want to see the summary of each lm model.

When I try:

library(tidyverse)

 mtcars %>%
  group_by(cyl) %>%
  nest()%>%
  mutate(fit = map(.x=data,~lm(mpg ~ ., data = .x))) %>%
  map(fit,summary)

#> Error in as_mapper(.f, ...): object 'fit' not found

The error says that fit cannot be found, even though the column was created in the previous step.

If I want to calculate R2 or AIC, I can do so with the following code without any problem:

library(tidyverse)
library(modelr)

mtcars %>%
  group_by(cyl) %>%
  nest()%>%
  mutate(fit = map(.x=data,~lm(mpg ~ ., data = .x))) %>%
   mutate(r2 = map_dbl(fit, ~rsquare(., data = mtcars)),
         aic = map_dbl(fit, ~AIC(.))) %>% 
  arrange(aic)

#> # A tibble: 3 x 5
#>     cyl data               fit           r2    aic
#>   <dbl> <list>             <list>     <dbl>  <dbl>
#> 1     6 <tibble [7 x 10]>  <S3: lm>  -8.96  -Inf  
#> 2     4 <tibble [11 x 10]> <S3: lm> -26.4     56.4
#> 3     8 <tibble [14 x 10]> <S3: lm>  -1.000   67.3

Created on 2019-06-18 by the reprex package (v0.3.0)

What am I missing?


Solution

  • As IceCreamToucan's comment pointed out, purrr::map does not look inside the data frame that the pipe hands to it, so it cannot find the fit column on its own.

    If you call it inside dplyr::mutate, it has access to the fit column you created in the previous step of the pipe.

    Another option is to refer to fit explicitly, which is the second suggestion below.

    library(tidyverse)
    
    mtcars %>%
      group_by(cyl) %>%
      nest()%>%
      mutate(fit = map(.x=data,~lm(mpg ~ ., data = .x))) %>% 
      mutate(fit_sum = map(fit,summary)) 
    #> # A tibble: 3 x 4
    #>     cyl data               fit    fit_sum   
    #>   <dbl> <list>             <list> <list>    
    #> 1     6 <tibble [7 x 10]>  <lm>   <smmry.lm>
    #> 2     4 <tibble [11 x 10]> <lm>   <smmry.lm>
    #> 3     8 <tibble [14 x 10]> <lm>   <smmry.lm>
    
    mtcars %>%
      group_by(cyl) %>%
      nest()%>%
      mutate(fit = map(.x=data,~lm(mpg ~ ., data = .x))) %>%
      {map(.$fit, summary)} #or using pull: `pull(fit) %>% map(summary)`
    
    #> [[1]]
    #> 
    #> Call:
    #> lm(formula = mpg ~ ., data = .x)
    #> 
    #> Residuals:
    #> ALL 7 residuals are 0: no residual degrees of freedom!
    #> 
    #> Coefficients: (3 not defined because of singularities)
    #>             Estimate Std. Error t value Pr(>|t|)
    #> (Intercept) 32.78649         NA      NA       NA
    #> disp         0.07456         NA      NA       NA
    #> hp          -0.04252         NA      NA       NA
    #> drat         1.52367         NA      NA       NA
    #> wt           5.12418         NA      NA       NA
    #> qsec        -2.33333         NA      NA       NA
    #> vs          -1.75289         NA      NA       NA
    #> am                NA         NA      NA       NA
    #> gear              NA         NA      NA       NA
    #> carb              NA         NA      NA       NA
    #> 
    #> Residual standard error: NaN on 0 degrees of freedom
    #> Multiple R-squared:      1,  Adjusted R-squared:    NaN 
    #> F-statistic:   NaN on 6 and 0 DF,  p-value: NA
    
    ####truncated the results for the sake of space####
    

    Created on 2019-06-17 by the reprex package (v0.3.0)
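
To make the explanation above concrete, here is a minimal sketch of why the piped call fails and how the explicit reference avoids it. It assumes the same mtcars setup as the question; the names `models`, `piped_error`, `summaries`, and `r2` are introduced here for illustration only and are not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Fit one model per cylinder group, as in the question.
models <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(fit = map(data, ~ lm(mpg ~ ., data = .x)))

# `models %>% map(fit, summary)` is evaluated as `map(models, fit, summary)`:
# the whole tibble becomes `.x`, and `fit` is looked up as the function `.f`.
# No object called `fit` exists outside the tibble, so the call errors.
piped_error <- tryCatch(
  models %>% map(fit, summary),
  error = function(e) conditionMessage(e)
)

# Handing map() the list-column itself gives it the input it expects.
summaries <- map(models$fit, summary)

# Single values can then be pulled out of each summary, here R-squared.
r2 <- map_dbl(summaries, "r.squared")
```

Storing the summaries in a column with mutate, as in the first suggestion, keeps them aligned with cyl, while the explicit `models$fit` form returns a plain list that is easier to print in full.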