Search code examples
rdplyrregressiontidymodelsbroom

R broom: show how many observations are included in the model


I have from https://cran.r-project.org/web/packages/broom/vignettes/broom_and_dplyr.html

regressions <- mtcars %>%
  nest(data = -am) %>% 
  mutate(
    fit = map(data, ~ lm(wt ~ mpg + qsec + gear, data = .x)),
    tidied = map(fit, tidy),
    glanced = map(fit, glance),
    augmented = map(fit, augment)
  )

regressions %>% 
  unnest(tidied)

now because mtcars does not have missing values all models build for the different values of am have the same number of observations. However, if mtcars had missing values for the various variables, each model would had different number of observations (observations deleted due to missingness).

Is it possible to include in the final tibble the number of observations that were used in each model? Neither tidy nor glance nor augment provide this feature which I find really important when doing model fitting.

As Kieran suggested glance provides nobs, however how can I include nobs in the tidy output


Solution

  • It's already there, isn't it? glance includes the number of observations as the nobs column.

    Edit: To get just the nobs column from the glanced list column, use hoist() and name the new column.

    library(tidyverse)
    library(broom)
    mtcars_na <- mtcars
    mtcars_na[1:2,1] <- NA
    head(mtcars_na)
    #>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
    #> Mazda RX4           NA   6  160 110 3.90 2.620 16.46  0  1    4    4
    #> Mazda RX4 Wag       NA   6  160 110 3.90 2.875 17.02  0  1    4    4
    #> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
    #> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
    #> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
    #> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
    
    regressions <- mtcars %>%
      nest(data = -am) %>% 
      mutate(
        fit = map(data, ~ lm(wt ~ mpg + qsec + gear, data = .x)),
        tidied = map(fit, tidy),
        glanced = map(fit, glance),
        augmented = map(fit, augment)
      )
    
    regressions %>% 
      unnest(tidied) %>% 
      hoist(glanced, nobs = "nobs")
    #> # A tibble: 8 × 11
    #>      am data     fit   term  estimate std.error statistic p.value  nobs glanced 
    #>   <dbl> <list>   <lis> <chr>    <dbl>     <dbl>     <dbl>   <dbl> <int> <list>  
    #> 1     1 <tibble> <lm>  (Int…   4.28      3.46      1.24   2.47e-1    13 <tibble>
    #> 2     1 <tibble> <lm>  mpg    -0.101     0.0294   -3.43   7.50e-3    13 <tibble>
    #> 3     1 <tibble> <lm>  qsec    0.0398    0.151     0.264  7.98e-1    13 <tibble>
    #> 4     1 <tibble> <lm>  gear   -0.0229    0.349    -0.0656 9.49e-1    13 <tibble>
    #> 5     0 <tibble> <lm>  (Int…   4.92      1.40      3.52   3.09e-3    19 <tibble>
    #> 6     0 <tibble> <lm>  mpg    -0.192     0.0443   -4.33   5.91e-4    19 <tibble>
    #> 7     0 <tibble> <lm>  qsec    0.0919    0.0983    0.935  3.65e-1    19 <tibble>
    #> 8     0 <tibble> <lm>  gear    0.147     0.368     0.398  6.96e-1    19 <tibble>
    #> # … with 1 more variable: augmented <list>