Search code examples
rpurrrgambroombetareg

Exporting summary.betareg and summary.gam from list to csv


I have two types of models (GAM and beta regression) that I am running for my data and I am using a lapply function to iterate over multiple variables. I am encountering an error when trying to export result summaries into a csv. Reproducible example below

library(mgcv)
library(betareg)

data(mtcars)
vars <- names(mtcars[c(2:11)])
vars

models = lapply(vars, function(x) {
  gam(substitute(i ~ s(mpg) , list(i = as.name(x))), 
      method = 'REML',
      data = mtcars)
})
y=lapply(models, summary)
names(y) <- vars # To add names of the model instead of list number
df=purrr::map_df(y, broom::tidy, .id = 'vars') 
write.csv(df, "results/summary.csv")

The map_df and tidy method work for standard linear models. df=purrr::map_df(y, broom::tidy, .id = 'vars') but not for summary.betareg or summary.gam list I have also tried df=as.data.frame(do.call(rbind,y)) which is closer but still not quite right.

Is there a way to extract these summaries easily?


Solution

  • The issue is that broom does not provide a tidy method for objects of class summary.gam whereas for linear models aka models of class lm we have both a tidy.lm and a tidy.summary.lm method. While the implementation of the methods differ slightly, tidy.lm calls summary and you should get the same df from both methods.

    And the same holds for tidy.gam. As with tidy.lm it calls summary under the hood. Hence, there is no need to use summary and you could achieve your desired result by calling tidy on the "raw" model object, i.e. use y <- models:

    library(mgcv)
    #> Loading required package: nlme
    #> This is mgcv 1.9-1. For overview type 'help("mgcv-package")'.
    
    
    vars <- names(mtcars[c(2:11)])
    
    models <- lapply(vars, function(x) {
      gam(substitute(i ~ s(mpg), list(i = as.name(x))),
        method = "REML",
        data = mtcars
      )
    })
    y <- models
    names(y) <- vars
    df <- purrr::map_df(y, broom::tidy, .id = "vars")
    
    df
    #> # A tibble: 10 × 6
    #>    vars  term     edf ref.df statistic    p.value
    #>    <chr> <chr>  <dbl>  <dbl>     <dbl>      <dbl>
    #>  1 cyl   s(mpg)  4.57   5.56     34.7  0         
    #>  2 disp  s(mpg)  2.54   3.17     34.4  0         
    #>  3 hp    s(mpg)  3.05   3.80     16.9  0.00000130
    #>  4 drat  s(mpg)  1.00   1.00     26.0  0.0000180 
    #>  5 wt    s(mpg)  2.32   2.91     38.1  0         
    #>  6 qsec  s(mpg)  1.00   1.00      6.37 0.0171    
    #>  7 vs    s(mpg)  1.00   1.00     23.7  0.0000345 
    #>  8 am    s(mpg)  1.00   1.00     16.9  0.000285  
    #>  9 gear  s(mpg)  1.00   1.00      8.99 0.00540   
    #> 10 carb  s(mpg)  1.00   1.00     13.1  0.00109
    

    And just as reference, for a simple linear model we get the following outputs when applying tidy to the raw model object and the output of ´summary()`:

    model <- lm(mpg ~ hp, mtcars)
    
    broom::tidy(model)
    #> # A tibble: 2 × 5
    #>   term        estimate std.error statistic  p.value
    #>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
    #> 1 (Intercept)  30.1       1.63       18.4  6.64e-18
    #> 2 hp           -0.0682    0.0101     -6.74 1.79e- 7
    
    
    broom::tidy(summary(model))
    #> # A tibble: 2 × 5
    #>   term        estimate std.error statistic  p.value
    #>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
    #> 1 (Intercept)  30.1       1.63       18.4  6.64e-18
    #> 2 hp           -0.0682    0.0101     -6.74 1.79e- 7