Tags: r, kableextra, modelsummary

modelsummary/kableExtra regression table with models of the same name


I use modelsummary() with kableExtra to generate a regression table in an Rmd file (final output formats: LaTeX and HTML).

I run regressions for several variable combinations and model specifications. The regressions are grouped in the table by variable combination via kableExtra::add_header_above().

For each variable combination I run the same models (e.g. OLS and Poisson, or others). To improve readability, I would therefore like to name the models simply by their type, e.g.

names(models) <- c("OLS", "Poisson", "OLS", "Poisson", ...)

instead of

names(models) <- c("OLS 1", "Poisson 1", "OLS 2", "Poisson 2", ...)

However, modelsummary() does not seem to allow regressions with identical names, which results in the following errors:

Error: Can't bind data because some arguments have the same name
Backtrace:
  1. modelsummary::msummary(...)
  2. modelsummary::extract(...)
 10. dplyr::mutate(., group = "gof")
 12. dplyr:::mutate_cols(.data, ...)
 13. DataMask$new(.data, caller_env())
 14. .subset2(public_bind_env, "initialize")(...)
 17. rlang::env_bind_lazy(...)
 18. rlang:::env_bind_impl(.env, exprs, "env_bind_lazy()", TRUE, binder)

and

 Error in htmlTable_add_header_above(kable_input, header, bold, italic,  : 
 The new header row you provided has a different total number of columns with the original `kable()` output.

MWE:

library(modelsummary)
library(kableExtra)

url <- 'https://vincentarelbundock.github.io/Rdatasets/csv/HistData/Guerry.csv'
dat <- read.csv(url)

models <- list()
models[['OLS']] <- lm(Crime_prop ~ Literacy, data = dat)
models[['Poisson']] <- glm(Crime_prop ~ Literacy + Clergy, family = poisson, data = dat)
models[['OLS']] <- lm(Crime_pers ~ Literacy, data = dat)
models[['Poisson']] <- glm(Crime_pers ~ Literacy + Clergy, family = poisson, data = dat)

# build table with `modelsummary` 
cm <- c( '(Intercept)' = 'Constant', 'Literacy' = 'Literacy (%)', 'Clergy' = 'Priests/capita')
cap <- 'A modelsummary table customized with kableExtra'

tab <- msummary(models, output = 'kableExtra',
                coef_map = cm, stars = TRUE,
                title = cap, gof_omit = 'IC|Log|Adj')

# customize table with `kableExtra`
tab %>%
  
  # column labels
  add_header_above(c(" " = 1, "Crimes (property)" = 2, "Crimes (person)" = 2))
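The second error follows from simple column arithmetic: the header spec must span exactly one label column plus one column per model. A minimal sketch, using stand-in models fitted on the built-in mtcars data (an assumption; the Guerry regressions behave the same way here), shows that repeated `[[<-` assignment leaves only two models, so the table has 3 columns while the header asks for 5.

```r
# Stand-in models on mtcars (assumption: replaces the Guerry regressions)
models <- list()
models[["OLS"]]     <- lm(mpg ~ wt, data = mtcars)
models[["Poisson"]] <- glm(carb ~ wt, family = poisson, data = mtcars)
models[["OLS"]]     <- lm(mpg ~ hp, data = mtcars)                      # overwrites the first OLS
models[["Poisson"]] <- glm(carb ~ hp, family = poisson, data = mtcars)  # overwrites the first Poisson

header <- c(" " = 1, "Crimes (property)" = 2, "Crimes (person)" = 2)

# One label column plus one column per model that survived in the list
n_table_cols  <- 1 + length(models)
# Total width requested by the header row
n_header_cols <- sum(header)

n_table_cols == n_header_cols  # FALSE, hence the add_header_above() error
```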

AddOn:

One workaround is to append a space " " to the duplicated model names before building the table with modelsummary:

names(models) <- c("OLS", "Poisson", "OLS ", "Poisson ", ...)

This is easy to do manually for a few model specifications and variable combinations. However, a solution that adapts dynamically to the given setup would be preferable, i.e. one that also handles cases such as the following:

names(models) <- c("OLS", "Poisson", "GLM", "Poisson", ...)

instead of

names(models) <- c("OLS 1", "Poisson 1", "GLM 2", "Poisson 2", ...)
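The trailing-space workaround can be automated for any naming pattern. A minimal sketch in base R: the helper `pad_duplicate_names()` is hypothetical (not part of modelsummary) and appends one extra space per earlier occurrence of the same name, so only genuine duplicates are changed. The character placeholders stand in for fitted models, since only the names matter here.

```r
# Hypothetical helper: make duplicated names unique by appending
# one extra trailing space for each earlier occurrence of that name
pad_duplicate_names <- function(x) {
  nm <- names(x)
  # running count within each name: 1 for the first "Poisson", 2 for the second, ...
  occurrence <- ave(seq_along(nm), nm, FUN = seq_along)
  names(x) <- paste0(nm, strrep(" ", occurrence - 1))
  x
}

# Placeholders stand in for fitted models
models <- list(OLS = "m1", Poisson = "m2", GLM = "m3", Poisson = "m4")
names(pad_duplicate_names(models))
# returns c("OLS", "Poisson", "GLM", "Poisson ")
```

Because the padding grows with each repeat, a third "Poisson" would get two spaces, and the names stay unique without visible numbering.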

UPDATE:

With the updated package version provided by @Vincent, tables with identically named models are also easy to build when the models are stored in nested lists, e.g. when they are added to sublists in a loop or via lapply(..., FUN):

models <- list()
models[["a"]][["OLS"]] <- lm(Crime_prop ~ Literacy, data = dat)
models[["a"]][["Poisson"]] <- glm(Crime_prop ~ Literacy + Clergy, family = poisson, data = dat)
models[["b"]][["OLS"]] <- lm(Crime_pers ~ Literacy, data = dat)
models[["b"]][["Poisson"]] <- glm(Crime_pers ~ Literacy + Clergy, family = poisson, data = dat)
# ...

models_unlisted <- unlist(models, recursive=FALSE)
names(models_unlisted) <- c('ols', 'poisson', 'ols', 'poisson')

cm <- c( '(Intercept)' = 'Constant', 'Literacy' = 'Literacy (%)', 'Clergy' = 'Priests/capita')

msummary(models_unlisted, output = 'kableExtra', statistic_vertical = FALSE,
         coef_map = cm, stars = TRUE, gof_omit = 'IC|Log|Adj') %>%
  add_header_above(c(" " = 1, "Crimes (property)" = 2, "Crimes (person)" = 2))
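For the nested layout, both the duplicated model names and the header spans can be derived from the list itself instead of being typed by hand. A minimal sketch, assuming stand-in models on the built-in mtcars data and group labels used as the outer list names (both hypothetical): `c()` concatenates the inner lists and keeps their names, and `lengths()` gives how many columns each group spans.

```r
# Stand-in models on mtcars (assumption); outer names become header labels
models <- list(
  "Fuel economy" = list(
    OLS     = lm(mpg ~ wt, data = mtcars),
    Poisson = glm(carb ~ wt, family = poisson, data = mtcars)
  ),
  "Horsepower" = list(
    OLS     = lm(hp ~ wt, data = mtcars),
    Poisson = glm(gear ~ wt, family = poisson, data = mtcars)
  )
)

# Flatten one level: unname() drops the outer names so they are not
# prepended ("Fuel economy.OLS"), and c() keeps the inner, duplicated names
flat <- do.call(c, unname(models))

# One label column, then one header cell per group spanning its models
header <- c(" " = 1, lengths(models))

# These would then feed the table as in the UPDATE code, e.g.
# msummary(flat, output = 'kableExtra') %>% add_header_above(header)
```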

[Figure: modelsummary regression table output]


Solution

  • Thanks for the question. The other poster is right: the code in your MWE will never work, because it runs into a fundamental feature of the R language. Assigning to the same name in a list overwrites the previous value. See:

    a <- list()
    a['blah'] <- 1
    a['blah'] <- 2   # overwrites the value previously stored under 'blah'
    a                # a single element remains: $blah is 2
    

    The easiest trick I know is the one already proposed: add a space after the names. This has one main disadvantage: it makes it harder to select columns by name when customizing them with gt or kableExtra. Aside from that, it is quite innocuous, since all table-making packages strip out the white space before displaying the table.

    After reading your question, I added a line of code to modelsummary to "pad" model names automatically. If you install from GitHub (I'll release to CRAN soon), you should be able to run this:

    library(remotes)
    install_github('vincentarelbundock/modelsummary')
    
    library(modelsummary)
    library(kableExtra)
    
    url <- 'https://vincentarelbundock.github.io/Rdatasets/csv/HistData/Guerry.csv'
    dat <- read.csv(url)
    
    models <- list()
    models[[1]] <- lm(Crime_prop ~ Literacy, data = dat)
    models[[2]] <- glm(Crime_prop ~ Literacy + Clergy, family = poisson, data = dat)
    models[[3]] <- lm(Crime_pers ~ Literacy, data = dat)
    models[[4]] <- glm(Crime_pers ~ Literacy + Clergy, family = poisson, data = dat)
    names(models) <- c('ols', 'poisson', 'ols', 'poisson')
    
    cm <- c( '(Intercept)' = 'Constant', 'Literacy' = 'Literacy (%)', 'Clergy' = 'Priests/capita')
    cap <- 'A modelsummary table customized with kableExtra'
    
    msummary(models, output = 'kableExtra',
             coef_map = cm, stars = TRUE,
             title = cap, gof_omit = 'IC|Log|Adj') %>%
           add_header_above(c(" " = 1, "Crimes (property)" = 2, "Crimes (person)" = 2))
    

    PS: please open an issue on GitHub if you have feature requests: https://github.com/vincentarelbundock/modelsummary/issues
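The column-selection drawback of the trailing-space trick can be softened by stripping the padding before matching names. A minimal sketch in base R, assuming the table's first column holds the term labels (as the `" " = 1` header cell implies), so model i sits in table column i + 1; `model_names` is a hypothetical padded name vector.

```r
# Hypothetical padded names, as produced by the trailing-space trick
model_names <- c("ols", "poisson", "ols ", "poisson ")

# trimws() removes the padding, so all "ols" models match at once
ols_models <- which(trimws(model_names) == "ols")

# Shift by one for the term-label column (assumed default layout)
ols_columns <- ols_models + 1

# These indices could then be passed to kableExtra, e.g.
# kableExtra::column_spec(tab, ols_columns, bold = TRUE)
```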