Tags: r, latex, stargazer

Add line with datasets at top of table (stargazer R package)


With the code below I manage to produce the first table below. But...

library(stargazer)

swiss2 <- swiss[1:20,]

m1 <- lm(Fertility ~ Agriculture, data = swiss)
m2 <- lm(Fertility ~ Examination, data = swiss)
m3 <- lm(Infant.Mortality ~ Education, data = swiss)
m4 <- lm(Infant.Mortality ~ Catholic, data = swiss)
m5 <- lm(Fertility ~ Agriculture, data = swiss2)
m6 <- lm(Fertility ~ Examination, data = swiss2)
m7 <- lm(Infant.Mortality ~ Education, data = swiss2)
m8 <- lm(Infant.Mortality ~ Catholic, data = swiss2)

stargazer(m1, m2, m3, m4, m5, m6, m7, m8,
          type = "latex",
          out="./table.tex",
          omit.stat=c("LL","ser","f","adj.rsq"), 
          font.size="tiny", 
          column.labels = c("(M1)", "(M2)", "(M3)", "(M4)", "(M5)", "(M6)", "(M7)", "(M8)"), 
          model.names = FALSE,
          model.numbers = FALSE,
          star.cutoffs = c(0.05, 0.01, 0.001),
          dep.var.labels = c("Outcome 1", "Outcome 2", "Outcome 1", "Outcome 2"))

[Image: table produced by the code above, with a single "Dependent variable:" header row]

...I would rather produce the table below. The only difference is that the "Dependent variable:" row is replaced by a row with two columns indicating the relevant datasets, Swiss and Swiss2. I can do this manually in LaTeX, but I want a direct hack in R so that my study is fully reproducible from the R Markdown file. Any ideas? Thanks!

[Image: desired table, with a row of dataset headers (Swiss and Swiss2) in place of the "Dependent variable:" row]


Solution

  • In the stargazer() function, the row you refer to is governed by the dep.var.caption option. Unfortunately, since you want more than one column in this row, you can't accomplish what you want without some tinkering; if you pass a vector of length > 1 to this option, stargazer() will throw an error. So, we'll have to make a custom function that captures the output from stargazer() and modifies it accordingly before printing it.

    The following .Rmd file worked fine for me (output below the code):

    ---
    title: "Stack Overflow Answer"
    author: "duckmayr"
    date: "November 3, 2017"
    output: pdf_document
    ---
    
    ```{r setup, include=FALSE}
    knitr::opts_chunk$set(echo = TRUE)
    ```
    
    ```{r, echo=FALSE}
    custom_table <- function(dataset_labels, ...) {
        # Capture stargazer's printed LaTeX as a character vector, one element per line
        tbl <- capture.output(stargazer::stargazer(...))
        pattern1 <- 'Dependent variable:'
        # Matches everything between the leading ' & ' and the trailing ' \\'
        pattern2 <- '(?<= \\& ).+(?= \\\\)'
        first_row_index <- which(grepl(pattern=pattern1, x=tbl))
        first_row <- tbl[first_row_index]
        # Split the total column span evenly across the dataset labels
        colspan <- as.numeric(gsub(pattern='[^0-9]+', replacement='', first_row))
        colspan <- colspan / length(dataset_labels)
        new_first_row <- sub('[0-9]+', colspan, first_row)
        # Build one \multicolumn cell per dataset label
        replacement <- rep(stringr::str_extract(new_first_row, pattern2), length(dataset_labels))
        replacement <- stringr::str_replace(replacement, pattern1, dataset_labels)
        replacement <- paste(replacement, collapse=' & ')
        new_first_row <- stringr::str_replace(new_first_row, pattern2, replacement)
        # The backslashes before multicolumn and textit are lost in the step above; restore them
        new_first_row <- stringr::str_replace_all(new_first_row, 'multi', '\\\\multi')
        new_first_row <- stringr::str_replace_all(new_first_row, 'textit', '\\\\textit')
        tbl[first_row_index] <- new_first_row
        cat(tbl, sep='\n')
    }
    swiss2 <- swiss[1:20,]
    m1 <- lm(Fertility ~ Agriculture, data = swiss)
    m2 <- lm(Fertility ~ Examination, data = swiss)
    m3 <- lm(Infant.Mortality ~ Education, data = swiss)
    m4 <- lm(Infant.Mortality ~ Catholic, data = swiss)
    m5 <- lm(Fertility ~ Agriculture, data = swiss2)
    m6 <- lm(Fertility ~ Examination, data = swiss2)
    m7 <- lm(Infant.Mortality ~ Education, data = swiss2)
    m8 <- lm(Infant.Mortality ~ Catholic, data = swiss2)
    ```
    
    ```{r, echo=FALSE, results='asis'}
    custom_table(c('Data: Swiss', 'Data: Swiss2'),
                 m1, m2, m3, m4, m5, m6, m7, m8,
                 type = "latex",
                 header=FALSE,
                 omit.stat=c("LL","ser","f","adj.rsq"), 
                 font.size="tiny", 
                 column.labels = c("(M1)", "(M2)", "(M3)", "(M4)", "(M5)", "(M6)", "(M7)", "(M8)"), 
                 model.names = FALSE,
                 model.numbers = FALSE,
                 star.cutoffs = c(0.05, 0.01, 0.001),
                 dep.var.labels = c("Outcome 1", "Outcome 2", "Outcome 1", "Outcome 2"))
    ```
    

    [Image: resulting table, with "Data: Swiss" and "Data: Swiss2" each spanning four columns in place of "Dependent variable:"]
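
    To make the row rewrite in custom_table() easier to follow, here is a minimal base-R sketch of the same transformation applied to a single line. The input string is an assumption about what stargazer() emits for eight models (it is not captured from a real run), and the sprintf()-based construction is a substitute for the stringr calls above, not the method used in custom_table().

    ```r
    # Assumed shape of the header row stargazer emits for eight models
    # (illustrative input, not captured from a real run).
    first_row <- " & \\multicolumn{8}{c}{\\textit{Dependent variable:}} \\\\ "
    dataset_labels <- c("Data: Swiss", "Data: Swiss2")

    # Pull the column span out of \multicolumn{N} and split it evenly across labels.
    total_span <- as.numeric(sub(".*multicolumn\\{([0-9]+)\\}.*", "\\1", first_row))
    span_each <- total_span / length(dataset_labels)

    # Build one \multicolumn cell per dataset label and join the cells with " & ".
    cells <- sprintf("\\multicolumn{%d}{c}{\\textit{%s}}",
                     as.integer(span_each), dataset_labels)
    new_first_row <- paste0(" & ", paste(cells, collapse = " & "), " \\\\ ")

    cat(new_first_row, "\n")
    ```

    Building the cells with sprintf() sidesteps the backslash-restoring step, because no regex replacement string is involved.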

    EDIT:

    If you'd prefer the output to go to a tex file rather than (or in addition to) directly using the output in the .Rmd file, we can make the following tweak:

    custom_table <- function(dataset_labels, ..., cat_output=TRUE, out_file=NULL) {
        # Capture stargazer's printed LaTeX as a character vector, one element per line
        tbl <- capture.output(stargazer::stargazer(...))
        pattern1 <- 'Dependent variable:'
        # Matches everything between the leading ' & ' and the trailing ' \\'
        pattern2 <- '(?<= \\& ).+(?= \\\\)'
        first_row_index <- which(grepl(pattern=pattern1, x=tbl))
        first_row <- tbl[first_row_index]
        # Split the total column span evenly across the dataset labels
        colspan <- as.numeric(gsub(pattern='[^0-9]+', replacement='', first_row))
        colspan <- colspan / length(dataset_labels)
        new_first_row <- sub('[0-9]+', colspan, first_row)
        # Build one \multicolumn cell per dataset label
        replacement <- rep(stringr::str_extract(new_first_row, pattern2), length(dataset_labels))
        replacement <- stringr::str_replace(replacement, pattern1, dataset_labels)
        replacement <- paste(replacement, collapse=' & ')
        new_first_row <- stringr::str_replace(new_first_row, pattern2, replacement)
        # The backslashes before multicolumn and textit are lost in the step above; restore them
        new_first_row <- stringr::str_replace_all(new_first_row, 'multi', '\\\\multi')
        new_first_row <- stringr::str_replace_all(new_first_row, 'textit', '\\\\textit')
        tbl[first_row_index] <- new_first_row
        # Print the modified table (for results='asis' chunks)
        if ( cat_output ) {
            cat(tbl, sep='\n')
        }
        # Optionally write the modified table to a file
        if ( !is.null(out_file) ) {
            cat(tbl, sep='\n', file=out_file)
        }
    }
    

    Then, if you run the code below in an R script or put it in a chunk in an Rmd file, the output is written to the file 'test_out.tex' and also printed directly:

    custom_table(c('Data: Swiss', 'Data: Swiss2'),
                 out_file='test_out.tex',
                 m1, m2, m3, m4, m5, m6, m7, m8,
                 type = "latex",
                 header=FALSE,
                 omit.stat=c("LL","ser","f","adj.rsq"), 
                 font.size="tiny", 
                 column.labels = c("(M1)", "(M2)", "(M3)", "(M4)", "(M5)", "(M6)", "(M7)", "(M8)"), 
                 model.names = FALSE,
                 model.numbers = FALSE,
                 star.cutoffs = c(0.05, 0.01, 0.001),
                 dep.var.labels = c("Outcome 1", "Outcome 2", "Outcome 1", "Outcome 2"))
    

    Using the out option of the stargazer() function won't work here, because stargazer() writes the output to out before we've had a chance to modify it. This tweak avoids that by writing the file only after the row has been replaced.
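
    One way to check that a written file holds the modified row rather than the original caption is to read it back. The sketch below is only an illustration: the three lines in tbl are a hypothetical stand-in for the modified table, and writeLines() is used in place of the cat(..., file=out_file) call in custom_table().

    ```r
    # Hypothetical stand-in for the modified table lines; in practice this would
    # be `tbl` inside custom_table() after the header row has been replaced.
    tbl <- c("\\begin{tabular}{lcc}",
             " & \\multicolumn{1}{c}{\\textit{Data: Swiss}} & \\multicolumn{1}{c}{\\textit{Data: Swiss2}} \\\\ ",
             "\\end{tabular}")

    # Write the lines to a temporary .tex file, one element per line.
    out_file <- tempfile(fileext = ".tex")
    writeLines(tbl, out_file)

    # Read the file back and check that the dataset labels are present and the
    # original caption is absent.
    written <- readLines(out_file)
    has_labels <- any(grepl("Data: Swiss2", written, fixed = TRUE))
    has_old_caption <- any(grepl("Dependent variable:", written, fixed = TRUE))
    print(c(has_labels = has_labels, has_old_caption = has_old_caption))
    ```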