r statistics latex hmisc

How to obtain a descriptive LaTeX table of a continuous variable stratified by groups in R


dat <- data.frame(outcome = rnorm(25),
                  sex = sample(c("F", "M"), 25, replace = TRUE),
                  age_group = sample(c(1, 2, 3), 25, replace = TRUE))
> head(dat)
  outcome sex age_group
1  1.1423   F         2
2  0.0998   M         1
3 -1.6305   F         2
4 -1.6759   F         1
5  0.3825   F         2
6  0.7274   F         3

I have a dataset that has a continuous outcome variable. I would like to obtain a LaTeX table of descriptive statistics for this variable stratified by sex and age_group. I would like it to look something like this (it doesn't have to have mean (SD) but I want the layout of outcome stratified by age_group and sex):

[Image: desired table layout]

I've tried the Hmisc package:

library(Hmisc)
output <- summaryM(outcome ~ sex + age_group, data = dat, test = TRUE)
latex(output, file = "")

but the output looks very different from what I want:

[Image: Hmisc output]


Solution

  • I'm more familiar with the gt package, and I highly recommend learning how to use it.

    Here is a solution using the gt package and your example code.

    #Install the package and load the dependencies. Here I'll be using dplyr to
    #group by variables.
    install.packages("gt")
    library(gt)
    library(dplyr)
    dat <- data.frame(outcome = rnorm(25), 
                      sex = sample(c("F", "M"),  25, replace = TRUE),
                      age_group = sample(c(1, 2, 3), 25, replace = TRUE))
    
    head(dat) %>%
    #Group by desired column
        group_by(sex) %>%
    #Create a gt table with the data frame
        gt() %>% 
    #Rename columns
        cols_label(outcome = "",
                   sex = "Sex",
                   age_group = "Cohort") %>% 
    #Add a table title
    #Note that `md()` lets you write the title in Markdown syntax (which also allows HTML)
        tab_header(title = md("Table 1: Descriptive Statistics (N = 6)")) %>% 
    #Add a data source note
        tab_source_note(source_note = "Data: Stackoverflow question 7508787 [user: Adrian]") %>%
    #You can also customize the table's body and lines using the tab_options()
    #and tab_style() functions.
        tab_options(row.striping.include_table_body = FALSE) %>%
        tab_style(style = cell_borders(
          sides = c("top"),
          color = "black",
          weight = px(1),
          style = "solid"),
          locations = cells_body(
            columns = everything(),
            rows = everything()
          )) %>%
    #Finally, you can add summary rows with whichever statistics you want.
      summary_rows(
        groups = TRUE,
        columns = outcome,
        fns = list(
          average = "mean",
          total = "sum",
          SD = "sd")
      )
    
    

    The final table looks like this:

    [Image: resulting gt table]
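
The table above lists raw rows with per-group summaries, and it is rendered as HTML by default. If the goal is the layout from the question (one statistic per sex and age_group cell, emitted as LaTeX), a minimal sketch along these lines may help. It assumes the tidyr package is available in addition to dplyr and gt. The seed, the object names (`cells`, `wide`, `tbl`, `latex_code`) and the "Age " column prefix are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(gt)

set.seed(1)  # assumption: fixed seed so the example is reproducible
dat <- data.frame(outcome = rnorm(25),
                  sex = sample(c("F", "M"), 25, replace = TRUE),
                  age_group = sample(c(1, 2, 3), 25, replace = TRUE))

# One "mean (SD)" string per sex x age_group combination.
# Note: sd() returns NA for a cell that contains a single observation.
cells <- dat %>%
  group_by(sex, age_group) %>%
  summarise(stat = sprintf("%.2f (%.2f)", mean(outcome), sd(outcome)),
            .groups = "drop")

# Reshape to one row per sex and one column per age group.
# Combinations with no observations are left as NA.
wide <- cells %>%
  pivot_wider(names_from = age_group, values_from = stat,
              names_prefix = "Age ", names_sort = TRUE)

# Build the gt table: sex in the stub, age groups under a common spanner.
tbl <- wide %>%
  gt(rowname_col = "sex") %>%
  tab_stubhead(label = "Sex") %>%
  tab_spanner(label = "Outcome, mean (SD)",
              columns = starts_with("Age "))

# Convert the gt table to LaTeX source and print it to the console.
latex_code <- as_latex(tbl)
cat(as.character(latex_code))
```

The printed code can be pasted into a LaTeX document. As far as I know, it relies on LaTeX packages such as booktabs and longtable being loaded in the preamble. Inside an R Markdown document with PDF output, printing `tbl` directly should produce the LaTeX version without calling `as_latex()` yourself.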