
Create a table of summary statistics (with p.value) with sub-levels (long list)


I need to run an inferential analysis on a list of 21 countries, comparing results (a numeric variable) between genders. I have already reshaped the data into long format with the variables Gender, Country, and Results (numeric). I am using gtsummary::tbl_strata and gtsummary::tbl_summary, but I could not set up a nesting that runs the comparison for each country separately. In addition, the output returns n (%) counts for the countries (the table comes out in wide format), and the Results variable is summarized overall rather than per country. The tabular structure I want is shown below.

[Image: desired table structure]

I could generate the individual tables one by one and stack them, but I would prefer a more systematic approach.

Code

library(tidyverse)
library(gtsummary)

# dataframe
df <- 
  data.frame(
    Country = c("Country 1", "Country 2", "Country 3", 
               "Country 1", "Country 2", "Country 3",
               "Country 1", "Country 2", "Country 3",
               "Country 1", "Country 2", "Country 3"),
    Gender = c("M", "M", "M",
               "W", "W", "W",
               "M", "M", "M",
               "W", "W", "W"), 
    Results = c(53, 67, 48,
                56, 58, 72,
                78, 63, 67,
                54, 49, 62))
df

# Table
Table <- df %>%
  select(c('Gender',
           'Country',
           'Results')) %>%
  tbl_strata(
    strata = Country,
    .tbl_fun =
      ~ .x %>%
        tbl_summary(by = Gender,
                    missing = "no") %>%
        bold_labels() %>%
        italicize_levels() %>%
        italicize_labels())
Table

Solution

  • Here's how you can get that table:

    # Install the development version of gtsummary from GitHub
    remotes::install_github("ddsjoberg/gtsummary")
    library(gtsummary)
    packageVersion("gtsummary")
    #> [1] '1.3.7.9004'
    library(tidyverse)
    
    df <- 
      data.frame(
        Country = c("Country 1", "Country 2", "Country 3", 
                    "Country 1", "Country 2", "Country 3",
                    "Country 1", "Country 2", "Country 3",
                    "Country 1", "Country 2", "Country 3"),
        Gender = c("M", "M", "M",
                   "W", "W", "W",
                   "M", "M", "M",
                   "W", "W", "W"), 
        Results = c(53, 67, 48,
                    56, 58, 72, 
                    78, 63, 67,
                    54,49,62))
    
    
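    # Use mean/SD summaries and t-tests as the defaults
    theme_gtsummary_mean_sd()
    tbl <-
      df %>%
      # One row per country; that country's rows go into the list-column `data`
      nest(data = -Country) %>%
      rowwise() %>%
      mutate(
        # Build one summary table per country
        tbl = 
          data %>%
          tbl_summary(
            by = Gender,
            # Numeric variables with few distinct values default to categorical
            type = Results ~ "continuous",
            statistic = Results ~ "{mean} ± {sd}",
            # Label the single row with the country name
            label = list(Results = Country)
          ) %>%
          add_p() %>%
          modify_header(list(
            label ~ "**Country**",
            all_stat_cols() ~ "**{level}**"
          )) %>%
          # Wrap in a list so the table can be stored in a list-column
          list()
      ) %>%
      pull(tbl) %>%
      # Stack the per-country tables into one table
      tbl_stack() %>%
      modify_spanning_header(all_stat_cols() ~ "**Gender**")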
    

    Created on 2021-03-05 by the reprex package (v1.0.0)
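
As a rough cross-check on the stacked table, the same per-country comparison can be sketched in base R. This assumes that add_p() runs a Welch two-sample t-test under theme_gtsummary_mean_sd(). The names by_country and check are illustrative and not part of the original answer.

```r
# Same toy data as in the question
df <- data.frame(
  Country = rep(c("Country 1", "Country 2", "Country 3"), times = 4),
  Gender  = rep(c("M", "W", "M", "W"), each = 3),
  Results = c(53, 67, 48, 56, 58, 72, 78, 63, 67, 54, 49, 62)
)

# Split into one data frame per country, then summarise each one
by_country <- split(df, df$Country)

check <- do.call(rbind, lapply(names(by_country), function(country) {
  d <- by_country[[country]]
  men   <- d$Results[d$Gender == "M"]
  women <- d$Results[d$Gender == "W"]
  data.frame(
    Country = country,
    mean_M  = mean(men),
    sd_M    = sd(men),
    mean_W  = mean(women),
    sd_W    = sd(women),
    # Assumption: Welch two-sample t-test (the t.test() default, var.equal = FALSE)
    p_value = t.test(Results ~ Gender, data = d)$p.value
  )
}))

print(check)
```

Each row of check should correspond to one row of the gtsummary table, so with the full list of 21 countries a mismatch would point to a different test or summary statistic being used.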