
Summary and plot for multiple subgroups (more columns)


I am interested in two things: 1) a summary for multiple subgroups in the same table, and 2) a dot plot for the subgroups based on the summary generated in step 1.

For example, if this is my dataset:

     data("pbc", package = "survival")

I would like to generate a summary of cholesterol (chol) by sex, stage, ascites, and spiders for the two treatment levels, 1 and 2:

     table(pbc$trt)
     1    2   
     158  154

I can do this separately like this.

     library(Hmisc)
     
     summary(chol ~ sex + stage + ascites + spiders, data = subset(pbc, trt=1))
     summary(chol ~ sex + stage + ascites + spiders, data = subset(pbc, trt=2))

This creates two separate summaries.

It also gives two separate corresponding plots:

     plot(summary(chol ~ sex + stage + ascites + spiders, data = subset(pbc, trt=1)))
     plot(summary(chol ~ sex + stage + ascites + spiders, data = subset(pbc, trt=2)))

I would like the summaries in one table with two columns: one column for trt=1 and a second column for trt=2.

                N    chol (trt=1)    chol (trt=2)
     sex   m    ..   .....           .....
           f    ..   .....           .....

I would also like the plots side by side: the first plot for trt=1, the second plot for trt=2.


Kindly suggest how to extend the Hmisc summary function (Hmisc:::summary.formula) to 1) show the summaries by subgroup side by side and 2) plot the summaries side by side. Thanks.


Solution

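  • Please note that your two summaries, and therefore your two posted plots, are identical. In subset(pbc, trt=1), trt=1 is passed as a named argument rather than a logical condition, so no rows are filtered out; a condition needs ==, as in subset(pbc, trt == 1). You can also use dplyr's filter to filter explicitly by the levels of trt.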
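
    To see this concretely: subset() only filters rows when it is given a logical condition such as trt == 1. The following is a minimal sketch, assuming pbc is the data set shipped with the survival package; the object names all_rows, trt1, and trt2 are only illustrative.

    ```r
    library(survival)  # provides the pbc data set

    # `trt = 1` is taken as a named argument, not a condition: nothing is filtered
    all_rows <- subset(survival::pbc, trt = 1)
    nrow(all_rows) == nrow(survival::pbc)  # every row is kept

    # `trt == 1` is a logical condition; rows where trt is NA are dropped too
    trt1 <- subset(survival::pbc, trt == 1)
    trt2 <- subset(survival::pbc, trt == 2)
    c(nrow(trt1), nrow(trt2))  # should match table(pbc$trt)
    ```

    Passing trt1 and trt2 to the original Hmisc summary calls should then give two different summaries.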

    First, I prefer gtsummary for tables, since tbl_continuous produces a single table directly instead of requiring you to combine two tables. Second, you will likely have difficulty combining your two plots, because they are drawn with base R plotting functions on Hmisc summary objects; even saving each plot to an object results in NULL. In the long run, it may be easier to recreate each plot with ggplot2 and combine them with cowplot::plot_grid.

    library(survival)
    library(Hmisc)
    
    # create combined summary
    library(gtsummary)
    library(tidyverse)
    
    data(pbc)
    df <- pbc %>%
      select(id, trt, chol, sex, stage, ascites, spiders) %>%
      mutate(across(c(sex, stage, ascites, spiders), as.factor)) %>%
      mutate(trt = factor(trt)) %>%
      mutate(chol = as.numeric(chol))
    
    dftrt1 <- df %>% filter(trt == 1)
    dftrt2 <- df %>% filter(trt == 2)
    
    df %>%
      select(trt, chol, sex, stage, ascites, spiders) %>%
      tbl_continuous(variable = chol,
                     digits = everything() ~ 2,
                     statistic = everything() ~ "{mean}",
                     label = list(sex ~ "Sex", 
                                  stage ~ "Stage", 
                                  ascites ~ "Ascites", 
                                  spiders ~ "Spiders"),
                     by = trt)
    

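    As a cross-check on the table code above, the same subgroup means can be computed with base R alone. This is only a sketch under stated assumptions: pbc is taken from the survival package, patients with a missing trt are dropped, and subgroup_means is a hypothetical helper name, not part of Hmisc or gtsummary.

    ```r
    library(survival)  # provides the pbc data set

    # keep only patients with a recorded treatment (trt is NA for the rest)
    pbc_rand <- subset(survival::pbc, !is.na(trt))

    # hypothetical helper: mean chol per level of `var`, one column per trt level
    subgroup_means <- function(var) {
      m <- tapply(pbc_rand$chol, list(pbc_rand[[var]], pbc_rand$trt),
                  mean, na.rm = TRUE)
      colnames(m) <- paste0("chol (trt=", colnames(m), ")")
      rownames(m) <- paste(var, rownames(m), sep = " = ")
      m
    }

    # stack the per-variable matrices into one table
    vars <- c("sex", "stage", "ascites", "spiders")
    combined <- do.call(rbind, lapply(vars, subgroup_means))
    round(combined, 2)
    ```

    The result is a plain numeric matrix with one row per subgroup level and one column per treatment, which is the layout asked for in the question.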

    # create combined plot
    library(cowplot)
    p1 <- dftrt1 %>%
      select(-trt) %>% pivot_longer(cols = -c(id, chol)) %>% group_by(name, value) %>%
      summarise(chol = mean(chol, na.rm = TRUE)) %>%
      ggplot(aes(x = value, y = chol, fill = factor(value))) + 
      geom_point() + coord_flip() +
      facet_wrap(~name, scales = "free_y", nrow = 4, strip.position = "top") +
      theme(panel.spacing = unit(0, "lines"),
            panel.border = element_rect(fill = NA),
            strip.background = element_blank(),
            axis.title.y = element_blank(),
            legend.position = "none",
            strip.placement = "outside") +
      ggtitle("trt = 1") + theme(plot.title = element_text(hjust = 0.5))
    
    p2 <- dftrt2 %>%
      select(-trt) %>% pivot_longer(cols = -c(id, chol)) %>% group_by(name, value) %>%
      summarise(chol = mean(chol, na.rm = TRUE)) %>%
      ggplot(aes(x = value, y = chol, fill = factor(value))) + 
      geom_point() + coord_flip() +
      facet_wrap(~name, scales = "free_y", nrow = 4, strip.position = "top") +
      theme(panel.spacing = unit(0, "lines"),
            panel.border = element_rect(fill = NA),
            strip.background = element_blank(),
            axis.title.y = element_blank(),
            legend.position = "none",
            strip.placement = "outside") +
      ggtitle("trt = 2") + theme(plot.title = element_text(hjust = 0.5))
    
    plot_grid(p1, p2, ncol = 2)
    

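    As an alternative to building two plots and combining them with plot_grid, both treatments can go into one ggplot with facet_grid, which avoids duplicating the plotting code. This is a sketch rather than a drop-in replacement: it assumes pbc from the survival package, drops patients with a missing trt, and uses the illustrative object names means and p.

    ```r
    library(survival)  # provides the pbc data set
    library(dplyr)
    library(tidyr)
    library(ggplot2)

    # mean chol for every subgroup level within each treatment
    means <- survival::pbc %>%
      filter(!is.na(trt)) %>%
      select(trt, chol, sex, stage, ascites, spiders) %>%
      # common type so the four grouping variables can share one column
      mutate(across(c(sex, stage, ascites, spiders), as.character)) %>%
      pivot_longer(cols = -c(trt, chol), names_to = "name", values_to = "value") %>%
      group_by(trt, name, value) %>%
      summarise(chol = mean(chol, na.rm = TRUE), .groups = "drop")

    # one row of panels per grouping variable, one column per treatment
    p <- ggplot(means, aes(x = chol, y = value)) +
      geom_point() +
      facet_grid(name ~ trt, scales = "free_y", labeller = label_both) +
      labs(x = "Mean chol", y = NULL)
    p
    ```

    Because the two treatment columns share the x axis, the means for trt = 1 and trt = 2 can be compared directly across panels.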