I am interested in two things: 1) a summary for multiple subgroups in the same table, and 2) a dot plot for the subgroups based on the summary generated in step 1.
For example, if this is my dataset:
library(survival)  # pbc ships with the survival package
data("pbc")
I would like to generate a summary of cholesterol (chol) by sex, stage, ascites, and spiders for the two treatment levels, 1 and 2:
table(pbc$trt)
1 2
158 154
I can do this separately like this.
library(Hmisc)
summary(chol ~ sex + stage + ascites + spiders, data = subset(pbc, trt=1))
summary(chol ~ sex + stage + ascites + spiders, data = subset(pbc, trt=2))
This creates two separate summaries.
The two corresponding plots are also separate:
plot(summary(chol ~ sex + stage + ascites + spiders, data = subset(pbc, trt=1)))
plot(summary(chol ~ sex + stage + ascites + spiders, data = subset(pbc, trt=2)))
I would like the summaries to be in one table with two columns: one column for trt=1 and a second column for trt=2.
| | | N | chol (trt=1) | chol (trt=2) |
|---|---|---|---|---|
| sex | m | .. | ..... . | .... .. |
| | f | .. | ..... . | .... .. |
And the plots side by side: the first plot for trt=1, the second plot for trt=2.
Kindly suggest how to extend the Hmisc:::summary.formula summary function to 1) show summaries by subgroups side by side and 2) plot the summaries side by side. Thanks.
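Please note that your current summaries and plots are identical: despite calling subset with the two levels of trt, both calls return the full dataset. Writing trt=1 passes a named argument rather than a logical condition, so no rows are dropped; the comparison needs ==, as in subset(pbc, trt == 1). You can also use filter from dplyr to filter explicitly by the levels of trt.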
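To make the difference between trt=1 and trt == 1 concrete, here is a minimal base R sketch. The data frame `toy` and its values are made up for illustration and are not taken from pbc.

```r
# Toy data (hypothetical values): two rows in arm 1, three rows in arm 2
toy <- data.frame(
  trt  = c(1, 1, 2, 2, 2),
  chol = c(200, 220, 300, 320, 340)
)

# `trt = 1` is treated as a named argument, not a condition,
# so the `subset` argument is missing and every row is kept
all_rows <- subset(toy, trt = 1)

# `trt == 1` is a logical condition and actually filters
arm_one <- subset(toy, trt == 1)

nrow(all_rows)  # same as nrow(toy)
nrow(arm_one)   # only the rows with trt equal to 1
```

The same check on your own data (comparing nrow before and after subsetting) is a quick way to confirm that a filter did what you intended.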
First, I prefer gtsummary for tables, since tbl_continuous can produce a single table instead of trying to combine two tables. Second, you will likely encounter difficulty trying to combine your two plots, since you're using base R plotting functions on Hmisc summary objects. Even saving each plot to an object will result in NULL. In the long run, it may be easier to recreate each plot using ggplot and combine them with cowplot::plot_grid.
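As a rough illustration of what "one table, one column per treatment" means, independent of gtsummary, the following base R sketch cross-tabulates the mean of chol by sex and trt with tapply. The data frame `toy` and its values are invented for the example; this is not a replacement for the formatted table, only a way to see the shape of the result.

```r
# Toy data (hypothetical values), four subjects per treatment arm
toy <- data.frame(
  trt  = c(1, 1, 1, 1, 2, 2, 2, 2),
  sex  = c("m", "f", "m", "f", "m", "f", "m", "f"),
  chol = c(200, 220, 240, 260, 300, 320, 340, 360)
)

# Rows are levels of sex, columns are levels of trt, cells are mean chol
means <- with(toy, tapply(chol, list(sex = sex, trt = trt), mean))
print(means)
```

Repeating the call for each grouping variable and stacking the results with rbind would give a plain-matrix version of the side-by-side layout.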
library(survival)
library(Hmisc)
# create combined summary
library(gtsummary)
library(tidyverse)
data(pbc)
# keep the analysis columns and convert the grouping variables to factors
df <- pbc %>%
  select(id, trt, chol, sex, stage, ascites, spiders) %>%
  mutate(across(c(sex, stage, ascites, spiders), as.factor)) %>%
  mutate(trt = factor(trt)) %>%
  mutate(chol = as.numeric(chol))
# one data frame per treatment arm
dftrt1 <- df %>% filter(trt == 1)
dftrt2 <- df %>% filter(trt == 2)
# mean chol for each subgroup level, one column per treatment
df %>%
  select(trt, chol, sex, stage, ascites, spiders) %>%
  tbl_continuous(variable = chol,
                 digits = everything() ~ 2,
                 statistic = everything() ~ "{mean}",
                 label = list(sex ~ "Sex",
                              stage ~ "Stage",
                              ascites ~ "Ascites",
                              spiders ~ "Spiders"),
                 by = trt)
# create combined plot
library(cowplot)
# dot plot of mean chol per subgroup level, trt = 1
p1 <- dftrt1 %>%
  select(-trt) %>%
  pivot_longer(cols = -c(id, chol)) %>%
  group_by(name, value) %>%
  summarise(chol = mean(chol, na.rm = TRUE)) %>%
  ggplot(aes(x = value, y = chol, fill = factor(value))) +
  geom_point() + coord_flip() +
  facet_wrap(~name, scales = "free_y", nrow = 4, strip.position = "top") +
  theme(panel.spacing = unit(0, "lines"),
        panel.border = element_rect(fill = NA),
        strip.background = element_blank(),
        axis.title.y = element_blank(),
        legend.position = "none",
        strip.placement = "outside") +
  ggtitle("trt = 1") + theme(plot.title = element_text(hjust = 0.5))
# same plot for trt = 2
p2 <- dftrt2 %>%
  select(-trt) %>%
  pivot_longer(cols = -c(id, chol)) %>%
  group_by(name, value) %>%
  summarise(chol = mean(chol, na.rm = TRUE)) %>%
  ggplot(aes(x = value, y = chol, fill = factor(value))) +
  geom_point() + coord_flip() +
  facet_wrap(~name, scales = "free_y", nrow = 4, strip.position = "top") +
  theme(panel.spacing = unit(0, "lines"),
        panel.border = element_rect(fill = NA),
        strip.background = element_blank(),
        axis.title.y = element_blank(),
        legend.position = "none",
        strip.placement = "outside") +
  ggtitle("trt = 2") + theme(plot.title = element_text(hjust = 0.5))
plot_grid(p1, p2, ncol = 2)
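Since p1 and p2 differ only in the treatment level, the repeated pipeline could be wrapped in a small function. The sketch below is one possible way to do that, not part of the original answer: the helper names `summarise_trt` and `plot_trt` are hypothetical, the `toy` data frame is made up so the example runs without pbc, and the theme settings are omitted for brevity.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy stand-in for `df` (hypothetical values)
toy <- data.frame(
  id      = 1:8,
  trt     = factor(c(1, 1, 1, 1, 2, 2, 2, 2)),
  chol    = c(200, 220, 240, 260, 300, 320, 340, 360),
  sex     = factor(c("m", "f", "m", "f", "m", "f", "m", "f")),
  spiders = factor(c(0, 0, 1, 1, 0, 1, 0, 1))
)

# Hypothetical helper: mean chol per subgroup level for one treatment arm
summarise_trt <- function(data, level) {
  data %>%
    filter(trt == level) %>%
    select(-trt) %>%
    pivot_longer(cols = -c(id, chol)) %>%
    group_by(name, value) %>%
    summarise(chol = mean(chol, na.rm = TRUE), .groups = "drop")
}

# Hypothetical helper: dot plot of those means, one facet per variable
plot_trt <- function(data, level) {
  ggplot(summarise_trt(data, level), aes(x = value, y = chol)) +
    geom_point() +
    coord_flip() +
    facet_wrap(~name, scales = "free_y", ncol = 1) +
    ggtitle(paste("trt =", level))
}

s1 <- summarise_trt(toy, 1)
p1_toy <- plot_trt(toy, 1)
p2_toy <- plot_trt(toy, 2)
```

The two plots can then be passed to cowplot::plot_grid(p1_toy, p2_toy, ncol = 2) exactly as above; adding a third treatment level would only need one more call to the helper.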