Tags: r, ggplot2, data-visualization, boxplot, median

How to insert a "total" group to my grouped boxplot?


I'm building a grouped boxplot to show the medians and interquartile ranges of serum levels over several time periods, comparing two therapy models. The current graph is shown below. What I'd also like to do, but couldn't find out how, is add another x-axis category containing the total sample (the pdse and tcc groups together, i.e. all observations). I hope the code and information given here are enough for someone to point me in the right direction.

This is the boxplot I have so far.

This is the structure of the subset I created to generate the ggplot2 boxplot.

structure(list(modeloterapia = structure(c(2L, 1L, 1L, 2L, 1L, 
1L, 2L, 2L, 1L, 2L), .Label = c("pdse", "tcc"), class = "factor"), 
    periodo = c("ND_IL_6I", "ND_IL_6I", "ND_IL_6I", "ND_IL_6I", 
    "ND_IL_6I", "ND_IL_6I", "ND_IL_6I", "ND_IL_6I", "ND_IL_6I", 
    "ND_IL_6I"), nivel = c(156.475, 25.393, 5.20696, 29.448, 
    636.561, 16.7, 20.83028, 13.04912, 17.28, 30.686)), row.names = c(NA, 
10L), class = "data.frame")

Here is how I created the subset using dplyr and tidyr, followed by the ggplot2 command itself.

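library(dplyr)
library(tidyr)    # provides gather()
library(ggplot2)

# Keep the four IL-6 measurements plus the therapy model
plots <- df %>%
  select(ND_IL_6I, IL_6_6mesesultimovalorITT, IL6_12multimovalorITT, IL_6finalultimovalorITT, modeloterapia)

# Reshape to long format: one row per observation and period
plots_adj <- plots %>%
  gather("periodo", "nivel", -modeloterapia)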
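
For reference, the wide-to-long reshaping step can be sketched on a small toy data frame. The values and the name `df_toy` are made up for illustration (the real `df` is not shown in the question); only the column names are taken from the code above. The sketch uses `pivot_longer()`, which tidyr (1.0.0 and later) recommends over `gather()`; it should give the same three columns, although the row order differs.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per patient, one column per time point.
# The numbers are invented; they are not the asker's data.
df_toy <- data.frame(
  modeloterapia = factor(c("pdse", "tcc", "pdse")),
  ND_IL_6I = c(25.4, 156.5, 16.7),
  IL_6_6mesesultimovalorITT = c(20.1, 30.2, 18.9)
)

# Long format: one row per (patient, period) pair,
# which is the shape needed for aes(fill = periodo).
toy_long <- df_toy %>%
  pivot_longer(cols = -modeloterapia,
               names_to = "periodo",
               values_to = "nivel")

toy_long
```

Each measurement column becomes a value of `periodo`, so three patients with two time points give six rows.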

ggplot(plots_adj, aes(x = modeloterapia, y = nivel, fill = periodo)) +
  geom_boxplot() + 
  coord_cartesian(ylim=c(0, 70)) +
  labs(title = "Níveis séricos de interleucina-6 em cada modelo de psicoterapia por período",
              subtitle = "Gráfico dos níveis séricos pelo modelo de terapia",
              x = "Modelo de terapia", y = "Níveis de interleucina-6 (pg/ml)") +
  scale_fill_discrete(name = "Período", labels = c("6 meses", "Pós-intervenção", "12 meses", "Linha de base")) +
  scale_x_discrete(labels = c('Psicoterapia Dinâmica Suportivo Expressiva','Terapia Cognitiva Comportamental'))

Hope you have a nice day!


Solution

  • This should give you an idea of how to proceed. You can compute a new aggregation level and then bind it to your original data. I have used mean values, so the "Total" boxes are drawn from the per-group means rather than from the raw observations. Here is the code, where I have used your dput() data as plots_adj:

    library(ggplot2)
    library(dplyr)
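    # Add a 'Total' level: mean nivel per therapy model and period, relabelled as 'Total'
    plots_adj %>% bind_rows(
      plots_adj %>% group_by(modeloterapia, periodo) %>%
        summarise(nivel = mean(nivel, na.rm = TRUE)) %>%
        ungroup() %>%
        mutate(modeloterapia = 'Total')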
    ) %>%
      ggplot(aes(x = modeloterapia, y = nivel, fill = periodo)) +
      geom_boxplot() + 
      coord_cartesian(ylim=c(0, 200)) +
      labs(title = "Níveis séricos de interleucina-6 em cada modelo de psicoterapia por período",
           subtitle = "Gráfico dos níveis séricos pelo modelo de terapia",
           x = "Modelo de terapia", y = "Níveis de interleucina-6 (pg/ml)") +
      scale_fill_discrete(name = "Período", labels = c("6 meses", "Pós-intervenção", "12 meses", "Linha de base")) +
      scale_x_discrete(labels = c('Psicoterapia Dinâmica Suportivo Expressiva','Terapia Cognitiva Comportamental',
                                  'Total'))
    

    Output:

    [Figure: grouped boxplot with the added "Total" category on the x-axis]
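
If the "Total" box should instead summarise all observations (the median and interquartile range of the pooled sample, as the question describes), one possible variant is to stack a relabelled copy of the data rather than the group means. This is a minimal sketch, not the answer's method: `plots_adj` is rebuilt here from the dput() output in the question, and the names `plots_total` and `p` are introduced only for illustration. The explicit factor levels are an added assumption so that the x-axis order does not depend on locale-specific sorting.

```r
library(dplyr)
library(ggplot2)

# The ten rows shown in the question's dput() output (a single period).
plots_adj <- data.frame(
  modeloterapia = factor(c("tcc", "pdse", "pdse", "tcc", "pdse",
                           "pdse", "tcc", "tcc", "pdse", "tcc")),
  periodo = "ND_IL_6I",
  nivel = c(156.475, 25.393, 5.20696, 29.448, 636.561,
            16.7, 20.83028, 13.04912, 17.28, 30.686)
)

# Stack a second copy of every observation, relabelled "Total",
# so the extra box is computed from the pooled sample.
plots_total <- bind_rows(
  plots_adj %>% mutate(modeloterapia = as.character(modeloterapia)),
  plots_adj %>% mutate(modeloterapia = "Total")
) %>%
  # Set the x-axis order explicitly: pdse, tcc, then Total.
  mutate(modeloterapia = factor(modeloterapia,
                                levels = c("pdse", "tcc", "Total")))

p <- ggplot(plots_total, aes(x = modeloterapia, y = nivel, fill = periodo)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 200)) +
  scale_x_discrete(labels = c("Psicoterapia Dinâmica Suportivo Expressiva",
                              "Terapia Cognitiva Comportamental",
                              "Total"))
p
```

Because every observation appears twice in `plots_total` (once under its own group and once under "Total"), this data frame is only suitable for plotting; any statistics for the original groups should be computed from `plots_adj`.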