r ggplot2 violin-plot

How to plot distributions by a variable, including the total distribution in ggplot2?


I have a df like this:

df <- data.frame(ID = 1:200, 
                M500 = rnorm(200),
                COUNTRY = rep( c("PERU", "MEXICO", "COLOMBIA"),length.out = 200))

I want to draw a violin plot comparing the three countries AND a violin for the total population. In other words, I need four violins in the same plot. My basic plot looks like this:

ggplot(df, aes(x = COUNTRY, y = M500, color = COUNTRY, fill = COUNTRY)) +
  geom_violin(alpha = 0.5, trim = TRUE) +
  geom_point(position = position_jitter(width = 0.1), alpha = 0.5, size = 2) +
  stat_summary(fun = "mean", geom = "crossbar", color = "black", linewidth = 0.5) +
  scale_color_brewer(palette = "Set1") +
  scale_fill_brewer(palette = "Set1") +
  theme_minimal() +
  labs(title = "title", x = "COUNTRY", y = "M500")

But I can't work out how to add the fourth violin for the total.


Solution

    set.seed(123)
    df <- data.frame(ID = 1:200, 
                     M500 = rnorm(200),
                     COUNTRY = rep( c("PERU", "MEXICO", "COLOMBIA"),length.out = 200))
    
    library(dplyr)
    library(ggplot2)
    library(tidyr)
    
    df %>%
      left_join(df %>% mutate(COUNTRY = 'TOTAL'), by = join_by(ID, M500)) %>%
      pivot_longer(c(COUNTRY.x, COUNTRY.y), values_to = 'COUNTRY') %>%
      ggplot(aes(x = COUNTRY, y = M500, color = COUNTRY, fill = COUNTRY)) +
      geom_violin(alpha = 0.5, trim = TRUE) +
      geom_point(position = position_jitter(width = 0.1), alpha = 0.5, size = 2) +
      stat_summary(fun = "mean", geom = "crossbar", color = "black", linewidth = 0.5) +
      scale_color_brewer(palette = "Set1") +
      scale_fill_brewer(palette = "Set1") +
      theme_minimal() +
      labs(title = "title", x = "COUNTRY", y = "M500")
    

    Created on 2023-12-19 with reprex v2.0.2
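
The join-and-pivot above works because every row ends up in the data twice: once under its own country and once under 'TOTAL'. A shorter way to get the same duplication, offered here only as an alternative sketch and not as the answer's own method, is to stack df on top of a relabelled copy with dplyr::bind_rows(). The name df_total and the explicit factor levels are my own additions for illustration; linewidth assumes ggplot2 3.4.0 or later.

```r
library(dplyr)
library(ggplot2)

set.seed(123)
df <- data.frame(ID = 1:200,
                 M500 = rnorm(200),
                 COUNTRY = rep(c("PERU", "MEXICO", "COLOMBIA"), length.out = 200))

# Stack the original rows on a copy in which every row is labelled "TOTAL".
# df_total is a hypothetical name; "TOTAL" is the same label used in the answer.
df_total <- bind_rows(df, mutate(df, COUNTRY = "TOTAL"))

# Alphabetical order already puts TOTAL last here; setting the levels
# explicitly keeps it last even if the country names change.
df_total$COUNTRY <- factor(df_total$COUNTRY,
                           levels = c("COLOMBIA", "MEXICO", "PERU", "TOTAL"))

# Same layers as the original plot, now drawn over four groups.
p <- ggplot(df_total, aes(x = COUNTRY, y = M500, color = COUNTRY, fill = COUNTRY)) +
  geom_violin(alpha = 0.5, trim = TRUE) +
  geom_point(position = position_jitter(width = 0.1), alpha = 0.5, size = 2) +
  stat_summary(fun = "mean", geom = "crossbar", color = "black", linewidth = 0.5) +
  scale_color_brewer(palette = "Set1") +
  scale_fill_brewer(palette = "Set1") +
  theme_minimal() +
  labs(title = "title", x = "COUNTRY", y = "M500")

p
```

Either way the plotted data has 400 rows, so the jittered points appear once in their country violin and once in the TOTAL violin.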