Search code examples
rggplot2groupingviolin-plotggpubr

Combining overlapping groups to include in violin plot/box plot in R


I am struggling with the following code using the iris dataset. I would like to draw a violin plot, only including setosa species, and do some complex overlapping combinations on subgroups of data.

Specifically, on the x-axis, I would like to first divide continuous Sepal.Length data into groups: group A=Sepal.Length < 4.7, group B=Sepal.Length 4.7 - 5, group C=Sepal.Length 5 - 5.2 and group D=Sepal.Length > 5.2.

Then, I would like to plot four violins/boxes on the x-axis, single/overlapping groups: "B", "A+C", "D", "A+C+D". The y-axis is simply "Petal.Length".

I am also including code to show the sample size (n) for each violin.

I would appreciate any suggestions. Thank you.

library(dplyr)
library(ggplot2)
library(ggpubr)
# Define order of violins on x-axis.
order <- c("B", "AC", "D", "ACD")
# Function to display sample size (n) for each violin.
give.n <- function(x){return(c(y = min(Petal.Length), label = length(x)))}
iris %>% 
  filter(Species == "setosa") %>% 
  mutate(sub_a = case_when( Sepal.Length < 4.7~"A",
                        Sepal.Length < 5~ "B",
                        Sepal.Length < 5.2~ "C",
                        TRUE~"D")) %>% 
  mutate(collapsed = c((ifelse(sub_a %in% c("A", "C"), "AC", sub_a)), (ifelse(sub_a %in% c("AC", "D"), "ACD", sub_a)))) %>% 
  ggviolin(iris[iris$Species == "setosa", ], x=collapsed, y=Petal.Length) + scale_x_discrete(limits=order) + stat_summary(fun.data = give.n, geom = "text")

Edit

See below for expected result. Note that the numbers below each violin are accurate. The rest of the image is only an example of expected outcome.

enter image description here


Solution

  • I cannot see how to do this as a single chain, but here is a brute-force solution that uses cut and then bind_rows

    setosa <- iris %>% filter(Species == "setosa")  %>% 
      mutate(group = cut(Sepal.Length, breaks = c(0, 4.7, 5, 5.2, Inf), labels = c("A", "B", "C", "D"), right = FALSE))
    
    bind_rows(B = setosa %>% filter(group == "B"),
              AC =  setosa %>% filter(group %in% c("A", "C")),
              D =  setosa %>% filter(group == "D"),
              ACD = setosa %>% filter(group %in% c("A", "C", "D")),
              .id = "group2"
              ) %>% 
      mutate(group2 = factor(group2, levels = c("B", "AC", "D", "ACD"))) %>% 
      ggplot(aes(x = group2, y = Petal.Length)) + 
      geom_violin()