
How to embed the number of observations into violin plots?


I want to draw violin plots in a facet grid and annotate each violin with the number of observations used to draw it.

Here is an example of what I have without observation counts:

library(ggplot2)
library(dplyr)
library(tidyverse)

data("iris")

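# Random color label for each row (named color_labels so it does not shadow base::c)
color_labels <- sample(rep(c('r', 'g', 'b'), 50))
facet_row <- rep(c('row1', 'row2', 'row3', 'row4', 'row5'), 30)
facet_col <- rep(c('col1', 'col2', 'col3'), 50)

iris$facet_rows <- facet_row
iris$facet_cols <- facet_col
iris$color <- color_labels
# Stand-in for precomputed observation counts (random in this example)
iris$count <- sample(1:10, size = 150, replace = TRUE)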

p <- ggplot(iris, aes(x=Species, y=Petal.Length, fill=color)) + 
  geom_violin(alpha = 0.7, na.rm = TRUE) +
  coord_flip() +
  facet_grid(rows = vars(facet_rows), cols = vars(facet_cols))

print(p)

Result: [image]

I want to put the number of observations right behind those violins. I tried this so far:

count_data <- function (y){
  df <- data.frame(y = min(y) - 0.2, label = length(y))
  return(df)
}

p <- ggplot(iris, aes(x=Species, y=Petal.Length, fill=color)) + 
  geom_violin(alpha = 0.7, na.rm = TRUE) +
  stat_summary(fun.data = count_data, geom = "text", aes(group = Species)) +
  coord_flip() +
  facet_grid(rows = vars(facet_rows), cols = vars(facet_cols))


print(p)

This produces an output with an issue: [image]

Each group of violins now gets a single count value. The problem is that the violins within a group will most definitely have different numbers of observations.

I have also tried drawing the labels with geom_text from a precomputed number of observations. Assume that iris$count holds the real observation counts, so every row belonging to the same violin would carry the same value; here the values are just random:

p <- ggplot(iris, aes(x=Species, y=Petal.Length, fill=color)) + 
  geom_violin(alpha = 0.7, na.rm = TRUE) +
  geom_text(aes(label = count, y = Petal.Length), nudge_y = -0.1) +
  coord_flip() +
  facet_grid(rows = vars(facet_rows), cols = vars(facet_cols))

print(p)

This has problems similar to the previous approach: [image]

  1. It has values for two violins in the same group in one line.
  2. Each violin repeats the number of observations once for each observation.

I am relatively new to R. I feel like there is a clean way to do this, but I can't figure it out.


Solution

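  • Removing the explicit grouping and adding position_dodge resolved the issue. Without aes(group = Species), the text layer is grouped by the fill variable as well as by Species, so each violin gets its own count, and position_dodge places each label next to its violin: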

    count_data <- function (y){
      df <- data.frame(y = min(y) - 0.2, label = length(y))
      return(df)
    }
    
    p <- ggplot(iris, aes(x=Species, y=Petal.Length, fill=color)) + 
      geom_violin(alpha = 0.7, na.rm = TRUE) +
      stat_summary(fun.data = count_data, geom = "text", position = position_dodge(1)) +
      coord_flip() +
      facet_grid(rows = vars(facet_rows), cols = vars(facet_cols))
    
    
    print(p)
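
The geom_text attempt from the question can probably be made to work too, if the counts are summarised to one row per violin first. Below is a minimal sketch of that idea, not a tested drop-in: the name violin_counts and the set.seed call are my own additions, the label position min(Petal.Length) - 0.2 is copied from count_data, and the dodge width of 0.9 is an assumption that I believe matches the default width geom_violin uses for dodging.

```r
library(ggplot2)
library(dplyr)

# Rebuild the example data from the question.
# set.seed is an assumption, added only so the random colors are reproducible.
set.seed(1)
data("iris")
iris$facet_rows <- rep(c("row1", "row2", "row3", "row4", "row5"), 30)
iris$facet_cols <- rep(c("col1", "col2", "col3"), 50)
iris$color <- sample(rep(c("r", "g", "b"), 50))

# One row per violin: the number of observations and a label position
# for every facet / Species / color combination that actually occurs.
# violin_counts is a hypothetical name chosen for this sketch.
violin_counts <- iris %>%
  group_by(facet_rows, facet_cols, Species, color) %>%
  summarise(n = n(), y = min(Petal.Length) - 0.2, .groups = "drop")

p <- ggplot(iris, aes(x = Species, y = Petal.Length, fill = color)) +
  geom_violin(alpha = 0.7, na.rm = TRUE) +
  # The text layer reads the summary table, so each violin gets one label.
  # group = color makes the dodging follow the same variable as the fill.
  geom_text(
    data = violin_counts,
    aes(y = y, label = n, group = color),
    position = position_dodge(width = 0.9)
  ) +
  coord_flip() +
  facet_grid(rows = vars(facet_rows), cols = vars(facet_cols))

print(p)
```

Because the labels come from a separate data frame, this avoids both problems listed in the question: there is one label per violin rather than one per observation, and the dodge separates labels within a Species. One caveat: geom_violin drops groups with fewer than two observations, so in sparse panels a label may appear without a violin and the alignment can drift.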