
R: ggrepel, ggplot2 labels above the plotting area


I would like to draw density plots annotated with summary values (for instance the median or the mean). I would also like to display the chosen value (for instance the median) above the plotting area, so that the labels do not interfere with the distributions themselves. My real data frames are larger and more diverse (with many more categories), so I would like to spread the labels out so that they do not overlap each other and stay readable and visually pleasing.

I found a similar thread here: ggrepel labels outside (to the right) of ggplot area

I tried to adapt this strategy (by fixing the x coordinate instead of y and enlarging the upper margin), but to no avail.

Here is the reprex dataframe:

# packages attached for the unqualified calls in the plotting function (%>%, aes(), geom_density(), ...)
library(ggplot2)
library(dplyr)

set.seed(123)

group <- c(rep("control", 367), rep("catalytic", 276), rep("substrate", 304))
sample_name <- c(rep("wt1", 100), rep("wt2", 75), rep("wt3", 192),
                 rep("cat1", 221), rep("cat2", 55),
                 rep("sub1", 84), rep("sub2", 67), rep("sub3", 153))
tail_length <- c(rnorm(100, mean = 100, sd = 3), rnorm(75, mean = 98, sd = 5),
                 rnorm(192, mean = 101, sd = 2), rnorm(221, mean = 88, sd = 9),
                 rnorm(55, mean = 87, sd = 6), rnorm(84, mean = 182, sd = 2),
                 rnorm(67, mean = 165, sd = 9), rnorm(153, mean = 153, sd = 14))

tail_data <- data.frame(group, sample_name, tail_length)

Here is my plotting function:

plot_distribution_with_values <- function(input_data,value_to_show="mean", grouping_factor = "group", title="", limit="") {
  
  #determine the center values to be plotted as x intercepting line(s)
  center_values = input_data %>% dplyr::group_by(!!rlang::sym(grouping_factor)) %>% dplyr::summarize(median_value = median(tail_length,na.rm = TRUE),mean_value=mean(tail_length,na.rm=T))
  
  #main core of the plot
  # aes_string() expects column names as strings; the unquoted tail_length only worked via the global vector
  plot_distribution <- ggplot2::ggplot(input_data, aes_string(x="tail_length",color=grouping_factor)) + geom_density(size=1, aes(y=..ndensity..)) + theme_bw() + scale_x_continuous(limits=c(0, as.numeric(limit))) + coord_cartesian(ylim = c(0, 1))
  
  if (value_to_show=="median") {
    center_value="median_value"
  }
  else if (value_to_show=="mean") {
    center_value="mean_value"
  }
  
  #Plot settings (aesthetics, geoms, axes behavior etc.):
  g.line <- ggplot2::geom_vline(data=center_values,aes(xintercept=!!rlang::sym(center_value),color=!!rlang::sym(grouping_factor)),linetype="longdash",show.legend = FALSE) 
  g.labs <- ggplot2::labs(title= "Tail lengths distribution",
                          x="tail length [units]",
                          y= "normalized density",
                          color=grouping_factor)
  g.values <- ggrepel::geom_text_repel(data=center_values,aes(x=round(!!rlang::sym(center_value)),y=length(data),color=!!rlang::sym(grouping_factor),label=formatC(round(!!rlang::sym(center_value)),digits=1,format = "d")),size=4, direction = "x", segment.size = 0.4, show.legend =F, hjust =0, xlim = c(0,200), ylim = c(0, 1))

  
  #Overall plotting configuration:
  plot <- plot_distribution + g.line + g.labs + g.values
  

  return(plot)
}

Here is the example function call:

plot_distribution_with_values(tail_data, value_to_show = "median", grouping_factor = "group", title = "Tail plot", limit=200)

And below is the output I get: [image]

And this is the output I would love to have (sorry for the quality, edited in Paint): [image]

Also, if you change the grouping factor to "sample_name", you will see a more crowded plot, closer to my real-life data.


Solution

  • One option to achieve your desired result:

    1. Set clip="off" in coord_cartesian
    2. Make some room for the labels by increasing the bottom margin of the title
    3. Set y=1.05 for the labels (the top of the data range, 1, plus the default scale expansion of .05)
    4. Set min.segment.length=0
    5. Increase the ylim for the labels
    6. Nudge the position of the labels

    Note: To get your desired result you will probably have to fiddle around with the values for the nudging, the ylim and the margin.

    set.seed(123)
    
    library(ggplot2)
    library(ggrepel)
    library(dplyr)
    
    plot_distribution_with_values <- function(input_data,value_to_show="mean", grouping_factor = "group", title="", limit="") {
      
      #determine the center values to be plotted as x intercepting line(s)
      center_values = input_data %>% dplyr::group_by(!!rlang::sym(grouping_factor)) %>% dplyr::summarize(median_value = median(tail_length,na.rm = TRUE),mean_value=mean(tail_length,na.rm=T))
      
      #main core of the plot
      # aes_string() expects column names as strings, so tail_length is quoted here
      plot_distribution <- ggplot2::ggplot(input_data, aes_string(x="tail_length",color=grouping_factor)) + 
        geom_density(size=1, aes(y=..ndensity..)) + theme_bw() + scale_x_continuous(limits=c(0, as.numeric(limit))) + 
        coord_cartesian(clip = "off", ylim = c(0, 1))
      
      if (value_to_show=="median") {
        center_value="median_value"
      }
      else if (value_to_show=="mean") {
        center_value="mean_value"
      }
      
      #Plot settings (aesthetics, geoms, axes behavior etc.):
      g.line <- ggplot2::geom_vline(data=center_values,aes(xintercept=!!rlang::sym(center_value),
                                                           color=!!rlang::sym(grouping_factor)),
                                    linetype="longdash",show.legend = FALSE) 
      g.labs <- ggplot2::labs(title= "Tail lengths distribution",
                              x="tail length [units]",
                              y= "normalized density",
                              color=grouping_factor)
      g.values <- ggrepel::geom_text_repel(data=center_values, 
                                           aes(x=round(!!rlang::sym(center_value)),
                                               y = 1.05, color=!!rlang::sym(grouping_factor),
                                               label=formatC(round(!!rlang::sym(center_value)),digits=1,format = "d")),
                                           size=4, direction = "x", segment.size = 0.4,
                                           min.segment.length = 0, nudge_y = .15, nudge_x = -10,
                                           show.legend =F, hjust =0, xlim = c(0,200), 
                                           ylim = c(0, 1.15))
      
      
      #Overall plotting configuration:
      plot <- plot_distribution + g.line + g.labs + g.values + 
        theme(plot.title = element_text(margin = margin(b = 4 * 5.5)))
      
      
      return(plot)
    }
    
    plot_distribution_with_values(tail_data, value_to_show = "median", grouping_factor = "group", title = "Tail plot", limit=200)
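
To see the six steps from the list in isolation, independent of the function above, here is a minimal sketch on made-up data. The names toy, value, medians and p are hypothetical and used only for illustration. The numeric settings (y = 1.05, nudge_y, ylim, the 22 pt title margin) are copied from the answer and will likely need the same fiddling on real data. It assumes ggplot2 3.3.0 or later, because it uses after_stat(ndensity) instead of ..ndensity..

```r
library(ggplot2)
library(ggrepel)
library(dplyr)

set.seed(123)

# Hypothetical toy data: three groups with different centers
toy <- data.frame(
  group = rep(c("a", "b", "c"), each = 200),
  value = c(rnorm(200, 90, 8), rnorm(200, 100, 3), rnorm(200, 104, 5))
)

# One row per group: the value that gets a dashed line and a label
medians <- toy %>%
  group_by(group) %>%
  summarize(median_value = median(value))

p <- ggplot(toy, aes(x = value, color = group)) +
  geom_density(aes(y = after_stat(ndensity))) +
  geom_vline(data = medians,
             aes(xintercept = median_value, color = group),
             linetype = "longdash", show.legend = FALSE) +
  # Step 1: turn clipping off so labels may be drawn outside the panel
  coord_cartesian(ylim = c(0, 1), clip = "off") +
  geom_text_repel(
    data = medians,
    # Step 3: anchor the labels at the top edge of the panel (1 + 0.05)
    aes(x = median_value, y = 1.05,
        label = round(median_value), color = group),
    direction = "x",          # repel only sideways
    min.segment.length = 0,   # Step 4: always draw the connecting segment
    ylim = c(0, 1.15),        # Step 5: let labels sit above y = 1
    nudge_y = 0.15,           # Step 6: push labels up, away from the panel
    show.legend = FALSE
  ) +
  labs(title = "Toy densities with medians above the panel") +
  theme_bw() +
  # Step 2: extra space below the title (22 pt = 4 * 5.5, as in the answer)
  theme(plot.title = element_text(margin = margin(b = 22)))

print(p)
```

Anchoring the labels at a fixed y and repelling with direction = "x" keeps them in one row above the panel, which is what leaves the densities themselves unobstructed.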