Tags: r, ggplot2, alignment

In ggplot2, align plotted lines according to the given condition


Consider the following dummy data:

library(tidyverse)
library(ggplot2)

df <- tibble(
  id = c(rep("abcdef-123", 3), rep("defghi-678", 2), rep("mnopqr-345", 1)),
  length = c(rep(137, 3), rep(293, 2), rep(91, 1)),
  position = c(10, 77, 103, 82, 222, 45)
)

This data frame contains three columns: "id" is the name of the object (item), "length" is the total length of the item, and "position" indicates where along that length an interesting feature occurs. Each unique "id" therefore has a single "length", while more than one "position" may be observed per "id".

I group the data by "id", as this is the unique label of each item:

df_grouped <- df %>% group_by(id)

Then I want to plot the data in the following manner:

  • each "id" should be depicted as a separate horizontal line
  • positions should be marked as points
  • the lines should be aligned on the first (or, ideally, a chosen) position of each "id"
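The alignment requirement boils down to one horizontal offset per id. A minimal base R sketch, assuming that "the first position" means the smallest position of each id; `k` is a hypothetical parameter for the chosen position, and the fallback for ids with fewer than `k` positions is also an assumption:

```r
# Same dummy data, rebuilt with data.frame() so the sketch needs no packages
df <- data.frame(
  id = c(rep("abcdef-123", 3), rep("defghi-678", 2), "mnopqr-345"),
  length = c(rep(137, 3), rep(293, 2), 91),
  position = c(10, 77, 103, 82, 222, 45)
)

k <- 1  # hypothetical: align on the k-th smallest position of each id

# Anchor = k-th smallest position within each id (repeated on every row of
# that id); ids with fewer than k positions fall back to their largest one
anchor <- ave(df$position, df$id,
              FUN = function(p) sort(p)[min(k, length(p))])

# Offset that moves each id's anchor onto the largest anchor
df$nudge <- max(anchor) - anchor
df$position_nudge <- df$position + df$nudge

df[, c("id", "position", "nudge", "position_nudge")]
```

With `k = 1` the offsets come out as 72, 0 and 37 for the three ids, so every first point lands at x = 82 (the first position of `defghi-678`), and each line then runs from its offset to offset + length.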

This is what I am able to obtain so far:

ggplot2::ggplot(df_grouped, aes(x=length, y=id, xend=0, yend=id)) + 
  ggplot2::geom_segment()+ 
  ggplot2::geom_point(aes(x=position, y=id), size=2) + 
  ggplot2::theme_void() +
  ggplot2::theme(axis.ticks.x = element_blank(), axis.text.x = element_blank())#+ggplot2::scale_y_discrete()

[Image: plot produced by the code above]

And this is what I want to achieve (mock-up made in GIMP): [Image: desired plot]

I do not know how to align the lines conditionally (on the selected first or n-th position). I tried several approaches, including indexing positions with brackets inside the aesthetics passed to aes(), but none of them worked, so I am asking for help.

I am currently trying to solve this with Bioconductor, but would appreciate base R or ggplot2 solutions.


Solution

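  • The first step is to shift all positions of an id by the difference between the largest first position across all ids and that id's own first position, so that every first position lands at the same x value. The second step is to shift the segments by the same amount: each segment starts at the minimum of the shifted positions minus the first position of its id (which equals the offset) and ends at that start plus the length. Note that I use a separate, summarised dataset for the segments, and that `.by` requires dplyr 1.1.0 or later.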

    library(tidyverse)
    
    df <- df |>
      # first (smallest) position of each id
      mutate(start = min(position), .by = id) |>
      mutate(
        start_max = max(start),          # largest first position over all ids
        nudge = start_max - start,       # horizontal offset for each id
        position_nudge = position + nudge
      )
    
    # one segment per id, shifted by the same offset as its points
    df_segment <- df |>
      summarise(
        x = min(position_nudge - start),
        xend = min(x + length),
        .by = id
      )
    
    ggplot(df) +
      geom_segment(data = df_segment, aes(x = x, xend = xend, y = id, yend = id)) +
      geom_point(aes(x = position_nudge, y = id), size = 2) +
      theme_void() +
      theme(
        axis.ticks.x = element_blank(),
        axis.text.x = element_blank()
      )
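The code above always aligns on the smallest position. A possible generalisation to the "chosen position" the question asks for, sketched under assumptions: `k` is a hypothetical parameter, positions are ranked by size, and ids with fewer than `k` positions fall back to their largest position. Like the answer, it relies on `.by` (dplyr 1.1.0 or later) and the native pipe (R 4.1 or later).

```r
library(dplyr)
library(ggplot2)

df <- tibble(
  id = c(rep("abcdef-123", 3), rep("defghi-678", 2), rep("mnopqr-345", 1)),
  length = c(rep(137, 3), rep(293, 2), rep(91, 1)),
  position = c(10, 77, 103, 82, 222, 45)
)

k <- 2  # hypothetical: align on the 2nd smallest position of each id

df_aligned <- df |>
  # k-th smallest position per id; n() is the number of rows in the id,
  # so ids with fewer than k positions use their largest one (assumption)
  mutate(anchor = sort(position)[min(k, n())], .by = id) |>
  # ungrouped: offset that moves every anchor onto the largest anchor
  mutate(
    nudge = max(anchor) - anchor,
    position_nudge = position + nudge
  )

# One segment per id, spanning nudge .. nudge + length
df_segment <- df_aligned |>
  summarise(x = first(nudge), xend = x + first(length), .by = id)

p <- ggplot(df_aligned) +
  geom_segment(data = df_segment, aes(x = x, xend = xend, y = id, yend = id)) +
  geom_point(aes(x = position_nudge, y = id), size = 2) +
  theme_void()

p
```

With `k = 2`, the second points of `abcdef-123` (77) and `defghi-678` (222) both land at x = 222, while `mnopqr-345`, which has only one position, is aligned on that single point. Setting `k <- 1` reproduces the alignment of the answer.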