
Introduce sequence gaps in a ggplot lineplot


This might be a duplicate, but none of the questions I found seem to help my case.

I have a data frame, finaldf, with one value at each of 100 time points. Every fifth time point is used as an x-axis break and labelled with one base of a DNA sequence. I build it like this:

# Example data: 100 random values and a 20-base sequence
myseq <- "AGAATATTATACATTCATCT"
set.seed(123)
mydata <- data.frame(time=1:100, value=rnorm(100, mean=10, sd=2))
# One break every 5 time points (5, 10, ..., 100), one per base
indices <- seq(5, 100, length.out=20)
seqsplit <- unlist(strsplit(myseq, ""))
ind_df <- data.frame(call=seqsplit, time=indices)
# Attach each base to its time point; all other rows get NA in `call`
finaldf <- dplyr::left_join(mydata, ind_df, by="time")

It looks like this:

> finaldf
    time     value call
1      1  8.879049 <NA>
2      2  9.539645 <NA>
3      3 13.117417 <NA>
4      4 10.141017 <NA>
5      5 10.258575    A
6      6 13.430130 <NA>
7      7 10.921832 <NA>
8      8  7.469878 <NA>
9      9  8.626294 <NA>
10    10  9.108676    G
11    11 12.448164 <NA>
12    12 10.719628 <NA>
13    13 10.801543 <NA>
14    14 10.221365 <NA>
15    15  8.888318    A
...
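The answer below relies on one property of this layout: base k of myseq is labelled at time 5*k. A minimal base-R check of that property, reusing the question's own definitions (no new assumptions beyond those):

```r
myseq <- "AGAATATTATACATTCATCT"
seqsplit <- unlist(strsplit(myseq, ""))
indices <- seq(5, 100, length.out = 20)

# Base k of the sequence is labelled at time 5 * k (5, 10, ..., 100)
stopifnot(isTRUE(all.equal(indices, 5 * seq_along(seqsplit))))
```

If the breaks were spaced differently, the "5 *" factor in the answer's index formula would have to change accordingly.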

And I plot it like this:

P <- ggplot2::ggplot(finaldf, ggplot2::aes(x=time, y=value)) +
  ggplot2::geom_line(linewidth=0.5) +
  ggplot2::scale_x_continuous(breaks=indices, labels=seqsplit) +
  ggplot2::theme_light()
grDevices::pdf(file="test.pdf", height=4, width=10)
print(P)
grDevices::dev.off()

This produces the following plot:

[Image: plot1]

Now I want to insert gaps into the sequence and produce "gapped" plots: identical to the one above, but with a break in the line wherever the sequence has a gap.

My starting point is the original finaldf plus gapped versions of the sequence, i.e. the original sequence with "-" characters inserted. For example:

gapseq1 <- "AGAA-TAT--TAT-ACATT---CATCT-"
gapseq2 <- "A-G-AATAT----TATACATTCA-TCT"
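The answer below assumes each gapped sequence is exactly myseq with "-" characters inserted, so that the non-gap bases line up with the original data. A small sketch to check that assumption; is_gapped_version is a hypothetical helper, not part of the question or the answer:

```r
# Hypothetical helper: TRUE if `gapseq` is `seq` with only "-" characters inserted
is_gapped_version <- function(gapseq, seq) {
  identical(gsub("-", "", gapseq, fixed = TRUE), seq)
}

myseq   <- "AGAATATTATACATTCATCT"
gapseq1 <- "AGAA-TAT--TAT-ACATT---CATCT-"
gapseq2 <- "A-G-AATAT----TATACATTCA-TCT"

is_gapped_version(gapseq1, myseq)  # TRUE
is_gapped_version(gapseq2, myseq)  # TRUE
```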

For these two gapped sequences I want to produce the following plots (ideally keeping the grid lines, though that is not essential):

[Image: gap1, desired plot for gapseq1]

[Image: gap2, desired plot for gapseq2]

How can I accomplish this in an easy way? Thanks!


Solution

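  • Use gregexpr to find the position of each "-" in the gapped sequence, turn each position into a row index, and use add_row (from tibble) to insert five NA rows at that index. geom_line() breaks the line at NA values, so each block of NA rows shows up as a gap (ggplot2 will warn that rows with missing values were removed). Run it again with gapseq2 in place of gapseq1 for the second plot.

    library(tibble)
    
    # Base k of the gapped sequence is labelled at time 5*k, so each "-" at
    # position k becomes 5 NA rows inserted at rows 5*k - 2 ... 5*k + 2
    (idx <- 5 * (gregexpr("-", gapseq1)[[1]]) - 2)
    
    # Work on a copy so finaldf stays intact for the next gapped sequence.
    # idx is increasing and already in gapped coordinates, so inserting in
    # order keeps every later index valid
    gapdf <- finaldf
    for (i in idx)
      gapdf <- gapdf |> add_row(value = rep(NA, 5), .before = i)
    
    # Renumber time; base k (gap or not) now sits at time 5*k
    gapdf$time <- seq_len(nrow(gapdf))
    indices <- seq(5, nrow(gapdf), 5)
    labels <- unlist(strsplit(gapseq1, ""))
    
    ggplot2::ggplot(gapdf, ggplot2::aes(x = time, y = value)) +
      ggplot2::geom_line(linewidth = 0.5) +
      ggplot2::scale_x_continuous(breaks = indices, labels = labels) +
      ggplot2::theme_light()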
    

    [Images: resulting gapped plots]
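The same idea can be wrapped in a function so both gapped plots come from the untouched finaldf. This is a sketch, not the answerer's code: gap_data and gap_plot are hypothetical helpers, and they assume each base spans `step` rows with base k labelled at row step * k (step = 5 in the question). Instead of calling add_row in a loop, gap_data builds a vector of row indices with base R's append() and then indexes the data frame with it; an NA index yields an all-NA row.

```r
# Hypothetical helper (not from the original answer): returns a gapped copy of
# `df` without modifying it. Assumes each base spans `step` rows, base k is
# labelled at row step * k, and `gapseq` is the original sequence with "-" added.
gap_data <- function(df, gapseq, step = 5) {
  pos <- gregexpr("-", gapseq, fixed = TRUE)[[1]]
  pos <- pos[pos > 0]                 # gregexpr returns -1 when there is no "-"
  rows <- seq_len(nrow(df))           # row indices into the original df
  for (k in pos) {
    # Same offset as the answer: for step = 5 the NA block covers rows 5k-2 .. 5k+2
    before <- step * k - step %/% 2
    # append() puts the block at the end if `after` exceeds the current length
    rows <- append(rows, rep(NA_integer_, step), after = before - 1)
  }
  out <- df[rows, , drop = FALSE]     # NA indices produce all-NA rows
  rownames(out) <- NULL
  out$time <- seq_len(nrow(out))      # renumber so base k sits at step * k
  out
}

# Hypothetical helper: plot a gapped copy with one x-axis label per character
gap_plot <- function(df, gapseq, step = 5) {
  gd <- gap_data(df, gapseq, step)
  labels <- strsplit(gapseq, "")[[1]]
  ggplot2::ggplot(gd, ggplot2::aes(x = time, y = value)) +
    ggplot2::geom_line(linewidth = 0.5) +
    ggplot2::scale_x_continuous(breaks = step * seq_along(labels), labels = labels) +
    ggplot2::theme_light()
}
```

With finaldf, gapseq1 and gapseq2 defined as in the question, gap_plot(finaldf, gapseq1) and gap_plot(finaldf, gapseq2) should give the two gapped plots while leaving finaldf unchanged. Filtering out the -1 that gregexpr() returns for a sequence without "-" avoids turning it into a negative row index, which the one-line formula in the answer does not guard against.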