This might be a duplicate, but none of the questions I found seem to help my case.
I have a finaldf data frame that contains values at different time points, and specific time points are used as x-axis breaks (corresponding to a DNA sequence). I obtain it like this:
myseq <- "AGAATATTATACATTCATCT"
set.seed(123)
mydata <- data.frame(time=1:100, value=rnorm(100, mean=10, sd=2))
indices <- seq(5, 100, length.out=20)
seqsplit <- unlist(strsplit(myseq, ""))
ind_df <- data.frame(call=seqsplit, time=indices)
finaldf <- dplyr::left_join(mydata, ind_df, by="time")
It looks like this:
> finaldf
   time     value call
1     1  8.879049 <NA>
2     2  9.539645 <NA>
3     3 13.117417 <NA>
4     4 10.141017 <NA>
5     5 10.258575    A
6     6 13.430130 <NA>
7     7 10.921832 <NA>
8     8  7.469878 <NA>
9     9  8.626294 <NA>
10   10  9.108676    G
11   11 12.448164 <NA>
12   12 10.719628 <NA>
13   13 10.801543 <NA>
14   14 10.221365 <NA>
15   15  8.888318    A
...
And I plot it like this:
P <- ggplot2::ggplot(finaldf, ggplot2::aes(x=time, y=value)) +
  ggplot2::geom_line(linewidth=0.5) +
  ggplot2::scale_x_continuous(breaks=indices, labels=seqsplit) +
  ggplot2::theme_light()
grDevices::pdf(file="test.pdf", height=4, width=10)
print(P)
grDevices::dev.off()
Resulting in this plot:
Now I want to introduce different gaps in the sequence and obtain "gapped plots", identical to the one above but with gaps.
My starting point would be the original finaldf and different gapped sequences, identical to the original sequence but with gaps. For example:
gapseq1 <- "AGAA-TAT--TAT-ACATT---CATCT-"
gapseq2 <- "A-G-AATAT----TATACATTCA-TCT"
For these 2 gapped sequences, I want to recreate the following plots (ideally preserving the grid, but not needed):
How can I accomplish this in an easy way? Thanks!
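Use gregexpr to find the position of each "-" in your gapped sequence, and add_row (from tibble) to insert NA rows into the data frame at the corresponding rows. Each character spans 5 time points centred on its break at 5 * position, so the gap at position p starts at row 5 * p - 2. geom_line breaks the line wherever value is NA, which produces the gaps.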
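library(tibble)

# Row at which each gap starts
(idx <- 5 * gregexpr("-", gapseq1)[[1]] - 2)

# Work on a copy so finaldf can be reused for other gapped sequences.
# Gaps are inserted left to right, so later indices already account
# for the rows added before them.
gapped <- finaldf
for (i in idx)
  gapped <- add_row(gapped, value=rep(NA, 5), .before=i)
gapped$time <- seq_len(nrow(gapped))
indices <- seq(5, nrow(gapped), 5)
labels <- unlist(strsplit(gapseq1, ""))

ggplot2::ggplot(gapped, ggplot2::aes(x=time, y=value)) +
  ggplot2::geom_line(linewidth=0.5) +
  ggplot2::scale_x_continuous(breaks=indices, labels=labels) +
  ggplot2::theme_light()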
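If you need this for several gapped sequences, the same idea can be wrapped in a function. What follows is a minimal base-R sketch rather than the loop above verbatim: instead of inserting rows one gap at a time, it shifts each block of rows to its new position in a single step. The name insert_gaps and its width argument are hypothetical, and the function assumes that time runs from 1 to n with one base every width time points, as in your example.

```r
# Hypothetical helper (not from any package): leave an NA-filled gap of
# `width` time points wherever `gapseq` contains "-".
# Assumes df$time is 1..n and the j-th base sits at time j * width.
insert_gaps <- function(df, gapseq, width = 5) {
  chars <- strsplit(gapseq, "")[[1]]
  base_pos <- which(chars != "-")   # position of each base within gapseq
  # Block of rows around each base; block 0 holds the rows before the first base
  block <- (df$time + width %/% 2) %/% width
  # Move every block right by the number of gap slots that now precede it
  shift <- width * (c(0, base_pos)[block + 1] - block)
  out <- data.frame(time = seq_len(width * length(chars)), value = NA_real_)
  out$value[df$time + shift] <- df$value
  list(data = out, breaks = width * seq_along(chars), labels = chars)
}

set.seed(123)
mydata <- data.frame(time = 1:100, value = rnorm(100, mean = 10, sd = 2))
gapseq1 <- "AGAA-TAT--TAT-ACATT---CATCT-"
gapseq2 <- "A-G-AATAT----TATACATTCA-TCT"

res1 <- insert_gaps(mydata, gapseq1)
res2 <- insert_gaps(mydata, gapseq2)

# Plotting is optional here and only runs if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p1 <- ggplot2::ggplot(res1$data, ggplot2::aes(x = time, y = value)) +
    ggplot2::geom_line(linewidth = 0.5) +
    ggplot2::scale_x_continuous(breaks = res1$breaks, labels = res1$labels) +
    ggplot2::theme_light()
}
```

Call print(p1) to draw the plot, and build the second one the same way from res2. Because the function returns a new data frame instead of modifying its input, the original data can be reused for any number of gapped sequences.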