I have 2 sets of pairwise alignments, where query genome 1 (q1) is aligned to the reference genome and query genome 2 (q2) is aligned to the same reference genome. Therefore, I have both alignments with a coordinate system in the reference genome. The alignments are in the form of GRanges objects.
I would like to project the breakpoints of q2 onto q1, by aligning the breakpoints of q1 in the center, and look for any clustering of q2 breakpoints around the q1 breakpoints, all in the reference genome coordinate system.
Therefore, I make a GRanges object of q1 with its breakpoints in the center. For example, if there is a breakpoint in q1 relative to the reference genome at scaffold 1, bp 833, then taking a window on 500 either side of this, the q1 GRanges object will have an element:
GRanges object with 1 range and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] S1 333-1333 *
-------
seqinfo: 576 sequences from an unspecified genome; no seqlengths
I then construct a GRanges object of the breakpoints on q2, but all seqlengths are of length 1. I intersect this with the q1 GRanges object, so that q2 only obtains points that can be projected onto q1.
The CoverageHeatmap function requires:
windows:
A set of GRanges of equal length
track:
A GRanges or RleList object specifying coverage
When I call the CoverageHeatmap function, I always get this error and warning message:
Error: subscript contains out-of-bounds ranges
In addition: Warning message:
In e1 == Rle(e2) :
longer object length is not a multiple of shorter object length
Called from: S4Vectors:::.subscript_error("subscript contains out-of-bounds ",
"ranges")
I've tried a bunch of things to try and make this work and still get the same error and warning message. This is my code (including when I've tried the function with q2 as a GRanges object and an RleList)
## BP Pairwise comparison, using 3rd genome as co-ordinate reference
# q1 is used as the centre point reference, with q2 bps projected on to it.
# gr_ref_q1 is the pw alignment between the reference and query genome 1
# gr_ref_q2 is the pw alignment between the reference and query genome 2
# We construct two GRanges objects to feed into CoverageHeatMaps
library(schoolmath)
library(heatmaps)
library(IRanges)
bp_3gen_v2 <- function(gr_ref_q1, gr_ref_q2, win){
# Failsafes (check ref genome is the same, etc)
if(!(is.even(win))){stop("win should be an even number")}
## Construct g1_rco (1st GRanges object)
# IRanges object
q1_starts1 <- start(ranges(gr_ref_q1)) - (win*0.5)
q1_starts2 <- end(ranges(gr_ref_q1)) - (win*0.5)
q1_starts <- c(q1_starts1, q1_starts2)
q1_ends1 <- start(ranges(gr_ref_q1)) + (win*0.5)
q1_ends2 <- end(ranges(gr_ref_q1)) + (win*0.5)
q1_ends <- c(q1_ends1, q1_ends2)
q1_ir_ob <- IRanges(start = q1_starts, end = q1_ends)
# GR object
g1_vec_seq <- as.vector(seqnames(gr_ref_q1))
gr1_seqnames <- c(g1_vec_seq, g1_vec_seq)
g1_rco <- GRanges(seqnames = gr1_seqnames, ranges = q1_ir_ob,
seqinfo = seqinfo(gr_ref_q1))
# Remove negative ranges from GR object
g1_rco <- g1_rco[!(start(ranges(g1_rco)) < 0)]
## Construct g2_rco (2nd GRanges object)
# IRanges object
q2_starts <- start(ranges(gr_ref_q2))
q2_ends <- end(ranges(gr_ref_q2))
q2_bps <- c(q2_starts, q2_ends)
q2_ir_ob <- IRanges(start = q2_bps, end = q2_bps)
# GR object
g2_vec_seq <- as.vector(seqnames(gr_ref_q2))
gr2_seqnames <- c(g2_vec_seq, g2_vec_seq)
g2_rco <- GRanges(seqnames = gr2_seqnames, ranges = q2_ir_ob,
seqinfo = seqinfo(gr_ref_q2))
# Try removing anywhere in g2_rco that is not present in g1_rco
# find intersection of seqnames
g_inter <- intersect(g1_vec_seq, g2_vec_seq)
# apply to g2_rco to remove out of bound scaffols
g2_rco <- g2_rco[seqnames(g2_rco) == g_inter]
# now to remove out of bound ranges (GRanges object)
g2_red <- intersect(g1_rco, g2_rco)
# And try as RleList object
g2_red_rle <- coverage(g2_red)
# Heatmap
heat_map <- CoverageHeatmap(windows = g1_rco, track = g2_red_rle)
To avoid these problems and to achieve what you need, the simplest solution is to have the same seqlevels and seqlenghts for both GRanges. If you know this for your reference then provide it, if not try this:
First example datasets:
library(heatmaps)
gr1 = GRanges(seqnames=c(1,2,3),
IRanges(start=c(1,101,1001),end=c(500,600,1500)))
gr2 = GRanges(seqnames=c(2,2,3,3),
IRanges(start=c(1,301,1,1201),end=c(2500,4800,3500,9700)))
Then we make a combined range to get the levels and lengths:
combined= range(c(gr1,gr2))
seqlevels(gr1) = as.character(seqnames(combined))
seqlevels(gr2) = as.character(seqnames(combined))
seqlengths(gr1) = end(combined)
seqlengths(gr2) = end(combined)
Then the heatmap can be easily obtained by:
CoverageHeatmap(gr1,coverage(gr2))
Or if you only want to look at gr1 windows that have some values in gr2, then do:
CoverageHeatmap(gr1[countOverlaps(gr1,gr2)>0],coverage(gr2))