
R: combine the output of findOverlaps and countOverlaps


I have two sets of IRanges to compare. My goal is an output that gives the overlapping range where one exists and, where two ranges do not overlap, the offset between them reported as a negative start. If I can't get the offset, I would at least like a "0" to indicate that there is no overlap. For example:

library(IRanges)
xx <- IRanges(start = c(2, 9, 19, 31, 45), end = c(3, 11, 23, 35, 49))

IRanges of length 5
     start end width
[1]      2   3     2
[2]      9  11     3
[3]     19  23     5
[4]     31  35     5
[5]     45  49     5

and

yy <- IRanges(start = c(4, 10, 19, 33, 45), end = c(5, 13, 25, 38, 48))

IRanges of length 5
     start end width
[1]      4   5     2
[2]     10  13     4
[3]     19  25     7
[4]     33  38     6
[5]     45  48     4

Using findOverlaps() and then ranges() on the resulting Hits object gives me:

> fo <- findOverlaps(xx, yy)
> ranges(fo, xx, yy)
IRanges of length 4
    start end width
[1]    10  11     2
[2]    19  23     5
[3]    33  35     3
[4]    45  48     4
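In more recent IRanges releases, calling ranges() on a Hits object is deprecated in favour of overlapsRanges(query, subject, hits). The following is a minimal sketch of the same step using that function, assuming a reasonably current Bioconductor install. It also shows how queryHits() and subjectHits() identify which pair of ranges produced each overlap row:

```r
library(IRanges)

xx <- IRanges(start = c(2, 9, 19, 31, 45), end = c(3, 11, 23, 35, 49))
yy <- IRanges(start = c(4, 10, 19, 33, 45), end = c(5, 13, 25, 38, 48))

fo <- findOverlaps(xx, yy)

# One range per hit: the intersection of xx[queryHits(fo)] and yy[subjectHits(fo)]
ov <- overlapsRanges(xx, yy, fo)

# Which xx / yy ranges produced each overlap row
data.frame(query   = queryHits(fo),
           subject = subjectHits(fo),
           start   = start(ov),
           end     = end(ov),
           width   = width(ov))
```

As in the output above, there is no hit for xx[1], so only four rows come back. The missing first row is the part that still has to be filled in.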

I would like the final output to be a data frame (or something similar) that looks like this:

       start end width
[1]     -1   0     0
[2]     10  11     2
[3]     19  23     5
[4]     33  35     3
[5]     45  48     4

I can tell which ranges overlap using countOverlaps(), and I can get the overlapping ranges themselves using findOverlaps() plus ranges(). However, I am at a loss as to how to combine the two results into the desired output.
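Here is one way to combine the two, sketched under the assumption that every range in xx overlaps at most one range in yy (true for this example). countOverlaps() flags the ranges with no overlap, and those rows keep 0s. The intersected ranges from the findOverlaps() hits are then placed into the remaining rows by queryHits(). The helper name combineOverlaps is hypothetical. This version gives the "0" fallback from the question rather than the negative offset:

```r
library(IRanges)

# Hypothetical helper: one output row per range in X.
# Rows with an overlap get the intersected range; rows without one stay 0.
# Assumes each X[i] overlaps at most one range in Y.
combineOverlaps <- function(X, Y) {
  counts <- countOverlaps(X, Y)
  if (any(counts > 1)) stop("some ranges in X overlap more than one range in Y")

  hits <- findOverlaps(X, Y)
  # Pairwise intersection of each hit pair, in the same order as hits
  ov <- pintersect(X[queryHits(hits)], Y[subjectHits(hits)])

  # Start with all zeros (the "no overlap" value) for every range in X
  out <- data.frame(start = integer(length(X)),
                    end   = integer(length(X)),
                    width = integer(length(X)))

  # Fill in the rows that have a hit
  idx <- queryHits(hits)
  out$start[idx] <- start(ov)
  out$end[idx]   <- end(ov)
  out$width[idx] <- width(ov)
  out
}

xx <- IRanges(start = c(2, 9, 19, 31, 45), end = c(3, 11, 23, 35, 49))
yy <- IRanges(start = c(4, 10, 19, 33, 45), end = c(5, 13, 25, 38, 48))
combineOverlaps(xx, yy)
```

If a range can overlap several others, the row-per-range layout no longer fits. In that case it is simpler to keep one row per hit and add the zero-count ranges as extra rows.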


Solution

    library(IRanges)
    
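    # Overlap of a single pair of ranges a and b (one-row data frames with
    # start/end columns). If they do not overlap, start holds e - s, which is
    # negative, and end is set to 0.
    f <- function(a,b)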
    {
      s <- max(a$start,b$start)  
      e <- min(a$end,b$end)
    
      if ( s <= e )
      {
        ovlp <- c( start = s,
                   end   = e,
                   width = e-s+1 )
      } else
      {
        ovlp <- c( start = e-s,
                   end   = 0,
                   width = NA )
      }
    
      return(ovlp)
    }
    
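    # Pairwise comparison: X[i] is compared with Y[i] only (not all-vs-all),
    # so X and Y must have the same length.
    findOvlp <- function( X, Y )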
    {
      if ( length(X) != length(Y) ){ stop("length(X) != length(Y)") }
    
      n <- length(X)
    
      X.df <- as.data.frame(X)
      Y.df <- as.data.frame(Y)
    
      Z <- data.frame( start = rep(NA,n),
                       end   = rep(NA,n),
                       width = rep(NA,n) )
    
      # Fill one row per pair; seq_len() is also safe when n == 0
      for ( i in seq_len(n) ) { Z[i,] <- f(X.df[i,],Y.df[i,]) }
    
      return( Z )
    }
    


    > xx<-IRanges(start=c(2,9,19,31,45), end=c(3,11,23,35,49))
    
    > yy<-IRanges(start=c(4,10,19,33,45), end=c(5,13,25,38,48))
    
    > findOvlp(xx,yy)
      start end width
    1    -1   0    NA
    2    10  11     2
    3    19  23     5
    4    33  35     3
    5    45  48     4
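The loop above compares xx[i] with yy[i] one row at a time. The same pairwise calculation can be vectorized with pmax() and pmin() on start() and end(). Below is a sketch of that approach. The name findOvlpVec is hypothetical. Unlike findOvlp(), it reports width 0 instead of NA for non-overlapping pairs, to match the output asked for in the question:

```r
library(IRanges)

# Hypothetical vectorized version of findOvlp(): compares X[i] with Y[i]
# for every i at once instead of looping over data-frame rows.
findOvlpVec <- function(X, Y) {
  if (length(X) != length(Y)) stop("length(X) != length(Y)")
  s   <- pmax(start(X), start(Y))  # later start of each pair
  e   <- pmin(end(X), end(Y))      # earlier end of each pair
  hit <- s <= e                    # TRUE where the pair overlaps
  data.frame(start = ifelse(hit, s, e - s),          # e - s < 0 when no overlap
             end   = ifelse(hit, e, 0L),
             width = ifelse(hit, e - s + 1L, 0L))
}

xx <- IRanges(start = c(2, 9, 19, 31, 45), end = c(3, 11, 23, 35, 49))
yy <- IRanges(start = c(4, 10, 19, 33, 45), end = c(5, 13, 25, 38, 48))
findOvlpVec(xx, yy)
```

For the first pair, s = max(2, 4) = 4 and e = min(3, 5) = 3, so start is 3 - 4 = -1. This matches the first row of the desired output.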