Search code examples
rgenomegenomicranges

Finding overlaps between 2 ranges and their overlapped region lengths?


I need to find length of overlapped region on same chromosomes between 2 group(gp1 & gp2). (similar question in stackoverflow were different from my aim, because I wanna find overlapped region not a TRUE/FALSE answer).

For example:

gp1: 
chr   start   end   id1 
chr1  580     600    1
chr1  900     970    2
chr3  400     600    3
chr2  100     700    4
gp2:
chr   start   end   id2
chr1  590     864   1
chr3  550     670   2
chr2  897     1987  3

I'm looking for a way to compare these 2 group and get results like this:

id1   id2    chr   overlapped_length
 1     1     chr1     10
 3     2     chr3     50   

Solution

  • Should point you in the right direction:

    Load libraries

    # install.packages("BiocManager")
    # BiocManager::install("GenomicRanges")
    library(GenomicRanges)
    library(IRanges)
    

    Generate data

    gp1 <- read.table(text = 
    "
    chr   start   end   id1 
    chr1  580     600    1
    chr1  900     970    2
    chr3  400     600    3
    chr2  100     700    4
    ", header = TRUE)
    
    gp2 <- read.table(text = 
                          "
    chr   start   end   id2
    chr1  590     864   1
    chr3  550     670   2
    chr2  897     1987  3
    ", header = TRUE) 
    

    Calculate ranges

    gr1 <- GenomicRanges::makeGRangesFromDataFrame(
        gp1, 
        seqnames.field = "chr", 
        start.field = "start", 
        end.field = "end"
    )
    gr2 <- GenomicRanges::makeGRangesFromDataFrame(
        gp2, 
        seqnames.field = "chr", 
        start.field = "start", 
        end.field = "end"
    )
    

    Calculate overlaps

    hits <- findOverlaps(gr1, gr2)
    p <- Pairs(gr1, gr2, hits = hits)
    i <- pintersect(p)
    

    Result

    > as.data.frame(i)
      seqnames start end width strand  hit
    1     chr1   590 600    11      * TRUE
    2     chr3   550 600    51      * TRUE