Tags: r, overlap, genome

Equal genomic intervals between samples


I would like to find the genomic intervals that are exactly the same across samples (NE_id).

My Input:

chr  start_call   end_call  NE_id 
chr1    150         200      NE01
chr1    150         200      NE02
chr2    100         150      NE01
chr2    100         160      NE02
chr3    200         300      NE01   
chr3    200         300      NE02

My expected output:

chr  start_call   end_call  NE_id 
chr1    150         200      NE01, NE02   
chr3    200         300      NE01, NE02

In this example, the chr2 intervals overlap, but they are not the exact same interval (the end positions differ by 10), so they should not be reported.

Thank you very much.


Solution

  • If dat is the data, you could try:

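    # Group by all other columns (chr, start_call, end_call) and keep each group's NE_id values
    res <- aggregate(NE_id ~ ., data = dat, FUN = I)
    # Keep only the intervals reported by more than one sample
    res[sapply(res$NE_id, length) > 1, ]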
    #    chr  start_call end_call     NE_id
    # 3 chr1        150      200 NE01, NE02
    # 4 chr3        200      300 NE01, NE02
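
For anyone who wants to try this from scratch, here is a minimal base R sketch. It assumes the input is a whitespace-separated table with the four columns shown in the question. The names `dat`, `collapsed`, `counts` and `shared` are only illustrative. It is a variant of the approach above, not the same code: it pastes the sample IDs into a single string and counts them in a separate step.

```r
# Rebuild the example input from the question (assumed layout: 4 whitespace-separated columns)
dat <- read.table(text = "chr start_call end_call NE_id
chr1 150 200 NE01
chr1 150 200 NE02
chr2 100 150 NE01
chr2 100 160 NE02
chr3 200 300 NE01
chr3 200 300 NE02", header = TRUE, stringsAsFactors = FALSE)

# Collapse the sample IDs of each exact (chr, start_call, end_call) combination into one string
collapsed <- aggregate(NE_id ~ chr + start_call + end_call, data = dat,
                       FUN = function(x) paste(x, collapse = ", "))

# Count how many samples report each interval (same grouping, so same row order)
counts <- aggregate(NE_id ~ chr + start_call + end_call, data = dat, FUN = length)

# Keep only the intervals found in more than one sample
shared <- collapsed[counts$NE_id > 1, ]
print(shared)
```

Because the grouping uses the exact start and end values, the two chr2 rows fall into different groups and are dropped. Intervals that only partially overlap would need a different approach.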