I would like to find the genomic intervals that are exactly the same across samples (NE_id).
My input:
chr start_call end_call NE_id
chr1 150 200 NE01
chr1 150 200 NE02
chr2 100 150 NE01
chr2 100 160 NE02
chr3 200 300 NE01
chr3 200 300 NE02
My expected output:
chr start_call end_call NE_id
chr1 150 200 NE01, NE02
chr3 200 300 NE01, NE02
In this example the chr2 intervals overlap, but they are not the exact same genomic interval (their sizes differ by 10), so they should not be reported.
Thank you very much.
If dat is the data, you could try:
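# Group by every column except NE_id and keep the sample IDs of each group
res <- aggregate(NE_id ~ ., data = dat, FUN = I)
# Keep only the intervals reported by more than one sample
res[sapply(res$NE_id, length) > 1, ]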
# chr start_call end_call NE_id
# 3 chr1 150 200 NE01, NE02
# 4 chr3 200 300 NE01, NE02
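If you want NE_id as a single comma-separated string, as in your expected output, a variation along these lines might work. This is only a sketch: it rebuilds dat from the example in the question with read.table, and the names ids, n and shared are made up for illustration.

```r
# Example data copied from the question
dat <- read.table(text = "chr start_call end_call NE_id
chr1 150 200 NE01
chr1 150 200 NE02
chr2 100 150 NE01
chr2 100 160 NE02
chr3 200 300 NE01
chr3 200 300 NE02", header = TRUE, stringsAsFactors = FALSE)

# Collapse the sample IDs of each exact (chr, start_call, end_call) interval
ids <- aggregate(NE_id ~ chr + start_call + end_call, data = dat,
                 FUN = function(x) paste(unique(x), collapse = ", "))

# Count distinct samples per interval (same grouping, so same row order as ids)
n <- aggregate(NE_id ~ chr + start_call + end_call, data = dat,
               FUN = function(x) length(unique(x)))

# Keep the intervals seen in more than one sample
shared <- ids[n$NE_id > 1, ]
shared
```

Using unique() means a sample that lists the same interval twice is only counted once. To require that an interval is present in every sample rather than in at least two, you could compare n$NE_id against length(unique(dat$NE_id)) instead.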