Search code examples
rrangesubset

Subset a table by whether two columns fall within any reference range in a lookup table in R


R Table1: In reality many more rows

PatientID chr pos  type end  length 
AB1       1  2431  DEL 2100  -331
AC3       1  98041 INV 99100  1059
AG6       1  8743  BND 9000   257

Table 2: In reality many more rows, columns 2 and 3 are the ranges for the exon

Exon 2001 2500 
Exon 8700 8750 

Desired output: Those rows from table 1 where any part of the variants (from pos-end) fall within anywhere of the range of an Exon in table2

output:

PatientID chr pos  type end   length 
AB1       1  2431  DEL 2100  -331
AG6       1  8743  BND 9000   257

I have tried a subset:

subset(table1$pos >= table2$V2 | table1$end <= table$end) 

but this does not give me my desired output. Any help would be much appreciated.

All the best


Solution

  • If I understand you correctly, you want to find variants which (for at least one range in table2, r) meet at least one of the following conditions:

    1. table1$end is within r
    2. table1$pos is within r
    3. table1$pos and table2$end are on opposite sides of r

    Below are schematic depictions of all possibilities with r delimited by brackets and a variant represented by hyphens together with the numbers of conditions met.

    -[-  ]   1.
     [  -]-  2.
     [ - ]   1. and 2.
    -[---]-  3.
    

    Please note that the code below assumes that (unlike table1) table2[, 2] <= table2[, 3].

    # simplified table1
    table1 <- data.frame(pos=c(2431, 98041, 8743), end=c(2100, 99100, 9000))
    table2 <- data.frame(exon='Exon', from=c(2001, 8700), to=c(2500, 8750))
    
    is.within <- apply(table1[, c('pos', 'end')], 1, function(x) {
      
      x <- sort(x)  # make sure x[1] <= x[2], i.e. pos <= end
      any((x[1] >= table2[, 2] & x[1] <= table2[, 3]) |      # 1.
            (x[2] >= table2[, 2] & x[2] <= table2[, 3]) |    # 2.
            (x[1] <= table2[, 2] & x[2] >= table2[, 3]))     # 3.
            
    })
    table1[is.within, ]