R Table1: In reality many more rows
PatientID chr pos type end length
AB1 1 2431 DEL 2100 -331
AC3 1 98041 INV 99100 1059
AG6 1 8743 BND 9000 257
Table 2: In reality many more rows, columns 2 and 3 are the ranges for the exon
Exon 2001 2500
Exon 8700 8750
Desired output: Those rows from table 1 where any part of the variants (from pos-end) fall within anywhere of the range of an Exon in table2
output:
PatientID chr pos type end length
AB1 1 2431 DEL 2100 -331
AG6 1 8743 BND 9000 257
I have tried a subset:
subset(table1$pos >= table2$V2 | table1$end <= table$end)
but this does not give me my desired output. Any help would be much appreciated.
All the best
If I understand you correctly, you want to find variants which (for at least one range in table2, r) meet at least one of the following conditions:
table1$end
is within rtable1$pos
is within rtable1$pos
and table2$end
are on opposite sides of rBelow are schematic depictions of all possibilities with r delimited by brackets and a variant represented by hyphens together with the numbers of conditions met.
-[- ] 1.
[ -]- 2.
[ - ] 1. and 2.
-[---]- 3.
Please note that the code below assumes that (unlike table1) table2[, 2]
<= table2[, 3]
.
# simplified table1
table1 <- data.frame(pos=c(2431, 98041, 8743), end=c(2100, 99100, 9000))
table2 <- data.frame(exon='Exon', from=c(2001, 8700), to=c(2500, 8750))
is.within <- apply(table1[, c('pos', 'end')], 1, function(x) {
x <- sort(x) # make sure x[1] <= x[2], i.e. pos <= end
any((x[1] >= table2[, 2] & x[1] <= table2[, 3]) | # 1.
(x[2] >= table2[, 2] & x[2] <= table2[, 3]) | # 2.
(x[1] <= table2[, 2] & x[2] >= table2[, 3])) # 3.
})
table1[is.within, ]