Let's say I have a data frame of position values:
> head(jap["POS"])
POS
1 836924
2 922009
3 1036959
4 141607615
5 164000000
6 118528028
[...]
And a data frame of intervals, one column per gene, with the start in row 1 and the end in row 2:
> genes_of_interest
       MGAM        SI      TREH    SLC2A2  SLC2A5   SLC5A1  TAS1R3       LCT
1 141607613 164696686 118528026 170714137 9095166 32439248 1266660 136545420
2 141806547 164796284 118550359 170744539 9148537 32509016 1270694 136594754
For every position in the first data frame, I want to check whether it falls inside any of the intervals in the second data frame.
So in this case, I should get
FALSE FALSE FALSE TRUE FALSE TRUE
since 141607615 belongs to the first interval (MGAM) and 118528028 belongs to the third interval (TREH).
Do you have any idea how to do this?
Thanks in advance.
We can use sapply to go through all columns in genes_of_interest and compare each position in jap with the bounds of that interval. This gives a logical matrix with one row per position and one column per gene. Then wrap it in an outer apply to determine whether any value in each row is TRUE. Alternatively, we can replace the outer apply with as.logical(rowSums()); both approaches give the same output.
Note that the between function comes from the dplyr package, and that it treats both bounds as inclusive.
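library(dplyr)
# Build a positions-by-genes logical matrix, then test each row for any TRUE
apply(sapply(1:ncol(genes_of_interest), \(x) between(jap$POS, genes_of_interest[1, x], genes_of_interest[2, x])), 1, any)
# or, equivalently, count the TRUE values per row and convert back to logical
as.logical(rowSums(sapply(1:ncol(genes_of_interest), \(x) between(jap$POS, genes_of_interest[1, x], genes_of_interest[2, x]))))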
[1] FALSE FALSE FALSE TRUE FALSE TRUE
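For anyone who wants to try this without dplyr, here is a minimal base R sketch of the same idea. It assumes jap and genes_of_interest have the structure printed in the question (the data below is rebuilt from that output). The names starts, ends, inside and in_any are only illustrative. It swaps the sapply loop for outer(), which compares every position with every bound in one step.

```r
# Rebuild the example data from the question (values copied from the printed output)
jap <- data.frame(POS = c(836924, 922009, 1036959, 141607615, 164000000, 118528028))
genes_of_interest <- data.frame(
  MGAM   = c(141607613, 141806547),
  SI     = c(164696686, 164796284),
  TREH   = c(118528026, 118550359),
  SLC2A2 = c(170714137, 170744539),
  SLC2A5 = c(9095166, 9148537),
  SLC5A1 = c(32439248, 32509016),
  TAS1R3 = c(1266660, 1270694),
  LCT    = c(136545420, 136594754)
)

starts <- unlist(genes_of_interest[1, ])  # lower bound of each interval
ends   <- unlist(genes_of_interest[2, ])  # upper bound of each interval

# One row per position, one column per gene; both bounds are inclusive
inside <- outer(jap$POS, starts, ">=") & outer(jap$POS, ends, "<=")

# TRUE when a position falls inside at least one interval
in_any <- rowSums(inside) > 0
in_any
# FALSE FALSE FALSE TRUE FALSE TRUE
```

The inside matrix keeps the gene names as column names, so inside[, "MGAM"] shows which positions fall within that particular gene.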