r dataframe compare intervals

For each value in a column, check if it belongs to any interval in another dataframe


Let's say I have a list of position values:

> head(jap["POS"])
        POS
1    836924
2    922009
3   1036959
4 141607615
5 164000000
6 118528028
[...]

And a list of intervals, one column per gene, with the start in row 1 and the end in row 2:

> genes_of_interest
       MGAM        SI      TREH    SLC2A2  SLC2A5   SLC5A1  TAS1R3       LCT
1 141607613 164696686 118528026 170714137 9095166 32439248 1266660 136545420
2 141806547 164796284 118550359 170744539 9148537 32509016 1270694 136594754

For every position in the first dataframe, I want to check whether it falls inside any of the intervals in the second dataframe.

So in this case, I should get:

FALSE FALSE FALSE TRUE FALSE TRUE

This is because 141607615 belongs to the first interval (MGAM) and 118528028 belongs to the third interval (TREH).

Do you have any idea how to do this?

Thanks in advance.


Solution

  • We can use sapply to loop over the columns of genes_of_interest and test every position in jap against that column's interval. This gives a logical matrix with one row per position and one column per gene. Wrapping it in apply(..., 1, any) then tells us whether each row contains at least one TRUE. Alternatively, we can replace the outer apply with as.logical(rowSums()); both give the same result.

    Note that the between function comes from the dplyr package and includes both endpoints of the interval.

    library(dplyr)
    
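    # sapply builds a positions-by-genes logical matrix; apply(..., 1, any) collapses each row
    # (the \(x) lambda shorthand needs R >= 4.1)
    apply(sapply(seq_len(ncol(genes_of_interest)), \(x) between(jap$POS, genes_of_interest[1, x], genes_of_interest[2, x])), 1, any)
    
    # or: a row sum greater than 0 means at least one interval matched
    
    as.logical(rowSums(sapply(seq_len(ncol(genes_of_interest)), \(x) between(jap$POS, genes_of_interest[1, x], genes_of_interest[2, x]))))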
    

    Output

    [1] FALSE FALSE FALSE  TRUE FALSE  TRUE
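
For anyone who wants to try this without dplyr, here is a minimal base R sketch of the same idea. It assumes the data look exactly like the excerpt in the question: the two data frames are rebuilt by hand from the six positions and eight genes shown there, and the names starts, ends, inside and in_any are only illustrative. It is one possible way to do it, not the only one.

```r
# Data rebuilt from the question (only the six positions shown there).
jap <- data.frame(
  POS = c(836924, 922009, 1036959, 141607615, 164000000, 118528028)
)

genes_of_interest <- data.frame(
  MGAM   = c(141607613, 141806547),
  SI     = c(164696686, 164796284),
  TREH   = c(118528026, 118550359),
  SLC2A2 = c(170714137, 170744539),
  SLC2A5 = c(9095166, 9148537),
  SLC5A1 = c(32439248, 32509016),
  TAS1R3 = c(1266660, 1270694),
  LCT    = c(136545420, 136594754)
)

# Row 1 holds the interval starts, row 2 the interval ends.
starts <- unlist(genes_of_interest[1, ])
ends   <- unlist(genes_of_interest[2, ])

# outer() compares every position with every bound, giving two
# positions-by-genes logical matrices. A position is inside an interval
# when both comparisons hold (inclusive bounds, like dplyr::between).
inside <- outer(jap$POS, starts, ">=") & outer(jap$POS, ends, "<=")

# TRUE when a position falls in at least one interval.
in_any <- rowSums(inside) > 0
in_any
```

Because inside keeps the gene names as column names, it can also be inspected directly to see which interval each position matched.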