r, subset, data-cleaning

Is there a fast/clever way to return a logical vector indicating whether each element of a vector is in at least one interval?


Assume you have a numeric vector x and a data frame df with columns start and stop. Is there a clever way to return a logical vector with length equal to x indicating whether each element of x is in at least one interval defined by start and stop?

The actual case I'm working with has length(x) >> nrow(df). The naïve way to do this is a for loop, but I was hoping for something more elegant that also runs fast.

x <- 1:10
df <- data.frame(start = c(0, 4.5, 6), stop = c(1, 5.5, 8.5))

# z[j] becomes TRUE once x[j] falls inside any closed interval [start, stop]
z <- rep(FALSE, length(x))

for (i in seq_len(nrow(df))) {
  z <- z | (df$start[i] <= x & x <= df$stop[i])
}

x[z] # 1 5 6 7 8

Solution

  • You could use outer, as below:

    > with(df, x[rowMeans(outer(x, start, `>=`) & outer(x, stop, `<=`)) > 0])
    [1] 1 5 6 7 8
    

    Or you can use %inrange% from data.table:

    > library(data.table)
    
    > x[x %inrange% df]
    [1] 1 5 6 7 8
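
Since the question asks for the logical vector itself rather than the subset of x, here is a minimal base R sketch of two ways to get it. The function names in_any_interval and in_any_disjoint_interval are hypothetical, chosen only for illustration. The first wraps the outer idea from the answer above. The second swaps in a different technique, findInterval, and is only valid under the assumption that the intervals are sorted by start and do not overlap (true for the example data, not guaranteed in general).

```r
# Hypothetical helper: TRUE where x lies in at least one closed interval
# [start[i], stop[i]]. Builds a length(x) by length(start) logical matrix.
in_any_interval <- function(x, start, stop) {
  inside <- outer(x, start, ">=") & outer(x, stop, "<=")
  rowSums(inside) > 0
}

# Hypothetical helper using findInterval instead of outer.
# ASSUMPTION: start is sorted in increasing order and the intervals are disjoint.
in_any_disjoint_interval <- function(x, start, stop) {
  # Index of the last interval whose start is <= x (0 if there is none)
  idx <- findInterval(x, start)
  # pmax() avoids indexing with 0; those positions are masked by idx > 0
  idx > 0 & x <= stop[pmax(idx, 1)]
}

x <- 1:10
df <- data.frame(start = c(0, 4.5, 6), stop = c(1, 5.5, 8.5))

z1 <- in_any_interval(x, df$start, df$stop)
z2 <- in_any_disjoint_interval(x, df$start, df$stop)

x[z1] # 1 5 6 7 8
x[z2] # 1 5 6 7 8
```

The outer version allocates a matrix with length(x) * nrow(df) entries, which may matter when x is very long. The findInterval version avoids that matrix, but only under the sorted, non-overlapping assumption stated above.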