Search code examples
rbisection

How can I efficiently find the index of a value in a sorted array?


I have a sorted array of values and a single value like so:

x <- c(1.0, 3.45, 5.23, 7.3, 12.5, 23.45)
v <- 6.45

I can find the index of the value after which v would be inserted into x while maintaining the sorting order:

max(which(x <= v))
[1] 3

It is nice and compact code, but I have the gut feeling that behind-the-scenes this is really inefficient: since which() does not know that the array is sorted it has to inspect all values.

Is there a better way of finding this index value?

Note: I am not interested in actually merging v into x. I just want the index value.


Solution

  • You can use findInterval which makes use of a binary search.

    findInterval(v, x)
    #[1] 3
    

    Or using C++ upper_bound with Rcpp.

    Rcpp::cppFunction(
      "int upper_bound(double v, const Rcpp::NumericVector& x) {
         return std::distance(x.begin(), std::upper_bound(x.begin(), x.end(), v));
    }")
    
    upper_bound(v, x)
    #[1] 3
    

    Or in case you have also a vector of positions like in findInterval.

    Rcpp::cppFunction("
    Rcpp::IntegerVector upper_bound2(const Rcpp::NumericVector& v
                                   , const Rcpp::NumericVector& x) {
      Rcpp::IntegerVector res(v.length());
      for(int i=0; i < res.length(); ++i) {
        res[i] = std::distance(x.begin(), std::upper_bound(x.begin(), x.end(), v[i]));
      }
      return res;
    }")
    
    v <- c(3, 6.45)
    upper_bound2(v, x)
    #[1] 1 3
    findInterval(v, x)
    #[1] 1 3