I have a sorted array of values and a single value like so:
x <- c(1.0, 3.45, 5.23, 7.3, 12.5, 23.45)
v <- 6.45
I can find the index of the value after which v
would be inserted into x
while maintaining the sorting order:
max(which(x <= v))
[1] 3
It is nice and compact code, but I have the gut feeling that behind-the-scenes this is really inefficient: since which()
does not know that the array is sorted it has to inspect all values.
Is there a better way of finding this index value?
Note: I am not interested in actually merging v
into x
. I just want the index value.
You can use findInterval which makes use of a binary search.
findInterval(v, x)
#[1] 3
Or using C++ upper_bound
with Rcpp
.
Rcpp::cppFunction(
"int upper_bound(double v, const Rcpp::NumericVector& x) {
return std::distance(x.begin(), std::upper_bound(x.begin(), x.end(), v));
}")
upper_bound(v, x)
#[1] 3
Or in case you have also a vector of positions like in findInterval
.
Rcpp::cppFunction("
Rcpp::IntegerVector upper_bound2(const Rcpp::NumericVector& v
, const Rcpp::NumericVector& x) {
Rcpp::IntegerVector res(v.length());
for(int i=0; i < res.length(); ++i) {
res[i] = std::distance(x.begin(), std::upper_bound(x.begin(), x.end(), v[i]));
}
return res;
}")
v <- c(3, 6.45)
upper_bound2(v, x)
#[1] 1 3
findInterval(v, x)
#[1] 1 3