Tags: r, performance, optimization, microbenchmark

How to make use of a sparse matrix to optimise evaluation at many shifted points


I'm currently trying to solve the following problem efficiently:

I have two vectors v1 and v2 as well as a vectorised function f. For each x in v1 I'd like to compute the mean of f(x - v2). What is special about this problem is that f returns zero for many inputs.

Example:

set.seed(0)

v1 <- rnorm(1000)
v2 <- rnorm(1000)

f <- function(x) {
  # indicator function: 1 where |x| <= 0.01, 0 elsewhere
  ret <- double(length(x)) + 1
  ret[abs(x) > 0.01] <- 0
  ret
}

solution_01 <- function(v1, v2, f) {
  ret <- numeric(length(v1))
  for (x in v2) {
    ret <- ret + f(v1 - x)
  }
  ret/length(v2)
}

solution_02 <- function(v1, v2, f) {
  apply(matrix(f(outer(v1, v2, `-`)), nrow=length(v1)), 1, sum)/length(v2)
}

solution_03 <- function(v1, v2, f) {
  rowSums(matrix(f(outer(v1, v2, `-`)), nrow=length(v1)))/length(v2)
}

solution_04 <- function(v1, v2, f) {
  rowMeans(matrix(f(outer(v1, v2, `-`)), nrow=length(v1)))
}

s1 <- solution_01(v1, v2, f)
s2 <- solution_02(v1, v2, f)
s3 <- solution_03(v1, v2, f)
s4 <- solution_04(v1, v2, f)

all.equal(s1, s2)
all.equal(s2, s3)
all.equal(s3, s4)

bench::mark(
  solution_01(v1, v2, f),
  solution_02(v1, v2, f),
  solution_03(v1, v2, f),
  solution_04(v1, v2, f)
)

# Sparsity
eval_points <- outer(v1, v2, `-`)
sum(f(eval_points) == 0)/length(eval_points)

As you can see, I have already implemented four possible solutions. So far the naive solution (using a for loop) is the fastest. I think this is because the other implementations rely on outer, which takes some time to allocate the required memory.
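
One idea to keep that allocation small is to process v2 in blocks, so each outer() call only materialises a length(v1) x chunk_size matrix (a sketch, not carefully benchmarked):

solution_chunked <- function(v1, v2, f, chunk_size = 100) {
  ret <- numeric(length(v1))
  # split the indices of v2 into consecutive blocks of at most chunk_size
  chunks <- split(seq_along(v2), ceiling(seq_along(v2) / chunk_size))
  for (idx in chunks) {
    ret <- ret + rowSums(matrix(f(outer(v1, v2[idx], `-`)), nrow = length(v1)))
  }
  ret / length(v2)
}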

How can I optimise this code? Is there a way to make use of the sparsity of f(outer(v1, v2))?


Solution

  • The problem you've presented (which I appreciate may be a simplification of what you are actually trying to do) would be efficiently solved by using sorted vectors, especially if the vectors got much longer than 1000 elements.

    Since you are counting the values in v2 that lie within a small distance of the values in v1, you could use an algorithm that advances through the first vector until it goes out of range of the element under consideration in the second, and then switches to advancing through the other vector. That way you only have to pass through each vector once rather than length(v1) times.

    As pointed out, R isn't efficient at this sort of thing, and you should code the whole thing in Rcpp if you want it to be really fast.
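
    For example, a rough sketch of that two-pointer idea in Rcpp (untested; count_in_range and its eps argument are just illustrative names) might look like:

    library(Rcpp)
    cppFunction('
      NumericVector count_in_range(NumericVector vs1, NumericVector vs2, double eps) {
        // vs1 and vs2 must be sorted ascending; returns, for each vs1[i],
        // the number of elements of vs2 within eps of it
        int n1 = vs1.size(), n2 = vs2.size(), lo = 0;
        NumericVector out(n1);
        for (int i = 0; i < n1; ++i) {
          while (lo < n2 && vs2[lo] < vs1[i] - eps) ++lo;   // skip values too far below
          int hi = lo;
          while (hi < n2 && vs2[hi] <= vs1[i] + eps) ++hi;  // walk across the in-range window
          out[i] = hi - lo;
        }
        return out;
      }
    ')

    # map the counts back to the original order of v1
    count_in_range(sort(v1), sort(v2), 0.01)[rank(v1, ties.method = "first")] / length(v2)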

    ...

    I thought I'd have a go at writing the algorithm for fun; it turns out to be nearly 10 times faster on your example data, even when written in R!

    solution_05 <- function(v1, v2, f) {
      vs1 <- sort(v1)
      vs2 <- sort(v2)
      n1 <- integer(length(v1))  # counts for the sorted elements of v1
      i1 <- 1
      i2 <- 1
      l <- length(v1)
      for (x2 in vs2) {
        # advance i1 until vs1[i1] is within range below x2
        # (bounds check first, with &&, so vs1 is never indexed past its end)
        while (i1 <= l && x2 - vs1[i1] > 0.01) i1 <- i1 + 1
        # elements i1..(i2-1) are still in range from the previous x2, so count them again
        if (i2 > i1) n1[i1:(i2-1)] <- n1[i1:(i2-1)] + 1 else i2 <- i1
        # advance i2 until out of range above, adding 1 to n1[i2] each time
        while (i2 <= l && vs1[i2] - x2 <= 0.01) {
          n1[i2] <- n1[i2] + 1
          i2 <- i2 + 1
        }
      }
      # map the counts back to the original order of v1
      s5 <- n1[rank(v1, ties.method = "first")] / length(v2)
      return(s5)
    }
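
    Alternatively, staying in R: since f in your example is just an indicator of |x| <= 0.01, the same counts can be obtained with a vectorised binary search on the sorted v2 via findInterval (a sketch under that indicator assumption):

    solution_06 <- function(v1, v2, eps = 0.01) {
      vs2 <- sort(v2)
      upper <- findInterval(v1 + eps, vs2)                    # count of v2 <= x + eps
      lower <- findInterval(v1 - eps, vs2, left.open = TRUE)  # count of v2 <  x - eps
      (upper - lower) / length(v2)
    }

    all.equal(solution_06(v1, v2), s1)  # should agree with the other solutions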