In which cases is it most efficent to write a part of R code in C++ or parallelise it (in R or Rcpp)?

I have some data with a special structure that requires me to write my own k-means function. Not far into this, I have already noticed the extremely high computation time when calculating the distance of a center to all data points. Because my data is going to be about 60 times larger in the future and I'll need to do runs with many different cluster sizes, I am very concerned with speed.

I have a attached a minimal example of calculating the distance from one randomly sampled center to each data point. I am not yet experienced using C++ or parallel computation in R, but I am most unsure about which of these solutions is the best approach to my problem (here and there some people claim you should parallelize whenever, some people claim it is almost never necessary, some advice for, some advice against using Rcpp). As with most things in life I'm sure there are cases where all of these answers are correct. However, what are the general circumstances when to go with which approach?

(I have profiled this code and couldn't find anything I could improve just within the R code for speed. If you have any suggestions however, please let me know as well!)

x <- matrix(runif(15000*34),nrow = 15000, ncol = 34)
w <- matrix(runif(15000*17),nrow = 15000, ncol = 17)
k <- 3
i <- 1
centers <- x[sample.int(nrow(x), size = k),]

weighted_matching <- function(point,center,weight){
    point <- matrix(point, ncol = 2, nrow = 17, byrow = T)
    center <- matrix(center, ncol = 2, nrow = 17, byrow = T)
    1/sum(weight) * sum(weight * apply(point, 1, function(x,y) sqrt(sum((x-y)^2)), y = center))
}

system.time(
    apply(x, 1, weighted_matching, weight = w, center = centers[i,])
)

Solution

There are two cases I use C++ in replacement to R:

iterate over lots of elements (large for-loops)
want to reduce memory-footprint

In your case, you are already using vectorized code instead of loop, so the first point does not apply.

The second point however could be beneficial; indeed, you're computing (x-y)^2 which creates two new temporary vectors. It would be beneficial to rewrite this in C++ to use less memory and maybe getting a 2-3 fold improvement in computation time.

But, when I usually hear about "computing distances", I would probably go for trying to derive this using matrix computations (linear algebra).