Search code examples
vectorrcpprcppparallel

Parallel Addition of Vectors using RcppParallel


I am trying to parallelise the addition of (large) vectors using RcppParallel. That's what I've come up with.

// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h>
#include <Rcpp.h>
#include <assert.h> 
using namespace RcppParallel;
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector directVectorAddition(NumericVector first, NumericVector second) {
    assert (first.length() == second.length());
    NumericVector results(first.length());
    results = first + second;
    return results;
}

// [[Rcpp::export]]
NumericVector loopVectorAddition(NumericVector first, NumericVector second) {
    assert (first.length() == second.length());
    NumericVector results(first.length());
    for(unsigned i = 0; i != first.length(); i++)
        results[i] = first[i] + second[i];
    return results;
}

struct VectorAddition : public Worker
{
    const RVector<double> first, second;
    RVector<double> results;
    VectorAddition(const NumericVector one, const NumericVector two, NumericVector three) : first(one), second(two), results(three) {}

    void operator()(std::size_t a1, std::size_t a2) {
        std::transform(first.begin() + a1, first.begin() + a2,
                       second.begin() + a1,
                       results.begin() + a1,
                       [](double i, double j) {return i + j;});
    }
};


// [[Rcpp::export]]
NumericVector parallelVectorAddition(NumericVector first, NumericVector second) {
    assert (first.length() == second.length());
    NumericVector results(first.length());
    VectorAddition myVectorAddition(first, second, results);
    parallelFor(0, first.length(), myVectorAddition);
    return results;
}

It seems to work, but doesn't speed up things (at least not on a 4-core machine).

> v1 <- 1:1000000
> v2 <- 1000000:1
> all(directVectorAddition(v1, v2) == loopVectorAddition(v1, v2))
[1] TRUE
> all(directVectorAddition(v1, v2) == parallelVectorAddition(v1, v2))
[1] TRUE
> result <- benchmark(v1 + v2, directVectorAddition(v1, v2), loopVectorAddition(v1, v2), parallelVectorAddition(v1, v2), order="relative")
 > result[,1:4]
                            test replications elapsed relative
    1                        v1 + v2          100   0.206    1.000
    4 parallelVectorAddition(v1, v2)          100   0.993    4.820
    2   directVectorAddition(v1, v2)          100   1.015    4.927
    3     loopVectorAddition(v1, v2)          100   1.056    5.126

Can this be implemented more efficiently?

Thanks a lot in advance,

mce


Solution

  • Rookie mistake :) You define this as Rcpp::NumericVector but create data that is created via the sequence operator. And that creates integer values so you are forcing a copy onto all your functions!

    Make it

    v1 <- as.double(1:1000000)
    v2 <- as.double(1000000:1)
    

    instead, and on a machine with lots of cores (at work) I then see

    R> result[,1:4]
                                test replications elapsed relative
    4 parallelVectorAddition(v1, v2)          100   0.301    1.000
    2   directVectorAddition(v1, v2)          100   0.424    1.409
    1                        v1 + v2          100   0.436    1.449
    3     loopVectorAddition(v1, v2)          100   0.736    2.445
    

    The example is still not that impressive because the relevant operation is "cheap" whereas the parallel approach needs to allocate memory, copy data to workers, collect again etc pp.

    But the good news is that you wrote your parallel code correctly. Not a small task.