Search code examples
c++rrcpprcpparmadillo

Extremely inefficient code written in C++


The computation time for the following function is very high. Is there any room for improvement? Should I be accessing the elements of matrix X differently? I appreciate any comments or suggestions.

#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]


arma::mat myfunc(const int& n,
                 const int& p,
                 arma::mat& X,
                 arma::rowvec& y,
                 const arma::rowvec& types,
                 const arma::mat& rat,
                 const arma::rowvec& betas){
  
  arma::mat final(p+p,n);
  final.zeros();
  int i,j;
  
  for(i=0; i < n; ++i){
    arma::colvec finalfirst(p+p); finalfirst.zeros();
    for(j=0; j < n; ++j){
      arma::mat Xt = X * log(y(j));
      arma::mat finalX = join_rows(X,Xt);
      
      arma::rowvec Xi = finalX.row(i);
      
      if(types(i)==1 && y(j)==y(i)){
        finalfirst += (Xi.t() - rat.col(j));
      }
      if(types(i)>1 && y(j) > y(i)){
        finalfirst -= (Xi.t() - rat.col(j)) * exp(arma::as_scalar(betas*Xi.t()));
        
      }
      else if(y(j) <= y(i)){
        finalfirst -= Xi.t() * exp(arma::as_scalar(betas*Xi.t()));
      }
    }
    
    final.col(i) = finalfirst;
  }
  
  return(final);
}




/*** R
m=4000
types = runif(m,0,5)
types[types<=1] = 0 ; types[types > 1 & types < 3] = 1; types[types>=3]=2
microbenchmark(out = myfunc(n=m,p=2,X=matrix(rnorm(m*2),nrow=m,ncol=2),y=runif(m,0,3),types=types,rat=matrix(rnorm(m*4),nrow=4,ncol=m),betas=c(1,2,3,4)))
*/

Solution

  • It might be helpful if you provide at least some explanation of what it is you are doing.

    First, I recommend breaking up the code into very small chunks, microbenchmarking each chunk, and finding bottleneck operations. This is better than throwing the whole function at the wall all at once.

    row vs. column access

    join_rows(X,Xt)

    This is a very slow operation, especially because you are adding a rowvec to a column-oriented mat. Armadillo matrices are stored as a vector in column-major order, so behind the scenes Armadillo is calling push_back in n non-contiguous locations, where n is the number of columns.

    It seems you may be able to avoid this altogether, as finalX.row(i) is the only call that depends on finalX in this loop. Just figure out what .row(i) is.

    implicit operations

    You do a lot of transposing, maybe you should be working with a transposed matrix from the get-go?

    Xi.t() is called twice in the same line: finalfirst -= (Xi.t() - rat.col(j)) * exp(arma::as_scalar(betas*Xi.t()));

    Transposing a row vector is pretty pointless, just initialize it as a column vector and iterate through it with a good old-fashioned loop. A little more code isn't always a bad thing, it often makes your intent more explicit too.

    Copying vs. in-place operations

    This is a copy: final.col(i) = finalfirst;

    Why not operate in-memory and update final(_, i) in-place rather than using a temporary finalfirst followed by a deep copy into your target matrix? This is a column, so memory is contiguous and the compiler will be able to optimize the access pattern for you as wonderfully as if you were working on any simple vector.

    All said, I haven't fully wrapped my head around what it is exactly that you are doing, but @largest_prime_is_463035818 may be right about switching the for(i...) and for(j...) loops around. It appears you will be able to pull these two rows out of the nested loop:

    arma::mat Xt = X * log(y(j));
    arma::mat finalX = join_rows(X,Xt);
    

    since they do not depend on i at all.