Search code examples
rrcpparmadillo

Extract every k-th column of arma::mat matrix in rcpp


I was struggling with subsetting columns of a matrix of class arma::mat.

Let's say arma::mat X is given, and I tried to create a vector of indices IDX, in order to do X.cols(IDX). Especially, the index vector has every k-th integer from 1 to p (dimension of X). For example, one may be interested in every even columns IDX=[2,4,6,8, ...].

Based on this documentation, contiguous indices such as [0, 1, 2, ..., m-1] can be extracted easily using X.cols(0, m - 1) if m <= p. However, I couldn't find a good way to subset a matrix with the index vector IDX described above.

I wonder how I complete this code to give a desired output.

My "subset_armamat.cpp" file looks like

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;

// [[Rcpp::export]]
mat subset_armamat(mat X, int k){
  uvec IDX = "every k-th integer from 0 to X.ncols";
  return X.cols(IDX);
}

and R code to execute the defined function is

library("Rcpp")
sourceCpp("subset_armamat.cpp")
subset_armamat(matrix(1:10, 2, 5, byrow = T), 2)

This is expected to produce a 2-by-3 matrix as the following R code would give

> matrix(1:10, 2, 5, byrow = T)[,seq(1, 5, by = 2)]
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    6    8   10

It would be very appreciated if you give any input.

p.s. I've tried to

  • generate a sequence vector seq(1,m) * 2 manually, but this does not work with X.cols().
  • or find an index using find(seq(1,p) % 2 == 0), but % operator does not work well between seq(1,p) and 2.

Solution

  • F. Privé's answer showed that you can in fact use a uvec to subset a matrix using .cols() even if its not a contiguous range, using the base R seq() function to generate the sequence. I will further demonstrate that you can generate the sequence using an Armadillo function; you can use arma::regspace() -- it "generate[s] a vector with regularly spaced elements" (Armadillo documentation source):

    // [[Rcpp::depends(RcppArmadillo)]]
    #include <RcppArmadillo.h>
    using namespace Rcpp;
    using namespace arma;
    
    // [[Rcpp::export]]
    mat subset_armamat(mat X, int k) {
        uvec IDX = regspace<uvec>(0,  k,  X.n_cols-1);
        return X.cols(IDX);
    }
    

    As a comparison to calling R's seq() (where subset_armamatR() is the function from F. Privé's answer):

    library("Rcpp")
    sourceCpp("subset_armamat.cpp")
    mat <- matrix(1:10, 2, 5, byrow = TRUE)
    subset_armamat(mat, 2)
    #>      [,1] [,2] [,3]
    #> [1,]    1    3    5
    #> [2,]    6    8   10
    subset_armamatR(mat, 2)
    #>      [,1] [,2] [,3]
    #> [1,]    1    3    5
    #> [2,]    6    8   10
    library(microbenchmark)
    microbenchmark(Rseq = subset_armamatR(mat, 2),
                   regspace = subset_armamat(mat, 2))
    #> Unit: microseconds
    #>      expr     min       lq     mean   median       uq       max neval cld
    #>      Rseq 235.535 239.1615 291.1954 241.9850 248.6005  4704.467   100   a
    #>  regspace  14.221  15.0225 520.9235  15.8165  16.6740 50408.375   100   a
    

    Update: Passing by reference

    A comment from hbrerkere warrants some brief additional discussion. If you are calling this function from C++, you'll gain speed by changing mat subset_armamat(mat X, int k) to mat subset_armamat(const mat& X, int k). Passing by reference like this avoids an unnecessary copy, and when you do not intend to change an object passed by reference, you should use const. However, if you are calling this function from R, you cannot avoid a copy as arma::mat is not a native R type (see, for example, this answer by Dirk Eddelbuettel (the maintainer of both Rcpp and RcppArmadillo). Consider the following example:

    // [[Rcpp::depends(RcppArmadillo)]]
    #include <RcppArmadillo.h>
    
    // [[Rcpp::export]]
    void reference_example(arma::mat& X) {
        X(0, 0) = 42;
    }
    
    // [[Rcpp::export]]
    void print_reference_example(arma::mat X) {
        reference_example(X);
        Rcpp::Rcout << X << "\n";
    }
    

    Then calling from R:

    library("Rcpp")
    sourceCpp("reference_example.cpp")
    mat <- matrix(1:4, 2, 2)
    mat
    #>      [,1] [,2]
    #> [1,]    1    3
    #> [2,]    2    4
    reference_example(mat)
    mat
    #>      [,1] [,2]
    #> [1,]    1    3
    #> [2,]    2    4
    print_reference_example(mat)
    #>    42.0000    3.0000
    #>     2.0000    4.0000
    mat
    #>      [,1] [,2]
    #> [1,]    1    3
    #> [2,]    2    4