I was struggling with subsetting columns of a matrix of class arma::mat.
Let's say arma::mat X
is given, and I tried to create a vector of indices IDX
, in order to do X.cols(IDX)
. Especially, the index vector has every k-th integer from 1 to p (dimension of X
). For example, one may be interested in every even columns IDX=[2,4,6,8, ...]
.
Based on this documentation, contiguous indices such as [0, 1, 2, ..., m-1] can be extracted easily using X.cols(0, m - 1)
if m <= p. However, I couldn't find a good way to subset a matrix with the index vector IDX
described above.
I wonder how I complete this code to give a desired output.
My "subset_armamat.cpp" file looks like
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
mat subset_armamat(mat X, int k){
uvec IDX = "every k-th integer from 0 to X.ncols";
return X.cols(IDX);
}
and R code to execute the defined function is
library("Rcpp")
sourceCpp("subset_armamat.cpp")
subset_armamat(matrix(1:10, 2, 5, byrow = T), 2)
This is expected to produce a 2-by-3 matrix as the following R code would give
> matrix(1:10, 2, 5, byrow = T)[,seq(1, 5, by = 2)]
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 6 8 10
It would be very appreciated if you give any input.
p.s. I've tried to
seq(1,m) * 2
manually, but this does not work with X.cols()
. find(seq(1,p) % 2 == 0)
, but %
operator does not work well between seq(1,p)
and 2
. F. Privé's answer showed that you can in fact use a uvec
to subset a matrix using .cols()
even if its not a contiguous range, using the base R seq()
function to generate the sequence. I will further demonstrate that you can generate the sequence using an Armadillo function; you can use arma::regspace()
-- it "generate[s] a vector with regularly spaced elements" (Armadillo documentation source):
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
mat subset_armamat(mat X, int k) {
uvec IDX = regspace<uvec>(0, k, X.n_cols-1);
return X.cols(IDX);
}
As a comparison to calling R's seq()
(where subset_armamatR()
is the function from F. Privé's answer):
library("Rcpp")
sourceCpp("subset_armamat.cpp")
mat <- matrix(1:10, 2, 5, byrow = TRUE)
subset_armamat(mat, 2)
#> [,1] [,2] [,3]
#> [1,] 1 3 5
#> [2,] 6 8 10
subset_armamatR(mat, 2)
#> [,1] [,2] [,3]
#> [1,] 1 3 5
#> [2,] 6 8 10
library(microbenchmark)
microbenchmark(Rseq = subset_armamatR(mat, 2),
regspace = subset_armamat(mat, 2))
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> Rseq 235.535 239.1615 291.1954 241.9850 248.6005 4704.467 100 a
#> regspace 14.221 15.0225 520.9235 15.8165 16.6740 50408.375 100 a
A comment from hbrerkere warrants some brief additional discussion. If you are calling this function from C++, you'll gain speed by changing mat subset_armamat(mat X, int k)
to mat subset_armamat(const mat& X, int k)
. Passing by reference like this avoids an unnecessary copy, and when you do not intend to change an object passed by reference, you should use const
. However, if you are calling this function from R, you cannot avoid a copy as arma::mat
is not a native R type (see, for example, this answer by Dirk Eddelbuettel (the maintainer of both Rcpp
and RcppArmadillo
). Consider the following example:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::export]]
void reference_example(arma::mat& X) {
X(0, 0) = 42;
}
// [[Rcpp::export]]
void print_reference_example(arma::mat X) {
reference_example(X);
Rcpp::Rcout << X << "\n";
}
Then calling from R:
library("Rcpp")
sourceCpp("reference_example.cpp")
mat <- matrix(1:4, 2, 2)
mat
#> [,1] [,2]
#> [1,] 1 3
#> [2,] 2 4
reference_example(mat)
mat
#> [,1] [,2]
#> [1,] 1 3
#> [2,] 2 4
print_reference_example(mat)
#> 42.0000 3.0000
#> 2.0000 4.0000
mat
#> [,1] [,2]
#> [1,] 1 3
#> [2,] 2 4