Erasing zeros from the vector element in Rcpp


I wrote the following code to erase zeros from the vector. I use the erase(i) function from the Rcpp library.

#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector erase_zero(NumericVector x) {
  for (int i = 0; i < x.size(); i++) {
    if (x[i] == 0) {
      x.erase(i);
    }
  }
  return x;
}

The code compiles and runs, but the output is not what I expect:

> erase_zero(c(0,1,2,3,0))
[1] 1 2 3
> erase_zero(c(0,0,1,2,3,0,0))
[1] 0 1 2 3 0
> erase_zero(c(0,0,0,1,2,3,0,0,0))
[1] 0 1 2 3 0
> erase_zero(c(0,0,0,0,1,2,3,0,0,0,0))
[1] 0 0 1 2 3 0 0

I don't know why this is happening.

After reading all the answers below, I ran a quick speed test:

> microbenchmark(erase_zero(s), erase_zero1(s), erase_zero_sugar(s))
Unit: microseconds
                expr    min      lq     mean median      uq    max neval
       erase_zero(s) 19.311 21.2790 22.54262 22.181 22.8780 35.342   100
      erase_zero1(s) 18.573 21.0945 21.95222 21.771 22.4680 36.490   100
 erase_zero_sugar(s)  1.968  2.0910  2.57070  2.296  2.5215 24.887   100

Here erase_zero1 is Roland's first version. ThomasIsCoding's base R solution was more efficient than all of them.


Solution

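  • erase changes the size of the vector: the elements after the erased one shift left by one position. If the index is then incremented as usual, the element that moved into position i is never checked, which is why consecutive zeros survive. After each erase, the loop has to revisit the same index and use the reduced length. This version gives the expected output.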

    #include <Rcpp.h>
    using namespace Rcpp;
    
    // [[Rcpp::export]]
    NumericVector erase_zero(NumericVector x) {
      R_xlen_t n = x.size();
      for (R_xlen_t i = 0; i < n; i++) {
        if (x[i] == 0) {
          x.erase(i);
          i--;
          n--;
        }
      }
      return x;
    }
    
    /*** R
    erase_zero(c(0,1,2,3,0))
    erase_zero(c(0,0,1,2,3,0,0))
    erase_zero(c(0,0,0,1,2,3,0,0,0))
    erase_zero(c(0,0,0,0,1,2,3,0,0,0,0))
    */
    

    However, you should just use some Rcpp sugar. It is more efficient:

    #include <Rcpp.h>
    using namespace Rcpp;
    
    // [[Rcpp::export]]
    NumericVector erase_zero_sugar(NumericVector x) {
      return x[x != 0];
    }
    

    You should also read "Why are these numbers not equal?", since a test such as x[i] == 0 is an exact floating-point comparison.
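
To see the index skipping in isolation, here is a minimal sketch in plain C++ on std::vector<double> rather than Rcpp::NumericVector. It is not from the answers above: the function names erase_zero_naive and erase_zero_std are hypothetical, and the second function swaps in the standard erase-remove idiom as an alternative to both the loop and the sugar version. It assumes the same exact comparison against zero as the original code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Mirrors the loop from the question on a std::vector (hypothetical name).
// After erase() the following elements shift left by one, but i still
// advances, so the element that moved into position i is never checked.
std::vector<double> erase_zero_naive(std::vector<double> v) {
  for (std::size_t i = 0; i < v.size(); i++) {
    if (v[i] == 0) {
      v.erase(v.begin() + i);
    }
  }
  return v;
}

// Erase-remove idiom (hypothetical name): std::remove moves the kept
// elements to the front in a single pass and returns the new logical end;
// one erase() call then trims the leftover tail.
std::vector<double> erase_zero_std(std::vector<double> v) {
  v.erase(std::remove(v.begin(), v.end(), 0.0), v.end());
  return v;
}
```

Called on {0, 0, 1, 2, 3, 0, 0}, erase_zero_naive leaves {0, 1, 2, 3, 0}, matching the output in the question, while erase_zero_std returns {1, 2, 3}. The idiom also avoids calling erase once per zero, each of which shifts the rest of the vector. Since Rcpp can convert between R numeric vectors and std::vector<double>, a function like this could probably be exported with // [[Rcpp::export]] as well, though that was not benchmarked here.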