I have an Rcpp function that should take an IntegerVector as input (like toInt below). I want to use it on vectors of integers, but also on vectors of doubles whose values are all whole numbers (e.g. 1:4 is of type integer but 1:4 + 1 is of type double).
Yet, when it is used on genuinely fractional numbers (e.g. 1.5), I would like it to give a warning or an error instead of silently truncating the values to make them integers.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector toInt(RObject x) {
    return as<IntegerVector>(x);
}
> toInt(c(1.5, 2.4)) # I would like a warning
[1] 1 2
> toInt(1:2 + 1) # No need of warning
[1] 2 3
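To make the requirement concrete, here is a minimal sketch of the per-value test I have in mind, written in plain C++ without Rcpp so that it compiles on its own. The name is_exact_int is hypothetical (it is not an Rcpp function), and excluding INT_MIN is my assumption, based on R using that value for NA_integer_.

```cpp
#include <climits>
#include <cmath>

// Hypothetical helper, not part of Rcpp: returns true if `v` can be stored
// in an R integer without losing information.
bool is_exact_int(double v) {
    // NaN fails every comparison and infinities fall outside the range,
    // so both are rejected here. INT_MIN is excluded on the assumption
    // that R reserves it for NA_integer_.
    if (!(v > INT_MIN && v <= INT_MAX)) return false;
    // In range: the value is a whole number if truncation changes nothing.
    return v == std::trunc(v);
}
```

Doing the range test first also avoids casting an out-of-range double to int, which is undefined behaviour in C++.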
The first solution I thought of:
// [[Rcpp::export]]
IntegerVector toInt2(const NumericVector& x) {
    for (int i = 0; i < x.size(); i++) {
        if (x[i] != (int)x[i]) {
            warning("Uh-oh");
            break;
        }
    }
    return as<IntegerVector>(x);
}
But I wondered whether this makes an unnecessary copy when x is already an IntegerVector, so I wrote this other solution:
// [[Rcpp::export]]
IntegerVector toInt3(const RObject& x) {
    NumericVector nv(x);
    for (int i = 0; i < nv.size(); i++) {
        if (nv[i] != (int)nv[i]) {
            warning("Uh-oh");
            break;
        }
    }
    return as<IntegerVector>(x);
}
But maybe the best solution would be to test whether the RObject is already of type integer and, if it is not, to fill the resulting vector while checking the values:
// [[Rcpp::export]]
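SEXP toInt4(const RObject& x) {
    // Already an integer vector: return it as is, without copying.
    if (TYPEOF(x) == INTSXP) return x;
    NumericVector nv(x);
    int i, n = nv.size();
    IntegerVector res(n);
    // Convert and check in the same pass; warn once at the first mismatch.
    for (i = 0; i < n; i++) {
        res[i] = nv[i];
        if (nv[i] != res[i]) {
            warning("Uh-oh");
            break;
        }
    }
    // Convert the remaining values without checking them again.
    for (; i < n; i++) res[i] = nv[i];
    return res;
}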
Some benchmarking:
x <- seq_len(1e7)
x2 <- x; x2[1] <- 1.5
x3 <- x; x3[length(x3)] <- 1.5
microbenchmark::microbenchmark(
  fprive(x),  toInt2(x),  toInt3(x),  toInt4(x),
  fprive(x2), toInt2(x2), toInt3(x2), toInt4(x2),
  fprive(x3), toInt2(x3), toInt3(x3), toInt4(x3),
  times = 20
)
Unit: microseconds
       expr        min         lq         mean     median          uq        max neval
  fprive(x) 229865.629 233539.952 236049.68870 235623.390 238500.4335 244608.276    20
  toInt2(x)  98249.764  99520.233 102026.44305 100468.627 103480.8695 114144.022    20
  toInt3(x)  50631.512  50838.560  52307.34400  51417.296  52524.0260  58311.909    20
  toInt4(x)      1.165      6.955     46.63055     10.068     11.0755    766.022    20
 fprive(x2)  63134.534  64026.846  66004.90820  65079.292  66674.4835  74907.065    20
 toInt2(x2)  43073.288  43435.478  44068.28935  43990.455  44528.1800  45745.834    20
 toInt3(x2)  42968.743  43461.838  44268.58785  43682.224  44235.6860  51906.093    20
 toInt4(x2)  19379.401  19640.198  20091.04150  19918.388  20232.4565  21756.032    20
 fprive(x3) 254034.049 256154.851 258329.10340 258676.363 259549.3530 264550.346    20
 toInt2(x3)  77983.539  79162.807  79901.65230  79424.011  80030.3425  87906.977    20
 toInt3(x3)  73521.565  74329.410  76050.63095  75128.253  75867.9620  88240.937    20
 toInt4(x3)  22109.970  22529.713  23759.99890  23072.738  23688.5365  30905.478    20
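So toInt4 seems to be the best solution: it is the fastest in all three cases, and it returns integer input almost instantly because it does not copy it.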