Is there an equivalent of substring for raw vectors in R?
Say that I have a large binary raw vector x, e.g. as a result of reading a file using readBin. I then used grepRaw to find the index of some fragment inside the raw vector that I would like to access. A toy example:
x <- charToRaw("foobar");
n <- 2;
m <- 5;
Now I would like to extract the "substring" from position 2 to position 5. A naive way to do so is:
x[n:m]
However, this scales poorly for large fragments, because R first creates a large vector n:m and then iterates over this vector to extract the elements from x at these indices, one by one. Is there a more native method to extract a part of a raw vector, similar to substr for character vectors? I don't think I can use rawToChar because the files might contain non-text binary data.
This is a C implementation, compiled with the inline package:
library(inline)
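subraw <- cfunction(c(x="raw", i="integer", j="integer"), "
    /* number of bytes in the 1-based, inclusive range [i, j] */
    int n = INTEGER(j)[0] - INTEGER(i)[0] + 1;
    SEXP result;
    /* n == 0 (j == i - 1) is allowed and yields an empty raw vector */
    if (n < 0)
        Rf_error(\"j < i - 1\");
    result = Rf_allocVector(RAWSXP, n);
    /* copy the whole block at once; i - 1 converts to a 0-based offset */
    memcpy(RAW(result), RAW(x) + INTEGER(i)[0] - 1L, n);
    return result;
")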
with the usual caveats about missing sanity checks (e.g., i and j scalar and not NA; i > 0; j <= length(x); etc.). In action:
> xx = readBin("~/bin/R-devel/lib/libR.so", raw(), 6000000)
> length(xx)
[1] 5706046
> length(subraw(xx, 1L, length(xx)))
[1] 5706046
> system.time(subraw(xx, 1L, length(xx)))
   user  system elapsed
  0.000   0.000   0.001
subraw(xx, 10L, 9L) returns raw(0).
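The sanity checks mentioned above could be sketched in plain C roughly as follows. This is only an illustration of the bounds logic, not part of R's API: subraw_checked is a hypothetical helper that works on an ordinary byte buffer, takes 1-based inclusive indices like the R function, and returns -1 where the R version would presumably call Rf_error. The NA and scalar checks are not shown, since they only make sense on the R side.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of R): copy the 1-based, inclusive range
 * [i, j] of src (which holds len bytes) into dst.
 * Returns the number of bytes copied, or -1 if the range is invalid.
 * dst must have room for at least j - i + 1 bytes. */
long subraw_checked(const unsigned char *src, size_t len,
                    long i, long j, unsigned char *dst)
{
    if (src == NULL || dst == NULL)
        return -1;
    if (i < 1)                      /* R-style indices start at 1 */
        return -1;
    if (j < i - 1)                  /* j == i - 1 is allowed: empty result */
        return -1;
    if ((size_t)j > len)            /* j >= 0 here; range must end inside src */
        return -1;

    size_t n = (size_t)(j - i + 1); /* number of bytes to copy */
    if (n > 0)
        memcpy(dst, src + (i - 1), n);
    return (long)n;
}
```

With the toy example from the question, calling this on the bytes of "foobar" with i = 2 and j = 5 copies the four bytes "ooba". The same conditions could be placed at the top of the cfunction body, using length(x) for the upper bound.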