Equivalent of substr for raw vectors


Is there an equivalent of substring for raw vectors in R?

Say that I have a large raw vector x, e.g. the result of reading a binary file with readBin, and that I have used grepRaw to find the index of some fragment inside it that I would like to extract. A toy example:

x <- charToRaw("foobar");
n <- 2;
m <- 5;

Now I would like to extract the "substring" from position 2 to position 5. A naive way to do so is:

x[n:m]

However, this scales poorly for large fragments, because R first creates a large vector n:m and then iterates over this vector to extract the elements from x at these indices, one by one. Is there a more native method to extract a part of a raw vector, similar to substr for character vectors? I don't think I can use rawToChar because the files might contain non-text binary data.


Solution

  • This is a C implementation, using the inline package:

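    library(inline)
    # subraw(x, i, j): copy bytes i..j (1-based, inclusive) of raw vector x
    subraw <- cfunction(c(x="raw", i="integer", j="integer"), "
        /* number of bytes to copy; 0 when j == i - 1 */
        int n = INTEGER(j)[0] - INTEGER(i)[0] + 1;
        SEXP result;
        if (n < 0)
            Rf_error(\"j < i - 1\");
        result = Rf_allocVector(RAWSXP, n);
        /* one block copy instead of element-wise indexing */
        memcpy(RAW(result), RAW(x) + INTEGER(i)[0] - 1L, n);
        return result;
    ")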
    

    The usual caveats about missing sanity checks apply (e.g., i and j should be scalar and not NA, i > 0, j <= length(x), etc.). In action:

    > xx = readBin("~/bin/R-devel/lib/libR.so", raw(), 6000000)
    > length(xx)
    [1] 5706046
    > length(subraw(xx, 1L, length(xx)))
    [1] 5706046
    > system.time(subraw(xx, 1L, length(xx)))
       user  system elapsed 
      0.000   0.000   0.001 
    

    subraw(xx, 10L, 9L) returns raw(0).
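
The sanity checks mentioned above could look roughly like the following. This is only a sketch in standalone C, without the R API, so that the range logic can be read and tested on its own; the name subraw_checked, its signature, and the -1 error convention are hypothetical and not part of the answer. Inside the cfunction body the same comparisons would call Rf_error instead of returning -1, and would additionally need to check that i and j have length 1.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the answer above): copies the 1-based,
 * inclusive range [i, j] of x (length len) into out.
 * Returns the number of bytes copied, or -1 if the range is invalid.
 * The caller must provide room for at least j - i + 1 bytes in out. */
long subraw_checked(const unsigned char *x, size_t len,
                    long i, long j, unsigned char *out)
{
    if (x == NULL || out == NULL)
        return -1;
    if (i < 1)                 /* positions are 1-based */
        return -1;
    if (j < i - 1)             /* j == i - 1 is allowed: empty result */
        return -1;
    /* i >= 1 and j >= i - 1 imply j >= 0, so the cast below is safe */
    if ((size_t)j > len)       /* range must end inside x */
        return -1;

    size_t n = (size_t)(j - i + 1);
    if (n > 0)
        memcpy(out, x + (i - 1), n);
    return (long)n;
}
```

With the bytes of "foobar" and i = 2, j = 5, this copies the four bytes "ooba", matching x[n:m] in the toy example from the question. Since R represents NA_integer_ as the most negative int, the i < 1 and j < i - 1 comparisons would also reject NA arguments in an R version, although an explicit NA check would give a clearer error message.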