
R: Find length of bigz vector without using the length function


In R, to find the length of a vector (bigz or not), one typically uses the length function. E.g.

NonBigZ <- 1:10

NonBigZ
[1]  1  2  3  4  5  6  7  8  9 10

length(NonBigZ)
[1] 10

However, with the gmp package, printing a bigz vector displays its length automatically. E.g.

library(gmp)
BigZ <- as.bigz(1:10)

BigZ
Big Integer ('bigz') object of length 10:  ## <<-- length given here
 [1] 1  2  3  4  5  6  7  8  9  10

## This seems redundant as it is already given above
length(BigZ)
[1] 10

I would like to retrieve that information without making the extra call to length. I know length is lightning fast, but avoiding the call could still save a decent chunk of time when it is repeated many times. Observe:

system.time(sapply(1:10^6, function(x) length(BigZ)))
user  system elapsed 
7.81    0.00    7.84

I have tried attributes(BigZ) as well as str(BigZ) to no avail. I have read the gmp documentation as well, but couldn't find anything.


Solution

  • As @alexis_laz pointed out in the comments, gmp::print.bigz already calculates the length but doesn't return it in any usable format. I did some digging into the gmp source code and found this:

    print.bigz <- function(x, quote = FALSE, initLine = is.null(modulus(x)), ...)
    {
      if((n <- length(x)) > 0) {
        if(initLine) {
          cat("Big Integer ('bigz') ")
          kind <- if(isM <- !is.null(nr <- attr(x, "nrow")))
            sprintf("%d x %d matrix", nr, n/nr)
          else if(n > 1) sprintf("object of length %d", n) else ""
          cat(kind,":\n", sep="")
        }
        print(as.character(x), quote = quote, ...)
      }
      else
        cat("bigz(0)\n")
      invisible(x)
    }
    

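    As you can see, it writes the header line with cat, which prints to the console rather than returning a value (the function itself returns x invisibly). Note also that print.bigz obtains n by calling length(x) itself, so anything that goes through printing cannot be cheaper than calling length directly. From this question and this answer, it is possible to capture the printed output and retrieve the requested information; however, it isn't nearly as efficient as simply calling length. Below is a very crude function for obtaining the length.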

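    BigZLength <- function(x) {
        ## capture the printed representation; b[1] is the header line
        b <- capture.output(x)
        ## 7th space-separated token of the header, NA if there is none
        a <- strsplit(b[1], split=" ")[[1]][7]
        ## drop the last character of the token; fall back to 1L otherwise
        if (!is.na(a)) {as.integer(substr(a,1,nchar(a)-1))} else {1L}
    }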
    
    system.time(sapply(1:10^5, function(x) length(BigZ)))
     user  system elapsed 
    0.67    0.00    0.67 
    
    system.time(sapply(1:10^5, function(x) BigZLength(BigZ)))
     user  system elapsed 
    24.57    0.01   24.71
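
To see why BigZLength above takes the seventh token and then trims one character, here is a small sketch that runs the same string operations on a literal copy of the header line from the question. It does not need gmp; the variable names header, tokens and n are only illustrative.

```r
# Header line copied from the printed output of BigZ in the question
header <- "Big Integer ('bigz') object of length 10:"

# Split on single spaces, exactly as BigZLength does
tokens <- strsplit(header, split = " ")[[1]]
tokens[7]

# The seventh token still carries the trailing colon ("10:"),
# so the last character is dropped before converting to integer
n <- as.integer(substr(tokens[7], 1, nchar(tokens[7]) - 1))
n
```

For a vector of length 1 the header is just "Big Integer ('bigz') :", which has fewer than seven tokens, so the index yields NA and the function falls back to 1L.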
    

    I'm sure you could write a more efficient function using regular expressions (or something else). However, I don't believe it will be as efficient as simply calling length. In fact, merely capturing the printed output with capture.output accounts for most of the time in the above code.

    system.time(sapply(1:10^5, function(x) capture.output(BigZ)))
     user  system elapsed 
    20.00    0.00   20.03
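
The regular-expression idea mentioned above could look roughly like the sketch below. The helper parse_bigz_header is hypothetical (it is not part of gmp) and only assumes the header formats visible in the print.bigz source quoted earlier: "bigz(0)" for an empty vector, "object of length n" for a plain vector, "r x c matrix" for a matrix, and a bare header for a single element.

```r
# Hypothetical helper: recover the length from the first line printed by print.bigz.
# Returns NA_integer_ when the line matches none of the known header formats.
parse_bigz_header <- function(header) {
  # empty vector
  if (grepl("^bigz\\(0\\)", header)) return(0L)

  # plain vector with more than one element
  m <- regmatches(header, regexec("object of length ([0-9]+):", header))[[1]]
  if (length(m) == 2L) return(as.integer(m[2]))

  # matrix: the length is rows times columns
  m <- regmatches(header, regexec("([0-9]+) x ([0-9]+) matrix", header))[[1]]
  if (length(m) == 3L) return(as.integer(m[2]) * as.integer(m[3]))

  # single element: the header carries no length at all
  if (grepl("^Big Integer \\('bigz'\\) :$", header)) return(1L)

  NA_integer_
}

parse_bigz_header("Big Integer ('bigz') object of length 10:")
parse_bigz_header("Big Integer ('bigz') 2 x 5 matrix:")
parse_bigz_header("bigz(0)")
```

With gmp loaded it would be used as parse_bigz_header(capture.output(print(BigZ))[1]). That call still pays the full cost of capture.output, so it should not be expected to beat length.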
    



    A note about fetching the source code above

    If you are familiar with R you know that you can view the source code of a given function by simply typing the function in the console and printing it like so:

    numbers::nextPrime
    function (n) 
    {
        if (n <= 1) 
            n <- 1
        else n <- floor(n)
        n <- n + 1
        d1 <- max(3, round(log(n)))
        P <- Primes(n, n + d1)
        while (length(P) == 0) {
            n <- n + d1 + 1
            P <- Primes(n, n + d1)
        }
        return(as.numeric(min(P)))
    }
    <environment: namespace:numbers>
    

    However, this does not work for functions that a package does not export. For example, with gmp::print.bigz we obtain:

    gmp::print.bigz
    Error: 'print.bigz' is not an exported object from 'namespace:gmp'
    

    Enter Joshua Ulrich’s awesome question and answer. Using the code he suggests below, you can download the source code of any package and unpack it in one line.

    untar(download.packages(pkgs = "gmp",
                            destdir = ".",
                            type = "source")[,2])
    

    This creates a folder in your working directory containing the package's source code (not compiled code). The source shown above was found in the .\gmp\R\biginteger.R file.
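
As an alternative to downloading the source, unexported functions can usually be inspected directly in the console. The sketch below assumes gmp is installed and uses three standard R mechanisms: the ::: operator, getS3method() and getAnywhere().

```r
library(gmp)  # assumes the gmp package is installed

# ':::' reaches objects that a namespace does not export
gmp:::print.bigz

# getS3method() looks up the method registered for generic "print" and class "bigz"
getS3method("print", "bigz")

# getAnywhere() searches loaded namespaces as well as the search path
getAnywhere("print.bigz")
```

Code printed this way comes from the installed package, so comments and original formatting may be missing. The downloaded source remains the better reference when those matter.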