I've been toying with comparing vectorised R code with non-vectorised R code, and noticed that functions appear to use more memory the first time they are run. Here is a reproducible example:
library(bench)
squares <- function(x)
{
  y <- x
  for(i in seq_along(x))
  {
    y[i] <- x[i]*x[i]
  }
  return(y)
}
x <- 1:100
bm <- mark(x^2, squares(x))
bm
The first time this is run, squares(x) uses a lot more memory (mem_alloc column) than x^2:
> bm
# A tibble: 2 x 13
  expression    min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
  <bch:expr> <bch:> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
1 x^2             0 558.1ns  1387977.    1.27KB        0 10000     0     7.21ms
2 squares(x) 12.4µs  14.2µs    64885.    4.15MB        0 10000     0   154.12ms
# … with 4 more variables: result <list>, memory <list>, time <list>, gc <list>
But if I run the code again, I get very different results:
> bm <- mark(x^2, squares(x))
> bm
# A tibble: 2 x 13
  expression    min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
  <bch:expr> <bch:> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
1 x^2             0 490.1ns  1430864.      848B        0 10000     0     6.99ms
2 squares(x) 12.9µs  16.4µs    57321.      448B     5.73  9999     1   174.44ms
# … with 4 more variables: result <list>, memory <list>, time <list>, gc <list>
If I run the benchmark again, I get the same results as the second time.
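To look at individual allocations rather than only the mem_alloc total, here is a minimal sketch using base R's utils::Rprofmem, which, as far as I understand, is what bench relies on for its memory columns. The helper name alloc_log is made up for illustration, and the sketch assumes R was built with memory profiling (it checks capabilities("profmem") and returns an empty log otherwise).

```r
# Hypothetical helper (not part of bench): record the allocations that
# R's memory profiler logs while `expr` is evaluated.
alloc_log <- function(expr) {
  if (!capabilities("profmem")) {
    return(character(0))  # this R build has no memory profiling
  }
  log_file <- tempfile(fileext = ".out")
  utils::Rprofmem(log_file, threshold = 0)
  # `expr` is a promise, so the actual work happens inside force()
  tryCatch(force(expr), finally = utils::Rprofmem(NULL))
  lines <- readLines(log_file)
  unlink(log_file)
  lines
}

squares <- function(x)
{
  y <- x
  for(i in seq_along(x))
  {
    y[i] <- x[i]*x[i]
  }
  return(y)
}

x <- 1:100
first  <- alloc_log(squares(x))  # first call in the session
second <- alloc_log(squares(x))  # repeat call
length(first)
length(second)
```

Comparing the two logs line by line should show which calls are responsible for any extra allocations in the first run.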
If, when I first start R, I run the functions prior to the benchmark, I get the following:
> 1^2
[1] 1
> squares(1)
[1] 1
> bm <- mark(x^2, squares(x))
> bm
# A tibble: 2 x 13
  expression    min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
  <bch:expr> <bch:> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
1 x^2             0 977.1ns   993503.    1.27KB        0 10000     0     10.1ms
2 squares(x) 12.8µs  14.5µs    63713.      448B        0 10000     0      157ms
# … with 4 more variables: result <list>, memory <list>, time <list>, gc <list>
Note that the memory usage for squares(x) is as low as in the second run, but the usage for x^2 is not. If I instead run x^2 prior to the first benchmark, the memory used for x^2 drops to 848B.
Is this because the memory used for R's just-in-time compilation is included in the memory profiling the first time the function is run? If so, why is x^2 affected? Isn't the ^ operator already compiled to bytecode? Have I misunderstood what the memory profiling in R does? Or is something else going on here?
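As a quick sanity check on the premise of that question, the following sketch inspects what kind of object ^ is compared with squares. It only uses base R functions (is.primitive and typeof) and does not by itself explain the memory figures.

```r
squares <- function(x)
{
  y <- x
  for(i in seq_along(x))
  {
    y[i] <- x[i]*x[i]
  }
  return(y)
}

# `^` is a primitive implemented in C rather than an R closure,
# so it has no R-level body for the byte-code compiler to work on.
is.primitive(`^`)
typeof(`^`)

# squares, by contrast, is an ordinary closure.
is.primitive(squares)
typeof(squares)
```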
Following Roland's comment (thanks!), I tried turning off the JIT compilation. The results indicate that the additional memory usage the first time the function is run is indeed due to compilation:
library(compiler)
library(bench)
enableJIT(0) # Turn off JIT compilation
squares <- function(x)
{
  y <- x
  for(i in seq_along(x))
  {
    y[i] <- x[i]*x[i]
  }
  return(y)
}
With the following results:
> x <- 1:100
> bm <- mark(x^2, squares(x))
> bm
# A tibble: 2 x 13
  expression    min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
  <bch:expr> <bch:> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
1 x^2             0 490.1ns  1205874.    1.27KB        0 10000     0     8.29ms
2 squares(x) 81.3µs  96.9µs    10428.      448B     62.7  4488    27   430.39ms
# … with 4 more variables: result <list>, memory <list>, time <list>, gc <list>
>
> # A second run:
> bm <- mark(x^2, squares(x))
> bm
# A tibble: 2 x 13
  expression    min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
  <bch:expr> <bch:> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
1 x^2             0 559.1ns  1311094.      848B        0 10000     0     7.63ms
2 squares(x) 79.3µs  87.7µs    10749.      448B     80.0  4571    34   425.25ms
# … with 4 more variables: result <list>, memory <list>, time <list>, gc <list>
(As expected, the non-compiled function is slower, although that wasn't the point of the experiment.)
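Since enableJIT(0) changes a session-wide setting, a small sketch of how the experiment can be wrapped so that the previous JIT level is restored afterwards may be useful. It relies on enableJIT returning the previous level and on a negative argument querying the current level without changing it; the variable names are only illustrative.

```r
library(compiler)

# enableJIT() returns the level that was active before the call
old_level <- enableJIT(0)

# A negative argument reports the current level without changing it
jit_during <- enableJIT(-1)
jit_during

# ... run the benchmark with JIT disabled here ...

# Restore whatever level the session started with
invisible(enableJIT(old_level))
```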
In addition, explicit compilation using cmpfun also removes the excess memory usage in the first run:
library(compiler)
squares <- cmpfun(squares)
yields
> x <- 1:100
> bm <- mark(x^2, squares(x))
> bm
# A tibble: 2 x 13
  expression    min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
  <bch:expr> <bch:> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
1 x^2           1ns   560ns  1163023.    1.27KB        0 10000     0      8.6ms
2 squares(x) 11.1µs  13.5µs    71576.      448B        0 10000     0    139.7ms
# … with 4 more variables: result <list>, memory <list>, time <list>, gc <list>
for the first run.
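To confirm that cmpfun really produced a byte-compiled function before benchmarking it, something like the sketch below can be used. The helper has_bytecode is hypothetical: it simply checks whether printing the function shows a <bytecode: ...> line, which is how R displays compiled closures. The name squares_cmp is also only illustrative.

```r
library(compiler)

squares <- function(x)
{
  y <- x
  for(i in seq_along(x))
  {
    y[i] <- x[i]*x[i]
  }
  return(y)
}

# Hypothetical helper: TRUE if the printed function mentions bytecode
has_bytecode <- function(f) {
  any(grepl("<bytecode", capture.output(print(f)), fixed = TRUE))
}

squares_cmp <- cmpfun(squares)

x <- 1:100
identical(squares(x), squares_cmp(x))  # compilation should not change results
has_bytecode(squares_cmp)
```

Calling has_bytecode(squares) before and after the first few calls is one way to see when the JIT has compiled the original function, although the exact timing depends on the JIT level.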