Why do R functions use more memory the first time they are run?

I've been experimenting with comparing vectorised and non-vectorised R code, and noticed that functions appear to use more memory the first time they are run. Here is a reproducible example:

library(bench)

squares <- function(x)
{
    y <- x
    for(i in seq_along(x))
    {
        y[i] <- x[i]*x[i]
    }
    return(y)
}

x <- 1:100
bm <- mark(x^2, squares(x))
bm

The first time this is run, squares(x) allocates far more memory (the mem_alloc column) than x^2, 4.15MB versus 1.27KB:

> bm
# A tibble: 2 x 13
  expression    min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
  <bch:expr> <bch:> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
1 x^2             0 558.1ns  1387977.    1.27KB        0 10000     0     7.21ms
2 squares(x) 12.4µs  14.2µs    64885.    4.15MB        0 10000     0   154.12ms
# … with 4 more variables: result <list>, memory <list>, time <list>, gc <list>

But if I run the code again, I get very different results:

> bm <- mark(x^2, squares(x))
> bm
# A tibble: 2 x 13
  expression    min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
  <bch:expr> <bch:> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
1 x^2             0 490.1ns  1430864.      848B     0    10000     0     6.99ms
2 squares(x) 12.9µs  16.4µs    57321.      448B     5.73  9999     1   174.44ms
# … with 4 more variables: result <list>, memory <list>, time <list>, gc <list>

If I run the benchmark again, I get the same results as the second time.

If I start a fresh R session and call both functions once before running the benchmark, I get the following:

> 1^2
[1] 1
> squares(1)
[1] 1
> bm <- mark(x^2, squares(x))
> bm
# A tibble: 2 x 13
  expression    min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
  <bch:expr> <bch:> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
1 x^2             0 977.1ns   993503.    1.27KB        0 10000     0     10.1ms
2 squares(x) 12.8µs  14.5µs    63713.      448B        0 10000     0      157ms
# … with 4 more variables: result <list>, memory <list>, time <list>, gc <list>

Note that the memory usage for squares(x) is now as low as in the second run, but the usage for x^2 is not. If I instead run x^2 itself before the first benchmark, the memory used for x^2 drops to 848B.
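The first-call and second-call allocations can also be measured in isolation, outside mark(). A rough sketch, assuming the bench package is installed and R was built with memory profiling (capabilities("profmem") is TRUE); it uses bench::bench_memory(), and the byte counts it prints will vary between R versions and sessions:

```r
squares <- function(x)
{
    y <- x
    for(i in seq_along(x))
    {
        y[i] <- x[i]*x[i]
    }
    return(y)
}

x <- 1:100

# Only measure if bench is installed and R was built with memory profiling
if (requireNamespace("bench", quietly = TRUE) && capabilities("profmem")) {
    # First call: with R's default JIT settings, squares() would be byte-compiled here
    first <- as.numeric(bench::bench_memory(squares(x))$mem_alloc)
    # Second call: the closure has already been compiled, so no compilation cost
    second <- as.numeric(bench::bench_memory(squares(x))$mem_alloc)
    print(c(first_call = first, second_call = second))  # bytes allocated per call
}
```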

Is this because the memory used by R's just-in-time (JIT) compilation is included in the memory profiling the first time the function is run? If so, why is x^2 affected? Isn't the ^ operator already compiled to bytecode? Have I misunderstood what the memory profiling in R does? Or is something else going on here?
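One part of the last question can be checked directly: R's JIT compiler byte-compiles closures (functions written in R), whereas ^ is a primitive implemented in C, so there is no R code for the JIT to compile. A quick sketch of that check:

```r
squares <- function(x)
{
    y <- x
    for(i in seq_along(x))
    {
        y[i] <- x[i]*x[i]
    }
    return(y)
}

is.primitive(`^`)      # TRUE: implemented in C, nothing to byte-compile
typeof(`^`)            # "builtin"
is.primitive(squares)  # FALSE
typeof(squares)        # "closure": the kind of function the JIT compiles
```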

Solution

Following Roland's comment (thanks!), I tried turning off JIT compilation. The results indicate that the additional memory used the first time the function is run is indeed due to compilation:

library(compiler)
library(bench)
enableJIT(0) # Turn off JIT compilation

squares <- function(x)
{
    y <- x
    for(i in seq_along(x))
    {
        y[i] <- x[i]*x[i]
    }
    return(y)
}

With the following results:

> x <- 1:100
> bm <- mark(x^2, squares(x))
> bm
# A tibble: 2 x 13
  expression    min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
  <bch:expr> <bch:> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
1 x^2             0 490.1ns  1205874.    1.27KB      0   10000     0     8.29ms
2 squares(x) 81.3µs  96.9µs    10428.      448B     62.7  4488    27   430.39ms
# … with 4 more variables: result <list>, memory <list>, time <list>, gc <list>
> 
> # A second run:
> bm <- mark(x^2, squares(x))
> bm
# A tibble: 2 x 13
  expression    min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
  <bch:expr> <bch:> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
1 x^2             0 559.1ns  1311094.      848B      0   10000     0     7.63ms
2 squares(x) 79.3µs  87.7µs    10749.      448B     80.0  4571    34   425.25ms
# … with 4 more variables: result <list>, memory <list>, time <list>, gc <list>

(As expected, the non-compiled function is slower, although that wasn't the point of the experiment.)
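One caveat with enableJIT(0) as used above: it switches the JIT off for the rest of the session. Since enableJIT() returns the previous level, a small helper can switch it off only while one expression is evaluated and then restore it. A minimal sketch; with_jit_level is a hypothetical name, not a function from the compiler package:

```r
library(compiler)

# Hypothetical helper: evaluate `expr` with the JIT set to `level`,
# then restore whatever level was active before.
with_jit_level <- function(level, expr)
{
    old <- enableJIT(level)              # enableJIT() returns the previous level
    on.exit(enableJIT(old), add = TRUE)  # restore it even if expr fails
    expr                                 # the promise is forced here, under the new level
}

squares <- function(x)
{
    y <- x
    for(i in seq_along(x))
    {
        y[i] <- x[i]*x[i]
    }
    return(y)
}

x <- 1:100
res <- with_jit_level(0, squares(x))    # squares() runs with the JIT off
```

Wrapping the mark() call the same way would benchmark the uncompiled function without leaving the JIT disabled afterwards.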

Explicitly compiling the function with cmpfun also removes the excess memory usage in the first run:

library(compiler)
squares <- cmpfun(squares)

yields

> x <- 1:100
> bm <- mark(x^2, squares(x))
> bm
# A tibble: 2 x 13
  expression    min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
  <bch:expr> <bch:> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
1 x^2           1ns   560ns  1163023.    1.27KB        0 10000     0      8.6ms
2 squares(x) 11.1µs  13.5µs    71576.      448B        0 10000     0    139.7ms
# … with 4 more variables: result <list>, memory <list>, time <list>, gc <list>

for the first run.
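To make this a habit, one option is to byte-compile functions with cmpfun() before benchmarking them and check that the result actually carries bytecode. Printing a byte-compiled closure includes a "<bytecode: ...>" line, which the check below looks for. A sketch; squares_c and is_byte_compiled are hypothetical names:

```r
library(compiler)

squares <- function(x)
{
    y <- x
    for(i in seq_along(x))
    {
        y[i] <- x[i]*x[i]
    }
    return(y)
}

# Hypothetical helper: a byte-compiled closure prints a "<bytecode: ...>" line
is_byte_compiled <- function(f)
{
    any(grepl("<bytecode", capture.output(print(f)), fixed = TRUE))
}

squares_c <- cmpfun(squares)
is_byte_compiled(squares_c)   # TRUE

x <- 1:100
# bench::mark(x^2, squares_c(x)) should then no longer include the one-off compilation cost
```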