According to the Memory {base} help page in the R 4.1.0 documentation, R keeps two separate memory areas for "fixed" and "variable" sized objects. As I understand it, variable-sized objects are those the user can create in the work environment: vectors, lists, data frames, etc. However, the documentation is rather obscure about fixed-sized objects:
[Fixed-sized objects are] allocated as an array of cons cells (Lisp programmers will know what they are, others may think of them as the building blocks of the language itself, parse trees, etc.)[.]
Could someone provide an example of a fixed-sized object that is stored in a cons cell? For further reference, I know the function memory.profile() gives a profile of the usage of cons cells. For example, in my session the output looks like this:
> memory.profile()
       NULL      symbol    pairlist     closure environment     promise    language
          1       23363      623630        9875        2619       13410      200666
    special     builtin        char     logical     integer      double     complex
         47         696       96915       16105      107138       10930          22
  character         ...         any        list  expression    bytecode externalptr
     130101           2           0       50180           1       42219        3661
    weakref         raw          S4
       1131        1148        1132
What do these counts stand for, both numerically and conceptually? For instance, does logical: 16105 refer to 16,105 logical objects (bytes? cells?) that are stored in the source code or binaries of R?
My purpose is to gain a better understanding of how R manages memory in a given session. Finally, I think I understand what a cons cell is, both in Lisp and in R, but if the answer needs to address that concept first, it won't hurt to start from there.
At the C level, an R object is just a pointer to a block of memory called a "node". Each node is a C struct, either a SEXPREC or a VECTOR_SEXPREC. VECTOR_SEXPREC is for vector-like objects, including strings, atomic vectors, expression vectors, and lists. SEXPREC is for every other type of object.
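To make the split concrete, here is a minimal sketch that sorts a few objects by their typeof() value. The list of vector-like types is my reading of the description above, not something R exposes directly, and the labels "VECTOR_SEXPREC" and "SEXPREC" are assigned by this helper, not reported by R.

```r
# typeof() values treated as vector-like, following the description above
# (assumption: this list mirrors the C-level split; typeof() cannot confirm it)
vector_types <- c("logical", "integer", "double", "complex",
                  "character", "raw", "list", "expression")

objs <- list(
  number   = 1.5,
  text     = "a",
  lst      = list(1, 2),
  call     = quote(a + b),
  symbol   = as.name("a"),
  closure  = function(x) x,
  env      = globalenv(),
  pairlist = pairlist(a = 1)
)

types <- vapply(objs, typeof, character(1))
node_kind <- ifelse(types %in% vector_types, "VECTOR_SEXPREC", "SEXPREC")
names(node_kind) <- names(objs)

data.frame(type = types, node = node_kind)
```

Calls, symbols, closures, environments, and pairlists land in the non-vector group, which is the kind of object the question asks about.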
The SEXPREC struct has three contiguous segments:
The VECTOR_SEXPREC struct has segments (1) and (2) above, followed by:
The VECTOR_SEXPREC struct is followed by a block of memory spanning at least 8+n*sizeof(<type>) bytes, where n is the length of the corresponding vector. The block consists of an 8-byte leading buffer, the vector "data" (i.e., the vector's elements), and sometimes a trailing buffer.
In summary, non-vectors are stored as a node spanning 32 or 56 bytes (on 32-bit and 64-bit systems, respectively), while vectors are stored as a node spanning 28 or 48 bytes followed by a block of data of size roughly proportional to the number of elements. Hence nodes are of roughly fixed size, while vector data require a variable amount of memory.
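One rough way to see the "fixed header plus variable data" pattern from R itself is to watch how utils::object.size() grows with vector length. This is only a sketch: object.size() returns an estimate, and the exact byte counts depend on the platform and on how R rounds small allocations.

```r
n <- 0:8

# Estimated bytes for a double vector of each length (header plus data)
sizes <- vapply(
  n,
  function(k) as.numeric(utils::object.size(numeric(k))),
  numeric(1)
)

# 'increment' shows how many extra bytes each additional element costs
data.frame(n = n, bytes = sizes, increment = c(NA, diff(sizes)))
```

The size at n = 0 approximates the node alone. The increments are not a constant 8 bytes per double, which is consistent with the trailing buffer mentioned above.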
R allocates memory for nodes in blocks called Ncells (or cons cells) and memory for vector data in blocks called Vcells. According to ?Memory, each Ncell is 28 bytes on 32-bit systems and 56 bytes on 64-bit systems, and each Vcell is 8 bytes. Thus, this line in ?Memory:
R maintains separate areas for fixed and variable sized objects.
is actually referring to nodes and vector data, not R objects per se.
memory.profile gives the number of cons cells used by all R objects in memory, stratified by object type. Hence sum(memory.profile()) will be roughly equal to gc(FALSE)[1L, "used"], which gives the total number of cons cells in use after a garbage collection.
gc(FALSE)
##          used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
## Ncells 273996 14.7     667017 35.7         NA   414424 22.2
## Vcells 549777  4.2    8388608 64.0      16384  1824002 14.0
sum(memory.profile())
## [1] 273934
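The (Mb) column next to "used" can be roughly reproduced from the cell sizes quoted from ?Memory. The sketch below assumes that pointer size identifies a 32-bit or 64-bit build and that the usual 28-byte or 56-byte Ncell applies. The variable names are mine.

```r
# Cell sizes as quoted from ?Memory (assumption: the usual sizes apply here)
ncell_bytes <- if (.Machine$sizeof.pointer == 8L) 56 else 28
vcell_bytes <- 8

g <- gc(FALSE)
used <- g[, "used"]

# Cells in use times bytes per cell, converted to megabytes
est_mb <- c(
  Ncells = used[["Ncells"]] * ncell_bytes,
  Vcells = used[["Vcells"]] * vcell_bytes
) / 1024^2

reported_mb <- g[, 2L]  # second column is the "(Mb)" that follows "used"
round(cbind(est_mb, reported_mb), 2)
```

With the numbers shown above, 273996 * 56 / 1024^2 is about 14.6 and 549777 * 8 / 1024^2 is about 4.2, in line with the reported 14.7 and 4.2 once rounding is taken into account.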
When you assign a new R object, the number of Ncells and Vcells in use as reported by gc will increase. For example:
gc(FALSE)[, "used"]
## Ncells Vcells
## 273933 549662
x <- Reduce(function(x, y) call("+", x, y), lapply(letters, as.name))
x
## a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p +
## q + r + s + t + u + v + w + x + y + z
gc(FALSE)[, "used"]
## Ncells Vcells
## 330337 676631
You might be wondering why the number of Vcells in use increased, given that x is a language object, not a vector. The reason is that nodes are recursive: they contain pointers to other nodes, which may very well be vector nodes. Here, Vcells were allocated in part because each symbol in x points to a string (+ to "+", a to "a", and so on), and each of those strings is a vector of characters. (That said, it is surprising that ~125000 Vcells were required in this case. That may be an artifact of the Reduce and lapply calls, but I'm not really sure at the moment.)
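One way to probe that guess is to repeat the measurement with a call built directly by the parser, so that neither Reduce nor lapply is involved. This is only a sketch: cells_used is a helper name I made up, and the difference includes anything else R happens to allocate between the two gc calls, so it is an upper bound rather than an exact cost.

```r
# Hypothetical helper: cells in use right after a garbage collection
cells_used <- function() gc(FALSE)[, "used"]

before <- cells_used()
y <- quote(a + b + c)  # language object created without Reduce() or lapply()
after <- cells_used()

# Change in Ncells and Vcells across the assignment
delta <- after - before
delta
```

If the Vcells change here is far smaller than in the example above, that would point to the helper functions rather than the language object itself.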
Everything is a bit scattered: