Does the R garbage collection look ahead at code?


I know that in compiled languages like C++, the compiler is smart enough to work out when a variable will no longer be required in the future and can clear up such variables.

Does R do the same, i.e. can it tell that a variable isn't used in the rest of a function and decide to remove it? (I mean when gc() next runs; I know R doesn't run gc() until it needs more memory.) Or, if we have a memory-hungry function with large arrays that may need to run on low-spec PCs, and we know the function is finished with such an array, would it be best to call rm() on that array, so that R can reclaim the memory the next time gc() runs?

Or is there a better way to use scopes with curly braces to reduce memory usage in R?


Solution

  • tl;dr: The GC does not look ahead; objects that are still bound to names anywhere will not be collected by the GC.


    The R GC (and GCs in all other languages that I know of) does not look at the code at all. Instead, GCs keep track of memory references (via different mechanisms) and free memory when it is no longer referenced.
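    A minimal sketch of this reference-based behaviour in R, assuming a finalizer is an acceptable way to observe collection. The names demo_lifetime, big and state are hypothetical, and a small environment stands in for a large array because reg.finalizer() only accepts environments and external pointers.

```r
demo_lifetime <- function() {
  # Hypothetical bookkeeping environment used to record the finalizer call.
  state <- new.env()
  state$collected <- FALSE

  # Stand-in for a large object; the finalizer runs once the GC reclaims it.
  big <- new.env()
  big$data <- numeric(1e6)
  reg.finalizer(big, function(e) assign("collected", TRUE, envir = state))

  # `big` is never read again below, but it is still bound to a name,
  # so a collection at this point has to keep it alive.
  invisible(gc())
  alive_while_bound <- !state$collected

  # Removing the binding drops the last reference, so the next collection
  # is free to reclaim the object.
  rm(big)
  invisible(gc())
  collected_after_rm <- state$collected

  c(alive_while_bound = alive_while_bound,
    collected_after_rm = collected_after_rm)
}

print(demo_lifetime())
```

    The first entry shows that an unused but still bound object survives a collection. The second would be expected to be TRUE as well once rm() has removed the binding, although exactly when a finalizer runs is ultimately up to the collector.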

    C++ does not clean up variables preemptively either; in fact, destruction at the end of a scope is an important part of its formal semantics, and a lot of code relies on this behaviour, e.g. std::scoped_lock. Instead, both R and C++ have a concept of variable scope: when a name goes out of scope (i.e. execution reaches the end of the scope in which the variable was declared), the name is removed.

    In the case of R, all this means is that you can no longer reference that variable (though there are escape hatches via closures) and that the value behind the variable is no longer referenced by that variable (though it may still be referenced from elsewhere). Once a value is no longer referenced from anywhere, its memory can be reclaimed by the GC when it runs.
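    As for what counts as a scope in R: a function body is one, whereas a bare pair of curly braces is not, and local() evaluates its block in a fresh environment. A small sketch of the difference, using the hypothetical names scope_demo, in_braces and in_local:

```r
scope_demo <- function() {
  # Bare braces only group expressions; the assignment lands in the
  # enclosing function's frame and stays bound after the closing brace.
  {
    in_braces <- numeric(10)
  }

  # local() evaluates its block in a new environment, so the binding
  # is not created in this function's frame.
  local({
    in_local <- numeric(10)
  })

  # inherits = FALSE restricts the lookup to this function's own frame.
  c(in_braces = exists("in_braces", inherits = FALSE),
    in_local  = exists("in_local", inherits = FALSE))
}

print(scope_demo())
```

    Here in_braces is still bound after the braces close, while in_local is not. Wrapping a memory-hungry step in its own function or in local() is therefore one way to let large temporaries become unreferenced without an explicit rm().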

    C++ is more complicated, but in a nutshell: if the variable is an object rather than a reference, the lifetime of the value it holds ends when the variable goes out of scope. And when the lifetime of a value ends in C++, its destructor is run and its memory is freed.

    That said, in a hypothetical language it would be possible for the GC to look at the code, look ahead, and preemptively free memory. Rust does something similar to determine if a reference is still used (but it does not use the information to free memory, merely to satisfy the borrow checker).