Tags: r, list, for-loop, initialization, memory-efficient

What is the most memory efficient way to initialize a list before a loop in R?


I am wondering what the most memory efficient way to initialize a list is in R if that list is going to be used in a loop to store results. I know that growing an object in a loop can cause a serious hit in computational efficiency so I am trying to avoid that as much as possible.

My problem is as follows. I have several groups of data that I want to process individually. The gist of my code is I have a loop that runs through each group one at a time, does some t-tests, and then returns only the statistically significant results (thus variable length results for each group). So far I am initializing a list of length(groups) to store the results of each iteration.

My main question is how I should be initializing the list so that the object is not grown in the loop.

  • Is it good enough to do list = vector(mode = "list", length=length(groups)) for the initialization?
    • I am skeptical about this because it just creates a list of length(groups) but each entry is equal to NULL. My concern is that during each iteration of the loop when I go to store data into the list, it is going to recopy the object each time as the entry goes from NULL to my results vector, in which case initializing the list doesn't really do much good. I don't know how the internals of a list work, however, so it is possible that it just stores the reference to the vector being stored in the list, meaning recopying is not necessary.
  • The other option would be to initialize each element of the list to a vector of the maximum possible length the results could have.
    • This is not a big issue as the maximum number of possible valid results is known. If I took this approach I would just overwrite each vector with the results vector within the loop. Since the maximum amount of memory would already be reserved hopefully no recopying/growth would occur. I don't want to take this approach, however, if it is not necessary and the first option above is good enough.
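
The two options in the list above can be compared with a minimal sketch. The sizes `n_groups` and `max_len` are made-up values for illustration, and `runif` merely stands in for real results.

```r
# Hypothetical sizes, chosen only for illustration
n_groups <- 5
max_len  <- 1000

# Option 1: a list of NULLs; only the list's own slots are allocated
opt1 <- vector(mode = "list", length = n_groups)

# Option 2: every element pre-filled with a max-length vector
opt2 <- replicate(n_groups, numeric(max_len), simplify = FALSE)

print(object.size(opt1))  # the list itself
print(object.size(opt2))  # the list plus n_groups full-length vectors

# Assigning a result replaces the element outright, so the
# pre-filled vector in option 2 is discarded rather than reused
opt2[[1]] <- runif(3)
print(length(opt2[[1]]))  # length of the new result, not max_len
```

Under these assumptions, pre-filling each element costs memory up front without saving an allocation later, which suggests the first option is likely sufficient.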

Below is some pseudocode describing my problem:

# initialize variables
results <- vector(mode = "list", length = length(groups))  # the line of code in question

# perform analysis on each group in groups
for (y in seq_along(groups))
{
  # tTestFunction (placeholder) returns a vector of p-values,
  # one entry per element in the group
  tTests <- tTestFunction(groups[[y]])
  # keep only the significant p-values, so the length varies by group
  results[[y]] <- tTests[tTests <= 0.05]
}

Solution

  • Your code does not work, so it is a bad example. Consider this:

    x <- vector("list", length = 4)
    tracemem(x)  ## trace memory copies of "x"
    for (i in 1:4) x[[i]] <- rnorm(4)
    

    tracemem reports no copy of x during the updates, so there is nothing to worry about.

    As suggested by @lmo, even if you initialize the list with x <- list(), tracemem reports no copy either, although a preallocated list has the advantage that it never needs to be enlarged inside the loop.
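
    The same check can be applied to a loop shaped like the one in the question, where each result has a different length. This is only a sketch: the `groups` data is made up, and keeping the positive values is a stand-in for keeping the significant results. tracemem needs an R build with memory profiling enabled, hence the `capabilities("profmem")` guard.

    ```r
    set.seed(1)
    groups <- list(a = rnorm(10), b = rnorm(20), c = rnorm(5))  # made-up data

    results <- vector("list", length = length(groups))
    if (capabilities("profmem")) tracemem(results)  # start tracing copies of "results"

    for (i in seq_along(groups)) {
      g <- groups[[i]]
      results[[i]] <- g[g > 0]  # stand-in for "significant" results; length varies
    }

    if (capabilities("profmem")) untracemem(results)
    print(lengths(results))  # one variable-length vector per group
    ```

    Any copy of results that tracemem detects during the loop would show up as a line starting with tracemem[ in the console.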


    Comment

    The aim of my answer is to point you to tracemem, which you can use whenever you want to trace (possible) memory copies made during code execution. Had you known about this function, you would not have needed to ask here.

    Here is another answer of mine that uses tracemem, although in a different context. There you can see what tracemem prints when memory copies are made.
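
    For contrast, here is a minimal sketch of a situation in which a copy is expected. It is an illustration written for this page, not the example from the linked answer: the list is given a second name (`y`) before it is modified.

    ```r
    x <- vector("list", length = 4)
    if (capabilities("profmem")) tracemem(x)  # needs memory profiling support

    y <- x          # no copy yet: x and y refer to the same list
    x[[1]] <- 1:3   # x is modified while shared, so R duplicates it first;
                    # tracemem should report this as "tracemem[... -> ...]"

    if (capabilities("profmem")) untracemem(x)
    print(is.null(y[[1]]))  # y still holds the original, unmodified list
    ```

    The copy happens because two names point at the same object when it is modified, which is not the case in the loop above.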