I am wondering what the most memory-efficient way to initialize a list is in R if that list is going to be used in a loop to store results. I know that growing an object in a loop can cause a serious hit in computational efficiency, so I am trying to avoid that as much as possible.
My problem is as follows. I have several groups of data that I want to process individually. The gist of my code is that I have a loop that runs through each group one at a time, does some t-tests, and then returns only the statistically significant results (thus variable-length results for each group). So far I am initializing a list of length(groups) to store the results of each iteration.
My main question is how I should be initializing the list so that the object is not grown in the loop. Is it enough to use list = vector(mode = "list", length=length(groups)) for the initialization? This gives a list of length length(groups) but each entry is equal to NULL. My concern is that during each iteration of the loop, when I go to store data into the list, it is going to recopy the object each time as the entry goes from NULL to my results vector, in which case initializing the list doesn't really do much good. I don't know how the internals of a list work, however, so it is possible that it just stores a reference to the vector being stored in the list, meaning recopying is not necessary.

Below is some pseudocode describing my problem:
#initialize variables
results = vector(mode="list", length=length(groups)) #the line of code in question
y = 1
tTests = vector(length = length(singleGroup))

#perform analysis on each group in groups
for (group in groups)
{
    #returns a vector of p values with one entry per element in group
    tTests = tTestFunction(group)
    results[[y]] = tTests <= 0.05
    y = y + 1
}
Your code is not runnable as posted, so it makes a poor test case. Consider this instead:
x <- vector("list", length = 4)
tracemem(x) ## trace memory copies of "x"
for (i in 1:4) x[[i]] <- rnorm(4)
No extra copy of x is made during the updates, so there is nothing to worry about.
As suggested by @lmo, even if you use x <- list() to initialize this list, no memory copy will be incurred either.
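To connect this back to the question, here is a minimal sketch of a preallocated list being filled with results of varying length. The data and the helper p_values_for() are made up for illustration (they stand in for groups and tTestFunction in the question), and only base R is used.

```r
set.seed(1)

# Hypothetical data: three groups, each a matrix whose columns are tested separately.
groups <- list(
  a = matrix(rnorm(50, mean = 0.5), nrow = 10),
  b = matrix(rnorm(30), nrow = 10),
  c = matrix(rnorm(80, mean = 1), nrow = 10)
)

# Hypothetical stand-in for tTestFunction: one one-sample t-test p-value per column.
p_values_for <- function(group) {
  apply(group, 2, function(col) t.test(col)$p.value)
}

# Preallocate one slot per group; every slot starts out as NULL.
results <- vector("list", length(groups))
names(results) <- names(groups)

for (i in seq_along(groups)) {
  p <- p_values_for(groups[[i]])
  # Keep only the significant p-values, so each slot may have a different length.
  results[[i]] <- p[p <= 0.05]
}

lengths(results)  # number of significant tests per group
```

Indexing with seq_along(groups) also removes the need for the separate counter y used in the question.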
Comment
The aim of my answer is to point you to tracemem, which you can use whenever you want to trace (possible) memory copies made during code execution. Had you known about this function, you would not have needed to ask here.
Here is another answer of mine related to using tracemem. It is in a different context, though. There, you can see what tracemem prints when memory copies are made.
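For comparison, here is a small sketch of a case where tracemem does report a copy: modifying a list through a second name bound to the same object. It assumes an R build with memory profiling enabled, which capabilities("profmem") reports, so the tracing calls are guarded; the addresses printed will differ between sessions.

```r
# tracemem() needs memory profiling support, so check before calling it.
can_trace <- capabilities("profmem")

x <- vector("list", length = 4)
if (can_trace) invisible(tracemem(x))  # start tracing copies of this list

y <- x  # x and y now refer to the same underlying list

# Modifying y forces R to duplicate the shared list first, so with tracing on
# a line of the form tracemem[<old address> -> <new address>] is printed.
y[[1]] <- rnorm(4)

if (can_trace) untracemem(x)  # stop tracing

# x is untouched: the change went into the copy now bound to y.
is.null(x[[1]])
length(y[[1]])
```

The copy here is triggered by the list being shared between two names, not by a NULL entry being replaced, which is why the loop over a single preallocated list stays quiet.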