I am having trouble understanding the behavior of mclapply (or maybe something else).
I do something like:
opt.Models = mclapply(1:100, mc.cores=20, function(i){
  res = loadResult(reg, id=i)
  return(post.Process(res))
})
loadResult loads one result from an earlier saved BatchJob session. The res object needs ~170MB (all 100 objects are roughly the same size, +/-5MB).
When executing the piece of code, the memory footprint is as expected: 170MB*20= ~3.5GB (I used 20 cores).
When I execute this piece of code a second time, my machine consumes a vast amount of memory (more than is available, so I stop the execution). This is expected because, again, mclapply forks the complete environment for each child, and my environment now holds the big opt.Models variable with ~10GB. Therefore 10*20=200GB would be required.
When I remove opt.Models with rm(opt.Models), I still run into the same problem: mclapply consumes more memory than is available (btw: 90GB).
So, which environment does mclapply fork, or isn't opt.Models totally gone? I can't see it using ls().
Maybe one of you has observed something similar.
Best regards,
Mario
You should call the gc function after removing the variable so that the memory associated with the object is freed by the garbage collector sooner rather than later. The rm function only removes the reference to the data, while the actual object may continue to exist until the garbage collector eventually runs.
You may also want to call gc before the first mclapply to make testing easier:
gc()
opt.Models = mclapply(1:100, mc.cores=20, function(i){
  res = loadResult(reg, id=i)
  return(post.Process(res))
})
# presumably do something with opt.Models...
rm(opt.Models)
gc() # free up memory before forking
opt.Models = mclapply(1:100, mc.cores=20, function(i){
  res = loadResult(reg, id=i)
  return(post.Process(res))
})
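If you want to check on your own machine that rm followed by gc really gives the memory back, a small sketch along these lines may help. It does not use your data: big is a hypothetical stand-in for opt.Models (a numeric vector of about 80MB), and the variable names used_with and used_after are made up for illustration. It relies only on the fact that gc() returns a matrix whose second column reports the memory currently in use, in MB.

```r
# Hypothetical stand-in for opt.Models: 1e7 doubles, roughly 80 MB.
big <- numeric(1e7)

# gc() returns a matrix with rows Ncells/Vcells; column 2 is "used" in MB.
used_with <- sum(gc()[, 2])    # memory in use while `big` still exists

rm(big)                        # removes only the name, not necessarily the memory

used_after <- sum(gc()[, 2])   # explicit collection, then measure again

cat("Memory released by rm() + gc(): about",
    round(used_with - used_after), "MB\n")
```

Running the same kind of before/after measurement around rm(opt.Models) in your session should show whether the ~10GB is actually released before the second mclapply forks its children.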