R, the environment of mclapply and removing variables

I am having trouble understanding the behavior of mclapply (or maybe something else).

I do something like:

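library(parallel)   # mclapply()
library(BatchJobs)  # loadResult(); reg is an existing BatchJobs registry

opt.Models = mclapply(1:100, mc.cores = 20, function(i) {
  res = loadResult(reg, id = i)  # one saved result, ~170MB
  return(post.Process(res))
})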

loadResult loads one result from an earlier saved BatchJobs session, so each res object needs ~170MB (all 100 objects are roughly the same size, +/-5MB). When I execute this code, the memory footprint is as expected: 170MB * 20 ≈ 3.5GB (I used 20 cores). When I execute it a second time, my machine consumes a vast amount of memory (more than is available, so I stop execution). This is expected, because mclapply forks the complete environment for each child, and my environment now contains the big opt.Models variable (~10GB). Therefore 10GB * 20 = 200GB would be required.

Even after removing opt.Models with rm(opt.Models), I still run into the same problem: mclapply consumes more memory than is available (90GB, by the way). So which environment does mclapply fork? Or is opt.Models not completely gone? It no longer shows up in ls().

Maybe one of you has observed something similar.

Best regards,

Mario

Solution

Call gc() after removing the variable, so that the garbage collector frees the object's memory right away rather than at some later point. rm() only removes the binding (the reference) to the data; the object itself stays in memory until the garbage collector next runs.
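A minimal sketch of the rm()/gc() distinction (not part of the original answer; the vector size is arbitrary): it allocates roughly 76Mb of doubles, removes the binding, and compares the vector-heap usage that gc() reports before and after the collection that follows rm().

```r
# About 76Mb of doubles (1e7 * 8 bytes); the size is arbitrary
x <- numeric(1e7)
print(object.size(x), units = "Mb")

before <- gc()  # a collection runs, but x is still referenced, so it survives
rm(x)           # removes only the binding 'x'; the vector is now unreferenced
after <- gc()   # this collection reclaims the unreferenced vector

# Row "Vcells" is the vector heap; column 2 is memory in use, in Mb
freed_mb <- before["Vcells", 2] - after["Vcells", 2]
cat(sprintf("Vcells memory freed: %.1f Mb\n", freed_mb))
```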

You may also want to call gc() before the first mclapply, so that both runs start from a freshly collected heap and are easier to compare:

library(parallel)  # provides mclapply()

gc()  # start from a freshly collected heap
opt.Models = mclapply(1:100, mc.cores = 20, function(i) {
  res = loadResult(reg, id = i)
  return(post.Process(res))
})

# presumably do something with opt.Models...

rm(opt.Models)
gc()  # free up memory before forking

opt.Models = mclapply(1:100, mc.cores = 20, function(i) {
  res = loadResult(reg, id = i)
  return(post.Process(res))
})
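To check which variables a forked child actually sees, a small test like the following may help. It is a sketch, not part of the original answer, and it needs a Unix-alike, since mclapply with mc.cores > 1 is not supported on Windows. A child created by mclapply starts from the parent's workspace as it was at fork time, so once a variable has been removed with rm(), the children can no longer see it either. The name big is a hypothetical stand-in for opt.Models.

```r
library(parallel)

big <- numeric(1e6)  # hypothetical stand-in for a large object such as opt.Models

# Each forked child checks whether 'big' is visible from its copy of the workspace
seen_before <- unlist(mclapply(1:2, function(i) exists("big"), mc.cores = 2))

rm(big)        # removes the binding in the parent
invisible(gc())  # reclaims the now-unreferenced vector before the next fork

seen_after <- unlist(mclapply(1:2, function(i) exists("big"), mc.cores = 2))

print(seen_before)  # children forked while 'big' existed
print(seen_after)   # children forked after rm(big)
```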