Tags: r, memory-leaks, multinomial, gbm

R gbm() function - RAM not released? memory leak?


I am running the gbm() function for multiple additive multinomial models, each with 6 response categories, on a large dataset (roughly 0.5 to 1 million rows per model). The model call looks like this (pretty much the defaults):

gbm_model <-
  gbm(Y ~ A + B + C + D + E + F,
      data = data,                         # data.table with ~0.5-1 million rows
      var.monotone = c(0, 0, 0, 0, 0, 0),  # no monotonicity constraints
      distribution = "multinomial",
      n.trees = 500,
      shrinkage = 0.1,
      interaction.depth = 1,               # additive model, no interactions
      bag.fraction = 0.5,
      train.fraction = 0.5,
      n.minobsinnode = 5,
      cv.folds = 0,                        # no cross-validation
      keep.data = TRUE,
      verbose = FALSE,
      weights = sampleWeight)

Y is a factor with 6 categories; the explanatory variables are numeric and factors, and data is a data.table. This code runs fine and the predictions are good. When it is done I save the predictions and clean the workspace with rm(list=ls(all=TRUE)), and additionally run gc(), but the memory is not released. I would expect that after clearing the whole workspace the memory usage would be about the same as at the start of the R session.
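
Roughly, the end-of-run sequence looks like this (the object and file names are just placeholders); the table returned by gc() is a quick way to compare what R itself reports against what the task manager shows:

# save what is needed, then clear the whole workspace
saveRDS(predictions, "predictions.rds")   # placeholder object / file name
rm(list = ls(all = TRUE))                 # drop every object in the workspace
gc()                                      # prints R's own view of memory use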

In my specific case the RAM usage is about 1.5 GB after loading the data. After fitting the model it is at the limit of my PC, at about 14 GB. After cleaning the workspace it is still at about 12 GB. The only solution for me at the moment is to restart the whole R session, reload the data and run the next model.

Is there a solution to this, so that I don't have to restart the session all the time?

Thanks a lot!


Solution

  • Yes, there is a memory leak in gbm. Ironically, the fix has been posted on the gbm project site, but the maintainers have not incorporated it into the CRAN release:

    http://r-forge.r-project.org/tracker/?atid=1813&group_id=443&func=browse
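
    If rebuilding gbm with that patch is not an option, a workaround is to run each fit in a disposable child process, so whatever memory leaks is returned to the OS when that process exits; this simply automates the session restart described in the question. The sketch below uses the callr package, which is not part of the linked fix; the helper name fit_gbm_once and the way sampleWeight is passed are illustrative and may need to be adapted.

    library(callr)   # runs a function in a fresh R child process

    fit_gbm_once <- function(data, sampleWeight) {
      library(gbm)
      fit <- gbm(Y ~ A + B + C + D + E + F,
                 data = data,
                 distribution = "multinomial",
                 n.trees = 500,
                 shrinkage = 0.1,
                 interaction.depth = 1,
                 bag.fraction = 0.5,
                 train.fraction = 0.5,
                 n.minobsinnode = 5,
                 keep.data = FALSE,
                 weights = sampleWeight)
      # return only the predictions; everything else is freed when the child exits
      predict(fit, newdata = data, n.trees = 500, type = "response")
    }

    preds <- callr::r(fit_gbm_once,
                      args = list(data = data, sampleWeight = sampleWeight))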