
How to stop h2o from saving massive .ERR, .OUT and other log files to the local drive


I am running a script that fits several h2o glm and deeplearning models over many iterations of a Monte Carlo cross-validation. By the time it finishes (which takes about half a day), h2o has written huge files to the local drive, up to 8.5 GB each. I originally assumed these files would be deleted when RStudio or my computer restarts, but they are not. Is there a way to stop h2o from saving them?


Solution

  • When you start H2O with h2o.init() from R, its stdout and stderr are written to files in a temporary directory (call tempdir() in R to see the path). That directory should be removed when the R session exits. This does not seem to happen under RStudio, although it does work when you run R from the command line. I'm not sure whether this is an RStudio setting that can be changed or an RStudio bug.
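
    If the files have already piled up, a rough way to find them from a shell is sketched here. It assumes a Unix-like system, that R's session directories (named Rtmp...) sit directly under $TMPDIR or /tmp, and that the logs end in .out or .err as in the question. list_session_logs is just a name made up for this example.

    ```shell
    # Print size and path of *.out / *.err files found directly inside
    # R session temp directories (Rtmp*) under a base directory.
    # Assumptions: base defaults to $TMPDIR or /tmp; the file suffixes
    # are taken from the question, not from any H2O specification.
    list_session_logs() {
      base="${1:-${TMPDIR:-/tmp}}"
      find "$base" -maxdepth 2 -type f -path "*/Rtmp*/*" \
        \( -iname "*.out" -o -iname "*.err" \) \
        -exec du -h {} + 2>/dev/null
    }
    ```

    Calling list_session_logs with no argument scans the default location; pass a directory (for example the path that tempdir() printed, minus its last component) to scan somewhere else. Check the list before deleting anything, since a running R session may still be using its directory.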

    You can take more control yourself, though: start H2O by hand with java on the command line, then connect from R using h2o.init().

    java -Xmx5g -jar h2o.jar
    

    In this example, I started H2O with 5 GB of Java heap memory, but you should increase that if your data is larger. Then connecting in R will look like this:

    > h2o.init()
     Connection successful!
    
    R is connected to the H2O cluster: 
        H2O cluster uptime:         16 hours 34 minutes 
        H2O cluster version:        3.15.0.99999 
        H2O cluster version age:    17 hours and 25 minutes  
        H2O cluster name:           H2O_started_from_R_me_exn817 
        H2O cluster total nodes:    1 
        H2O cluster total memory:   4.43 GB 
        H2O cluster total cores:    8 
        H2O cluster allowed cores:  8 
        H2O cluster healthy:        TRUE 
        H2O Connection ip:          localhost 
        H2O Connection port:        54321 
        H2O Connection proxy:       NA 
        H2O Internal Security:      FALSE 
        H2O API Extensions:         XGBoost, Algos, AutoML, Core V3, Core V4 
        R Version:                  R version 3.3.2 (2016-10-31) 
    

    So if you want to discard both stdout and stderr, add a redirection to the end of the java command that starts the H2O cluster, then connect to H2O from R again. In a Unix-like shell, appending > /dev/null 2>&1 sends both streams to /dev/null, and the trailing & runs the process in the background so the terminal stays free:

    java -Xmx5g -jar h2o.jar > /dev/null 2>&1 &
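
    The order of the two redirections matters. Here is a small sketch that shows the difference, using a stand-in function (noisy, a hypothetical name) in place of the H2O process so it can be run anywhere:

    ```shell
    # Stand-in for a chatty process: one line to stdout, one line to stderr.
    noisy() {
      echo "to stdout"
      echo "to stderr" >&2
    }

    # stdout is pointed at /dev/null first, then stderr is made a copy of it,
    # so both streams are discarded and nothing is captured.
    quiet_output=$(noisy > /dev/null 2>&1)

    # Reversed order: stderr is attached to the original stdout before stdout
    # is discarded, so the stderr line still gets through.
    leaky_output=$(noisy 2>&1 > /dev/null)

    printf 'quiet: [%s]\n' "$quiet_output"   # prints: quiet: []
    printf 'leaky: [%s]\n' "$leaky_output"   # prints: leaky: [to stderr]
    ```

    The java command above uses the first ordering, so neither stream reaches the terminal.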