
h2o script fails to run via cron


I am having difficulties running an R script that uses the h2o library via cron on Linux.

The script runs perfectly fine in interactive mode, but fails when it is scheduled with cron.

The part of the code that causes the error:

automl_h2o_models <- h2o.automl(
    x = predictors, 
    y = target,
    training_frame = train_conv_h2o,
    leaderboard_frame = valid_conv_h2o,
    max_runtime_secs = 3600,
    seed = 1234
)

With max_runtime_secs set to 1800 there is no issue, but any larger value results in the error below.

Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page,  : 
  Unexpected CURL error: getaddrinfo() thread failed to start

I am on Ubuntu 20.04, R version 3.6.3, h2o version 3.32.1.3.
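Since the script only fails under cron, one way to see how the two environments differ is to log the open-file limits from both. The sketch below is a minimal POSIX sh helper (the name `report_fd_limits` is hypothetical) that prints the soft and hard limits and how many descriptors the shell currently has open; reading `/proc/$$/fd` assumes Linux.

```shell
#!/bin/sh
# Hypothetical helper: print the open-file limits seen by this shell.
# Run it once from a terminal and once from cron, then compare the output.
report_fd_limits() {
    soft=$(ulimit -Sn)    # soft limit: what the process is currently allowed
    hard=$(ulimit -Hn)    # hard limit: ceiling the soft limit may be raised to
    # /proc/$$/fd lists the descriptors open in this shell (Linux only);
    # on other systems the directory is missing and the count falls back to 0
    open=$(ls /proc/$$/fd 2>/dev/null | wc -l | tr -d ' ')
    echo "soft=$soft hard=$hard open=$open"
}

report_fd_limits
```

For example, a temporary crontab line such as `* * * * * /bin/sh /path/to/report_fd_limits.sh >> /tmp/fd_limits.log 2>&1` (paths hypothetical) records what cron sees, which can then be compared with the output from an interactive terminal.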


Solution

  • The issue is related to the limit on open file descriptors in Linux. The cron environment differs from the environment the script gets when it is run interactively, so it does not necessarily run with the same limits.

    As a solution, I raised the soft limit on open files directly in the crontab entry:

    0 18 21 6 * ulimit -nS 1048576 && Rscript <script_name>
    

    Then the error disappeared and the script ran correctly.
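One caveat: an unprivileged process can only raise its soft limit up to the hard limit (`ulimit -Hn`). If the hard limit is below 1048576, the `ulimit` call fails and the `&&` keeps Rscript from running at all. A wrapper along the lines of the sketch below caps the requested value at the hard limit before starting R. The file name `run_with_nofile.sh` and the function `target_nofile` are hypothetical, and the script sticks to POSIX sh because cron runs commands with `/bin/sh` unless `SHELL` is set in the crontab.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. saved as run_with_nofile.sh and called from cron:
#   0 18 21 6 * /path/to/run_with_nofile.sh <script_name>
# It raises the soft open-file limit, capped at the hard limit, then runs R.

WANTED=1048576   # soft limit requested in the original crontab entry

# Print the soft limit we can actually set: the wanted value or the hard limit,
# whichever is smaller (an unprivileged process cannot go above the hard limit).
target_nofile() {
    hard=$(ulimit -Hn)
    if [ "$hard" = "unlimited" ] || [ "$hard" -ge "$WANTED" ]; then
        echo "$WANTED"
    else
        echo "$hard"
    fi
}

# Only launch R when a script argument was given; with no arguments nothing runs.
if [ "$#" -gt 0 ]; then
    ulimit -Sn "$(target_nofile)" || exit 1
    exec Rscript "$@"
fi
```

Capping at the hard limit replaces a hard failure with a possibly lower limit; if the R script still fails, the hard limit itself has to be raised, which typically requires root.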