trains clearml

How should Trains be used with hyper-param optimization tools like RayTune?


What could be a reasonable setup for this? Can I call Task.init() multiple times in the same execution?


Solution

  • Disclaimer: I'm part of the allegro.ai Trains team

    One solution is to inherit from trains.automation.optimization.SearchStrategy and extend the functionality. This is similar to the Optuna integration, where Optuna is used for the Bayesian optimization and Trains does the hyper-parameter setting, launching experiments, and retrieving performance metrics.

    Another option (not scalable, but probably easier to start with) is to have RayTune run your code (setting up the environment, git repo, docker, etc. is left to the user), and have your training code look something like:

    from trains import Task

    # create a new experiment
    task = Task.init('hp optimization', 'ray-tuner experiment', reuse_last_task_id=False)
    # store the hyper-parameters (assuming hparam is a dict)
    task.connect(hparam)
    # training loop here
    # ...
    # shut down the experiment
    task.close()
    

    This means that every time RayTune executes the script, a new experiment is created with a new set of hyper-parameters (assuming hparam is a dictionary, it will be registered on the experiment as hyper-parameters).
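To make the per-trial pattern concrete, here is a minimal sketch that wraps the snippet above in a function a tuner could call once per trial. The names `run_trial` and `task_factory` are hypothetical (not part of Trains or Ray), and the "training loop" is a toy quadratic loss used only for illustration. It assumes that calling `task.close()` before the next `Task.init(..., reuse_last_task_id=False)` yields a fresh experiment even when several trials run in the same process.

```python
def run_trial(hparam, task_factory=None):
    """Run one trial as its own experiment and return the final loss.

    `task_factory` is a hypothetical hook: any callable returning an object
    with `connect()` and `close()`. By default it creates a Trains task.
    """
    if task_factory is None:
        # Imported lazily so the function can be tried without a Trains server
        from trains import Task

        def task_factory():
            return Task.init('hp optimization', 'ray-tuner experiment',
                             reuse_last_task_id=False)

    task = task_factory()   # one new experiment per trial
    task.connect(hparam)    # register the dict as hyper-parameters
    try:
        # Placeholder for the real training loop (toy loss, illustrative only)
        loss = (hparam['lr'] - 0.1) ** 2
    finally:
        task.close()        # always close, so the next trial starts a new task
    return loss


if __name__ == '__main__':
    # Stand-in for the tuner: call the function once per hyper-parameter set.
    # Running this part requires Trains to be installed and configured.
    for lr in (0.01, 0.1, 1.0):
        print(lr, run_trial({'lr': lr}))
```

The `try`/`finally` is a design choice rather than a Trains requirement: it keeps a failed trial from leaving its experiment open when the tuner moves on to the next configuration.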