I recently started using Metaflow for my hyperparameter searches. I'm using a foreach over all my parameters as follows:
from metaflow import FlowSpec, step

class HPOFlow(FlowSpec):

    @step
    def start_hpo(self):
        self.next(self.train_model, foreach='hpo_parameters')

    @step
    def train_model(self):
        # Trains model...
This works: it starts the train_model step as intended, but unfortunately Metaflow tries to run all of the foreach tasks in parallel at once. This causes my GPU/CPU to run out of memory instantly, failing the step.
Is there a way to tell Metaflow to run these tasks linearly / one at a time, or is there another workaround?
Thanks
@BBQuercus You can limit parallelization by using the --max-workers flag.
Currently we run no more than 16 tasks in parallel, and you can override that, e.g. python myflow.py run --max-workers 32.
In your case, python myflow.py run --max-workers 1 will execute the foreach tasks strictly one at a time.
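Conceptually, --max-workers behaves like a bounded worker pool: all foreach tasks are queued, but at most N run concurrently. A minimal sketch of that idea in plain Python (using concurrent.futures, not the Metaflow API; train_model and hpo_parameters here are hypothetical stand-ins):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for one hyperparameter trial.
def train_model(params):
    # Placeholder "training": just combine the parameter values.
    return sum(params.values())

hpo_parameters = [{"lr": 0.1}, {"lr": 0.01}, {"lr": 0.001}]

# max_workers=1 runs the trials strictly one at a time, which is
# what `run --max-workers 1` does for Metaflow foreach tasks.
with ThreadPoolExecutor(max_workers=1) as pool:
    results = list(pool.map(train_model, hpo_parameters))
```

With max_workers=1 only one trial holds GPU/CPU memory at any moment, which is why lowering the worker count avoids the out-of-memory failures described above.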