I have written a pipeline that I want to run on a remote compute cluster within Azure Machine Learning. My aim is to process a large amount of historical data, and to do this I will need to run the pipeline on a large number of input parameter combinations.
Is there a way to restrict the number of nodes that the pipeline uses on the cluster? By default it will use all the nodes available to the cluster, and I would like to restrict it so that it only uses a pre-defined maximum. This allows me to leave the rest of the cluster free for other users.
My current code to start the pipeline looks like this:
import pandas as pd
from azureml.core import Experiment
from azureml.pipeline.core import Pipeline

# Set up the pipeline
steps = [data_import_step]  # Contains PythonScriptStep
pipeline = Pipeline(workspace=ws, steps=steps)
pipeline.validate()

# Big long list of historical dates that I want to process data for
dts = pd.date_range('2019-01-01', '2020-01-01', freq='6H', closed='left')

# Submit the pipeline job
for dt in dts:
    pipeline_run = Experiment(ws, 'my-pipeline-run').submit(
        pipeline,
        pipeline_parameters={
            'import_datetime': dt.strftime('%Y-%m-%dT%H:00'),
        }
    )
For me, the killer feature of Azure ML is not having to worry about load balancing like this. Our team has a compute target with max_nodes=100 for every feature branch, and we have Hyperdrive pipelines that result in 130 runs for each pipeline.
We can submit multiple PipelineRuns back-to-back, and the orchestrator does the heavy lifting of queuing and submitting all the runs, so that the PipelineRuns execute in the serial order I submitted them and the cluster is never overloaded. This works without issue for us 99% of the time.
If what you're looking for is to have the PipelineRuns executed in parallel, then you should check out ParallelRunStep.
Another option is to isolate your computes. You can have up to 200 ComputeTargets per workspace. Two 50-node ComputeTargets cost the same as one 100-node ComputeTarget.
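As a rough sketch of the isolation idea, here is how a dedicated, node-capped cluster might be created with the v1 SDK (azureml-core). The helper names, the VM size and the 50-node cap are placeholders of mine, not anything from the question, and min_nodes=0 is an assumption so that idle nodes scale down.

```python
def capped_cluster_settings(vm_size, max_nodes, min_nodes=0):
    """Validate and return the settings for a cluster with a hard node cap."""
    if max_nodes < 1:
        raise ValueError("max_nodes must be at least 1")
    if not 0 <= min_nodes <= max_nodes:
        raise ValueError("min_nodes must be between 0 and max_nodes")
    return {"vm_size": vm_size, "min_nodes": min_nodes, "max_nodes": max_nodes}


def get_or_create_capped_cluster(ws, name, vm_size="STANDARD_DS3_V2", max_nodes=50):
    """Return the named cluster, creating it with a node cap if it is missing.

    `ws` is an existing azureml.core.Workspace. The VM size and cap are
    illustrative defaults only.
    """
    # Imported lazily so the helper above can be used without the SDK installed.
    from azureml.core.compute import AmlCompute, ComputeTarget
    from azureml.core.compute_target import ComputeTargetException

    try:
        # Reuse the cluster if it already exists in the workspace.
        return ComputeTarget(workspace=ws, name=name)
    except ComputeTargetException:
        settings = capped_cluster_settings(vm_size, max_nodes)
        config = AmlCompute.provisioning_configuration(**settings)
        target = ComputeTarget.create(ws, name, config)
        target.wait_for_completion(show_output=True)
        return target
```

You would then pass the returned target as compute_target when building the PythonScriptStep, so that every run of that pipeline is confined to the capped cluster and the shared one stays free for other users.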
On our team, we use pygit2 to have a ComputeTarget created for each feature branch, so that, as data scientists, we can be confident that we're not stepping on our coworkers' toes.
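A minimal sketch of the per-branch naming, not our exact code: the function names, the "br-" prefix and the 16-character limit are assumptions for illustration, so check the current compute naming rules before relying on them.

```python
import re


def compute_name_for_branch(branch, prefix="br-", max_len=16):
    """Turn a git branch name into a candidate compute target name.

    Assumes names may only contain letters, digits and hyphens, and that
    max_len matches the service's length limit (an assumption here).
    """
    # Replace every run of disallowed characters with a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9-]+", "-", branch).strip("-").lower()
    # Truncate to the limit and avoid ending on a hyphen.
    return (prefix + slug)[:max_len].rstrip("-")


def current_branch(repo_path="."):
    """Return the short name of the checked-out branch using pygit2."""
    # Imported lazily so the naming helper works without pygit2 installed.
    import pygit2

    return pygit2.Repository(repo_path).head.shorthand
```

For example, compute_name_for_branch(current_branch()) gives the name to look up or create for the branch you are on. Note that truncation can make two long branch names collide, so a short hash suffix may be worth adding.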