I have a requirement to use azure machine learning to develop a pipeline. In this pipeline we don't pass data as inputs/outputs but variables (for example a list or an int). I have looked on the Microsoft documentation but could not seem to find something fitting my case. Also tried to use the PipelineData class but could not retrieve my variables.
Thanks for your help.
I know I'm a bit late to the party but here we go:
Passing variables between AzureML Pipeline Steps
To directly answer your question, to my knowledge it is not possible to pass variables directly between PythonScriptSteps in an AzureML Pipeline.
The reason for that is that the steps are executed in isolation, i.e. the code is run in different processes or even computes. The only interface a PythonScriptStep has is (a) command line arguments that need to be set prior to submission of the pipeline and (b) data.
Using datasets to pass information between PythonScriptSteps
As a workaround you can use PipelineData to pass data between steps. The previously posted blog post may help: https://vladiliescu.net/3-ways-to-pass-data-between-azure-ml-pipeline-steps/
As for your concrete problem:
# pipeline.py
# This will make Azure create a unique directory on the datastore everytime the pipeline is run.
variables_data = PipelineData("variables_data", datastore=datastore)
# `variables_data` will be mounted on the target compute and a path is given as a command line argument
write_variable = PythonScriptStep(
script_name="write_variable.py",
arguments=[
"--data_path",
variables_data
],
outputs=[variables_data],
)
read_variable = PythonScriptStep(
script_name="read_variable.py",
arguments=[
"--data_path",
variables_data
],
inputs=[variables_data],
)
In your script you'll want to serialize the variable / object that you're trying to pass between steps:
(You could of course use JSON or any other serialization method)
# write_variable.py
import argparse
import pickle
from pathlib import Path
parser = argparse.ArgumentParser()
parser.add_argument("--data_path")
args = parser.parse_args()
obj = [1, 2, 3, 4]
Path(args.data_path).mkdir(parents=True, exist_ok=True)
with open(args.data_path + "/obj.pkl", "wb") as f:
pickle.dump(obj, f)
Finally, you can read the variable in the next step:
# read_variable.py
import argparse
import pickle
parser = argparse.ArgumentParser()
parser.add_argument("--data_path")
args = parser.parse_args()
with open(args.data_path + "/obj.pkl", "rb") as f:
obj = pickle.load(f)
print(obj)