Tags: python, azure, azure-machine-learning-service

Can AzureML pass variables from one step to another?


I need to develop a pipeline with Azure Machine Learning. In this pipeline we don't pass data as inputs/outputs but variables (for example a list or an int). I have looked through the Microsoft documentation but could not find anything that fits my case. I also tried to use the PipelineData class but could not retrieve my variables.

  1. Is this possible?
  2. Is this a good approach?

Thanks for your help.


Solution

  • I know I'm a bit late to the party but here we go:

    Passing variables between AzureML Pipeline Steps

    To directly answer your question, to my knowledge it is not possible to pass variables directly between PythonScriptSteps in an AzureML Pipeline.

    The reason is that the steps are executed in isolation, i.e. the code runs in different processes or even on different compute targets. The only interfaces a PythonScriptStep has are (a) command line arguments, which need to be set before the pipeline is submitted, and (b) data.
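
    For values that are already known when the pipeline is submitted, interface (a) may be enough on its own. The following is a minimal sketch of the script side, assuming the step was given arguments along the lines of `["--count", 5, "--values", "[64, 32]"]`. The argument names `--count` and `--values` and the helper `parse_step_arguments` are hypothetical, chosen only for illustration:

```python
import argparse
import json


def parse_step_arguments(argv):
    """Parse values that were fixed when the pipeline was submitted."""
    parser = argparse.ArgumentParser()
    # Hypothetical argument names, used for illustration only.
    parser.add_argument("--count", type=int, required=True)
    # A list can travel as a JSON string and be decoded by json.loads.
    parser.add_argument("--values", type=json.loads, required=True)
    return parser.parse_args(argv)


# Simulates the command line a step would receive. In a real step script
# you would pass sys.argv[1:] (or call parser.parse_args() without arguments).
args = parse_step_arguments(["--count", "5", "--values", "[64, 32]"])
print(args.count, args.values)
```

    This only covers values that are fixed before submission. Anything computed inside one step still has to reach the next step as data.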

    Using datasets to pass information between PythonScriptSteps

    As a workaround you can use PipelineData to pass data between steps. The previously posted blog post may help: https://vladiliescu.net/3-ways-to-pass-data-between-azure-ml-pipeline-steps/

    As for your concrete problem:

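    # pipeline.py

    from azureml.pipeline.core import PipelineData
    from azureml.pipeline.steps import PythonScriptStep

    # `datastore` is assumed to be defined earlier in this file.
    # Azure creates a unique directory on the datastore every time the pipeline is run.
    variables_data = PipelineData("variables_data", datastore=datastore)

    # `variables_data` is mounted on the target compute and its path is passed
    # to the script as a command line argument.
    # Other step arguments (e.g. compute_target, source_directory) are omitted here.
    write_variable = PythonScriptStep(
        script_name="write_variable.py",
        arguments=[
            "--data_path",
            variables_data
        ],
        outputs=[variables_data],
    )

    # Listing `variables_data` as an input makes this step run after `write_variable`.
    read_variable = PythonScriptStep(
        script_name="read_variable.py",
        arguments=[
            "--data_path",
            variables_data
        ],
        inputs=[variables_data],
    )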
    
    

    In your script, you'll want to serialize the variable or object that you're trying to pass between steps (you could of course use JSON or any other serialization method instead of pickle):

    # write_variable.py

    import argparse
    import pickle
    from pathlib import Path

    parser = argparse.ArgumentParser()
    parser.add_argument("--data_path")
    args = parser.parse_args()

    # The variable to hand over to the next step.
    obj = [1, 2, 3, 4]

    # Make sure the output directory exists before writing into it.
    data_dir = Path(args.data_path)
    data_dir.mkdir(parents=True, exist_ok=True)
    with open(data_dir / "obj.pkl", "wb") as f:
        pickle.dump(obj, f)
    

    Finally, you can read the variable in the next step:

    # read_variable.py

    import argparse
    import pickle
    from pathlib import Path

    parser = argparse.ArgumentParser()
    parser.add_argument("--data_path")
    args = parser.parse_args()

    # Load the object that write_variable.py stored in the shared directory.
    with open(Path(args.data_path) / "obj.pkl", "rb") as f:
        obj = pickle.load(f)

    print(obj)
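
    If the variables are plain lists, ints, strings or dicts, the same hand-over could be done with JSON instead of pickle. Here is a rough sketch that can be run locally: a temporary directory stands in for the path AzureML would pass as `--data_path`, and the file name `variables.json` and the helpers `write_variables` / `read_variables` are my own hypothetical names, not part of the AzureML SDK:

```python
import json
import tempfile
from pathlib import Path


def write_variables(data_path, variables):
    """Store a dict of JSON-serializable variables in the output directory."""
    data_dir = Path(data_path)
    data_dir.mkdir(parents=True, exist_ok=True)
    with open(data_dir / "variables.json", "w") as f:
        json.dump(variables, f)


def read_variables(data_path):
    """Load the variables written by the previous step."""
    with open(Path(data_path) / "variables.json") as f:
        return json.load(f)


# Local stand-in for the two steps: the temporary directory plays the role
# of the mounted PipelineData path.
with tempfile.TemporaryDirectory() as tmp_dir:
    write_variables(tmp_dir, {"my_list": [1, 2, 3, 4], "my_int": 42})
    restored = read_variables(tmp_dir)

print(restored)
```

    JSON files stay human-readable and loading them does not execute code, which unpickling can do. The trade-off is that only basic types survive the round trip (a tuple, for example, comes back as a list).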