python  azure-machine-learning-service  azureml-python-sdk

How to set environment variables in pipelines in Azure ML SDK v2 with jobs.create_or_update()?


I am changing some of our code from Azure ML's SDK v1 to v2. However, when I invoke pipelines with components via ml_client.jobs.create_or_update, I just can't get them to use my environment variables. Here is what I am doing:

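from pathlib import Path

from azure.ai.ml import Input, load_component
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.dsl import pipeline

preprocessing_component = load_component(
    source=Path(__file__).parent / "preprocessing_component.yaml"
)

@pipeline()
def example_train_pipeline(input_data_path):
    preprocess_step = preprocessing_component(
        input_data_path=input_data_path
    )
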
pipeline_job = example_train_pipeline(
    input_data_path=Input(
        type=AssetTypes.URI_FILE,
        path="xxx",
    )
)

pipeline_job.settings.default_compute = e.cluster_name
pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name=experiment_name
)

I tried to set .env_variables when creating my Azure ML environment (which is loaded for this pipeline's component in the YAML). This reported that the parameter was deprecated and that I should use RunConfig.environment_variables instead. The thing is, I can't find docs on how to use a RunConfig with ml_client.jobs.create_or_update. I tried passing a RunConfig with variables set via run_config.environment_variables to create_or_update, but this had no apparent effect.


Solution

  • Azure ML SDK v2 puts more emphasis on components, and each component can define its own environment. You can set environment variables for each component, as in the following sample code.

    You can define environment variables using Python code:

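    from azure.ai.ml import command

    # environment, distribution, resources, inputs and outputs are assumed
    # to be defined elsewhere in your script.
    environment_variables = {"environ": "val"}
    command_function = command(
        display_name="command-function-job",
        environment=environment,
        command='echo "hello world"',
        distribution=distribution,
        resources=resources,
        # Key/value pairs exposed as environment variables to the job's process.
        environment_variables=environment_variables,
        inputs=inputs,
        outputs=outputs,
    )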
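
To check that the variable actually reaches the job, the script run by the component can read it back from the process environment. The following is a minimal sketch using only the standard library; the helper name `read_setting` is hypothetical, and `"environ"` is simply the key from the `environment_variables` dict above.

```python
import os


def read_setting(name: str, default: str = "") -> str:
    """Return the value of an environment variable, or a default if it is unset."""
    return os.environ.get(name, default)


if __name__ == "__main__":
    # "environ" is the key used in the environment_variables dict above.
    # If the variable was not passed to the job, the placeholder is printed instead.
    print(read_setting("environ", default="<not set>"))
```

Printing the value at the start of the component's script makes it visible in the job's logs, so a missing variable shows up early.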