
Control where Source Code for Azure ML Command gets Uploaded


I'm working in a notebook in Azure Machine Learning Studio and I'm using the following code block to instantiate a job using the command function.

from azure.ai.ml import command, Input, Output
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

subscription_id = "<subscription_id>"
resource_group = "<resource_group>"
workspace = "<workspace>"
storage_account = "<storage_account>"
input_path = "<input_path>"
output_path = "<output_path>"

input_dict = {
    "input_data_object": Input(
        type=AssetTypes.URI_FILE, 
        path=f"azureml://subscriptions/{subscription_id}/resourcegroups/{resource_group}/workspaces/{workspace}/datastores/{storage_account}/paths/{input_path}"
    )
}

output_dict = {
    "output_folder_object": Output(
        type=AssetTypes.URI_FOLDER,
        path=f"azureml://subscriptions/{subscription_id}/resourcegroups/{resource_group}/workspaces/{workspace}/datastores/{storage_account}/paths/{output_path}",
    )
}

job = command(
    code="./src", 
    command="python 01_read_write_data.py -v --input_data=${{inputs.input_data_object}} --output_folder=${{outputs.output_folder_object}}",
    inputs=input_dict,
    outputs=output_dict,
    environment="<asset_env>",
    compute="<compute_cluster>",
)

returned_job = ml_client.create_or_update(job)

This runs successfully but with each run, if the code stored within the ./src directory changes then a new copy is uploaded to the default blob storage account. I don't mind this, but with each run, the code is uploaded to a new container at the root of my blob storage account. Therefore my default storage account is getting cluttered with containers. I've read the docs for instantiating a command object using the command() function, but I see no parameter available to control where my ./src code gets uploaded. Is there any way to control this?


Solution

  • You cannot control where the code gets uploaded, but what you can do is give the `code` parameter a path pointing directly at the location in your storage account where your code is already saved.

    The `code` parameter of the `command()` function accepts either

    a local path, or an "http:", "https:", or "azureml:" URL pointing to a remote location.

    When you give a local path, the SDK creates a container and uploads the code on each run, so I would recommend using an https URL instead.

    If your Python code is in the src folder, upload that folder to any container in your storage account and copy its https path.
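The https URL for the `code` parameter follows a fixed pattern, so it can be assembled from the storage account name, container, and folder. A minimal sketch (the account and container names here mirror the sample further down; substitute your own):

```python
def blob_code_url(storage_account: str, container: str, folder: str) -> str:
    """Build the https URL for a code folder uploaded to blob storage."""
    return (
        f"https://{storage_account}.blob.core.windows.net"
        f"/{container}/{folder.strip('/')}/"
    )

# Placeholder names -- replace with your own account, container, and folder.
code_url = blob_code_url("jgsml28890137600", "data", "src")
print(code_url)  # https://jgsml28890137600.blob.core.windows.net/data/src/
```

The resulting string is what you pass as `code=` to `command()`.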


    Sample code

    from azure.ai.ml import command

    command_job = command(
        code="https://jgsml28890137600.blob.core.windows.net/data/src/",
        command="echo hi from command",
        environment="azureml:docker-context-example-10:1",
        timeout=10800,
    )
    

    Note: The storage account should be associated with the ML workspace. If it is not, create a datastore linking your storage account before using the https path.
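Once such a datastore is registered, the `code` parameter can also take an "azureml:" URI in the same format the question already uses for its inputs and outputs. A small helper to build it (all names are placeholders):

```python
def datastore_code_path(subscription_id: str, resource_group: str,
                        workspace: str, datastore: str, folder: str) -> str:
    """Build an azureml:// URI pointing at a folder on a registered datastore."""
    return (
        f"azureml://subscriptions/{subscription_id}"
        f"/resourcegroups/{resource_group}"
        f"/workspaces/{workspace}"
        f"/datastores/{datastore}"
        f"/paths/{folder.strip('/')}/"
    )

# Placeholder values -- fill in your own workspace and datastore details.
code_path = datastore_code_path("<subscription_id>", "<resource_group>",
                                "<workspace>", "<datastore_name>", "src")
```

Pointing `code=` at a fixed datastore path like this keeps each run reading from one known location instead of spawning new containers.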