Retrieving current job for Azure ML v2


Using the v2 Azure ML Python SDK (azure-ai-ml) how do I get an instance of the currently running job?

In v1 (azureml-core) I would do:

from azureml.core import Run

run = Run.get_context()
if isinstance(run, Run):
    print("Running on compute...")

What is the equivalent on the v2 SDK?


Solution

  • This is a little more involved in v2 than it was in v1. The reason is that v2 makes a clear distinction between the control plane (where you start/stop your job, deploy compute, etc.) and the data plane (where you run your data science code, load data from storage, etc.).

    Jobs can do control plane operations, but they need to do that with a proper identity that was explicitly assigned to the job by the user.

    Here is the code first. This script creates an MLClient, uses it to connect to the service and retrieve the current job's metadata, and then prints the name of the user who submitted the job:

    # control_plane.py
    from azure.ai.ml import MLClient
    from azure.ai.ml.identity import AzureMLOnBehalfOfCredential
    import os
    
    def get_ml_client():
        # Inside an Azure ML job, MLFLOW_TRACKING_URI embeds the subscription,
        # resource group, and workspace name as path segments.
        uri = os.environ["MLFLOW_TRACKING_URI"]
        uri_segments = uri.split("/")
        subscription_id = uri_segments[uri_segments.index("subscriptions") + 1]
        resource_group_name = uri_segments[uri_segments.index("resourceGroups") + 1]
        workspace_name = uri_segments[uri_segments.index("workspaces") + 1]
        # Authenticate as the user who submitted the job
        # (requires the job to run with identity type user_identity).
        credential = AzureMLOnBehalfOfCredential()
        client = MLClient(
            credential=credential,
            subscription_id=subscription_id,
            resource_group_name=resource_group_name,
            workspace_name=workspace_name,
        )
        return client
    
    ml_client = get_ml_client()
    this_job = ml_client.jobs.get(os.environ["MLFLOW_RUN_ID"])
    print("This job was created by:", this_job.creation_context.created_by)
    

    As you can see, the code uses a special AzureMLOnBehalfOfCredential to create the MLClient. The options you would use locally (AzureCliCredential or InteractiveBrowserCredential) won't work in a remote job, because on that remote run you are not authenticated through az login or through the browser prompt. For your credentials to be available in the remote job, you need to run the job with user_identity and retrieve the corresponding credential from the environment by using the AzureMLOnBehalfOfCredential class.
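
    The script above reads the workspace coordinates out of MLFLOW_TRACKING_URI and assumes it is running inside a job. A slightly more defensive sketch of that parsing, plus a rough stand-in for v1's isinstance(run, Run) check, might look like the following. The names WorkspaceScope, parse_tracking_uri, and running_in_job are hypothetical helpers, and the azureml:// prefix test is an assumption about how the service sets the variable, so treat the check as a heuristic rather than an official API:

```python
import os
from typing import Mapping, NamedTuple


class WorkspaceScope(NamedTuple):
    """Workspace coordinates needed to construct an MLClient."""
    subscription_id: str
    resource_group_name: str
    workspace_name: str


def parse_tracking_uri(uri: str) -> WorkspaceScope:
    """Extract workspace coordinates from an Azure ML MLflow tracking URI."""
    segments = uri.split("/")

    def value_after(key: str) -> str:
        # Each value is the path segment that directly follows its key.
        try:
            return segments[segments.index(key) + 1]
        except (ValueError, IndexError):
            raise ValueError(f"'{key}' not found in tracking URI") from None

    return WorkspaceScope(
        subscription_id=value_after("subscriptions"),
        resource_group_name=value_after("resourceGroups"),
        workspace_name=value_after("workspaces"),
    )


def running_in_job(env: Mapping[str, str] = os.environ) -> bool:
    """Heuristic: True when the job-specific MLflow variables are present.

    Assumption: Azure ML sets MLFLOW_RUN_ID and an azureml:// tracking URI
    inside a job. Other tools may set MLFLOW_RUN_ID as well, hence the
    additional prefix check.
    """
    uri = env.get("MLFLOW_TRACKING_URI", "")
    return "MLFLOW_RUN_ID" in env and uri.startswith("azureml://")
```

    With these helpers, a script could guard the control plane calls with running_in_job() and fall back to local behaviour otherwise, much like the v1 snippet in the question.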

    So, how do you run a job with user_identity? The following YAML job definition does it:

    $schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
    type: command
    command: |
      pip install azure-ai-ml 
      python control_plane.py
    code: code
    environment: 
      image: library/python:latest
    compute: azureml:cpu-cluster
    identity:
      type: user_identity
    

    Note the identity section at the bottom. Also note that I am lazy and install the azure-ai-ml SDK as part of the job. In a real setting, I would of course create an environment with the package already installed.
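
    If you submit jobs from Python instead of the CLI, roughly the same job could be sketched with the v2 SDK. This is an illustration that mirrors the YAML values above and was not part of the original setup; build_user_identity_job and JOB_COMMAND are hypothetical names, and the sketch assumes the command builder and UserIdentityConfiguration from azure-ai-ml:

```python
# Sketch only: mirrors the YAML job definition using the azure-ai-ml Python API.
JOB_COMMAND = "pip install azure-ai-ml && python control_plane.py"


def build_user_identity_job():
    """Build (but do not submit) a command job that runs as the submitting user."""
    # Imported lazily so this module loads even without azure-ai-ml installed.
    from azure.ai.ml import command
    from azure.ai.ml.entities import Environment, UserIdentityConfiguration

    return command(
        command=JOB_COMMAND,
        code="code",  # local folder that contains control_plane.py
        environment=Environment(image="library/python:latest"),
        compute="cpu-cluster",
        identity=UserIdentityConfiguration(),  # same as "type: user_identity"
    )


# Submission needs an MLClient authenticated on your own machine, for example:
# ml_client.jobs.create_or_update(build_user_identity_job())
```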

    These are the valid settings for the identity type:

    • aml_token: this is the default which will not allow you to access the control plane
    • managed or managed_identity: this means the job will be run under the given managed identity (aka compute identity). This would be accessed in your job via azure.identity.ManagedIdentityCredential. Of course, you need to provide the chosen compute identity with access to the workspace to be able to read job information.
    • user_identity: this will run the job under the submitting user's identity. It is to be used with the azure.ai.ml.identity.AzureMLOnBehalfOfCredential credentials as shown above.

    So, for your use case, you have 2 options:

    1. You could run the job with user_identity and use the AzureMLOnBehalfOfCredential class to create the MLClient
    2. You could create the compute with a managed identity, give that identity access to the workspace, run the job with managed_identity, and use the ManagedIdentityCredential class to create the MLClient
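
    For the second option, a minimal sketch of the client setup could look like this. It is an illustration under assumptions: managed_identity_kwargs and get_ml_client_with_managed_identity are hypothetical helper names, and reading the client ID of a user-assigned identity from the DEFAULT_IDENTITY_CLIENT_ID environment variable is an assumption about the compute environment that you should verify for your setup. The workspace coordinates can be obtained the same way as in control_plane.py:

```python
import os
from typing import Mapping


def managed_identity_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Return keyword arguments for ManagedIdentityCredential.

    Assumption: for a user-assigned identity, the compute exposes its client
    ID in DEFAULT_IDENTITY_CLIENT_ID. When the variable is absent, no
    client_id is passed and the system-assigned identity is used.
    """
    client_id = env.get("DEFAULT_IDENTITY_CLIENT_ID")
    return {"client_id": client_id} if client_id else {}


def get_ml_client_with_managed_identity(
    subscription_id: str, resource_group_name: str, workspace_name: str
):
    """Create an MLClient that authenticates as the compute's managed identity."""
    # Imported lazily so this module loads even without the Azure packages.
    from azure.ai.ml import MLClient
    from azure.identity import ManagedIdentityCredential

    credential = ManagedIdentityCredential(**managed_identity_kwargs())
    return MLClient(
        credential=credential,
        subscription_id=subscription_id,
        resource_group_name=resource_group_name,
        workspace_name=workspace_name,
    )
```

    The managed identity still needs a role on the workspace that allows reading jobs; otherwise ml_client.jobs.get will fail with an authorization error.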