I'm having a hard time understanding how a user-assigned identity
works on Compute Clusters.
Today, I have a Compute Instance with a user-assigned identity that connects to other Azure services such as CosmosDB, Databricks and more. The user-assigned identity has RBAC roles assigned to it, and also a service principal (SP) created inside Databricks, since it cannot be synced with AAD.
This works correctly, but when I want to launch a job on a compute cluster from my compute instance, I use the Azure SDK v2 to launch it. I tried adding a UserIdentityConfiguration()
to the command,
but when I check the raw configuration of the job on the compute cluster, I see that identity is set to null:
"runDefinition": {
"script": null,
"command": "python main.py",
"useAbsolutePath": false,
"arguments": [],
"sourceDirectoryDataStore": null,
"framework": "Python",
"communicator": "None",
"target": "cpu-2-16",
"dataReferences": {},
"data": {},
"outputData": {},
"datacaches": [],
"jobName": null,
"maxRunDurationSeconds": null,
"nodeCount": 1,
"instanceTypes": [],
"priority": null,
"credentialPassthrough": true,
"identity": null,
I based my code on this example: https://github.com/MicrosoftDocs/azure-docs/blob/211d3450211c26e95c6f40b4e01bd3adf5077774/articles/machine-learning/how-to-use-serverless-compute.md?plain=1#L93
from azure.ai.ml import MLClient, command
from azure.ai.ml.entities import UserIdentityConfiguration
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
# Get a handle to the workspace. You can find the info on the workspace tab on ml.azure.com
ml_client = MLClient(
    credential=credential,
    subscription_id="<Azure subscription id>",
    resource_group_name="<Azure resource group>",
    workspace_name="<Azure Machine Learning Workspace>",
)
job = command(
    command="echo 'hello world'",
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    identity=UserIdentityConfiguration(),
)
# submit the command job
ml_client.create_or_update(job)
In the definition of the compute cluster, I do have the User Assigned Identity:
Also in my compute instance, I have the same User Assigned Identity (AZGRE........)
So how do I make my compute cluster take the identity of my compute instance, or even the identity of the user running the code if they ran az login?
According to the documentation, UserIdentityConfiguration
passes through your Microsoft Entra identity. That is why you get null
in the raw JSON definition, but if you check the YAML definition you can see UserIdentity.
So you need to use ManagedIdentityConfiguration,
which uses the managed identity assigned when the ML workspace was created.
Try the code below.
from azure.ai.ml.entities import ManagedIdentityConfiguration

job = command(
    command="echo 'hello world'",
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    identity=ManagedIdentityConfiguration(),
)
Output:
In raw json
In yaml
Make sure the identity has the required roles and permissions.
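Once the roles are in place, the code running inside the job still has to ask for a token as that identity. Here is a minimal sketch of how main.py might do it. It assumes that Azure ML exposes the client ID of the identity attached to the cluster in the DEFAULT_IDENTITY_CLIENT_ID environment variable (worth verifying on your own cluster) and that azure-identity is installed in the job environment. resolve_client_id and get_credential are hypothetical helper names, not part of any SDK.

```python
import os


def resolve_client_id(env=None):
    """Return the client ID of the identity attached to the compute, or None.

    Assumption: Azure ML sets DEFAULT_IDENTITY_CLIENT_ID on compute that has
    a managed identity. `env` is only a hook for testing; it defaults to
    os.environ.
    """
    env = os.environ if env is None else env
    return env.get("DEFAULT_IDENTITY_CLIENT_ID") or None


def get_credential():
    """Build a credential for calling other Azure services from the job."""
    # Imported here so resolve_client_id() stays usable without azure-identity.
    from azure.identity import DefaultAzureCredential, ManagedIdentityCredential

    client_id = resolve_client_id()
    if client_id:
        # Running on compute with an attached identity: use it explicitly.
        return ManagedIdentityCredential(client_id=client_id)
    # Otherwise (for example a local run after `az login`) fall back to the
    # default credential chain.
    return DefaultAzureCredential()
```

The credential returned by get_credential() can then be passed to the client of the service you want to reach, as long as the identity has the matching RBAC role on that service.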
EDIT
from azure.ai.ml import Input, command
from azure.ai.ml.entities import ManagedIdentityConfiguration

command_job = command(
    code="./src",
    command="python main.py --iris-csv ${{inputs.iris_csv}} --learning-rate ${{inputs.learning_rate}} --boosting ${{inputs.boosting}}",
    environment="AzureML-lightgbm-3.2-ubuntu18.04-py37-cpu@latest",
    inputs={
        "iris_csv": Input(
            type="uri_file",
            path="https://azuremlexamples.blob.core.windows.net/datasets/iris.csv",
        ),
        "learning_rate": 0.9,
        "boosting": "gbdt",
    },
    compute="cpu-cluster",
    # Replace <client_id> with the client ID of your user-assigned identity
    identity=ManagedIdentityConfiguration(client_id="<client_id>"),
)
As shown in the code above, set the compute
option to the name of your new cluster that was created with the user-assigned identity.
Learn more about command jobs here.
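If the cluster does not have the identity attached yet, something like the following could be used to create it from the SDK. This is only a sketch under assumptions: the cluster name, VM size and instance counts are placeholders, and user_assigned_identity_id and build_cluster are hypothetical helpers I made up for illustration. It relies on the AmlCompute, IdentityConfiguration and ManagedIdentityConfiguration entities from azure.ai.ml.entities.

```python
def user_assigned_identity_id(subscription_id, resource_group, identity_name):
    """Build the ARM resource ID of a user-assigned managed identity."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.ManagedIdentity"
        f"/userAssignedIdentities/{identity_name}"
    )


def build_cluster(name, size, identity_resource_id):
    """Describe a compute cluster that carries the given user-assigned identity."""
    # Imported here so the helper above can be used without azure-ai-ml.
    from azure.ai.ml.entities import (
        AmlCompute,
        IdentityConfiguration,
        ManagedIdentityConfiguration,
    )

    return AmlCompute(
        name=name,
        size=size,
        min_instances=0,  # placeholder scale settings
        max_instances=1,
        identity=IdentityConfiguration(
            type="user_assigned",
            user_assigned_identities=[
                ManagedIdentityConfiguration(resource_id=identity_resource_id)
            ],
        ),
    )


# Usage (needs an authenticated MLClient such as ml_client from the question):
# cluster = build_cluster(
#     "cpu-cluster",
#     "<VM size>",
#     user_assigned_identity_id("<subscription id>", "<resource group>", "<identity name>"),
# )
# ml_client.compute.begin_create_or_update(cluster).result()
```

The job then only needs compute set to that cluster name and identity=ManagedIdentityConfiguration(client_id="<client_id>") as in the command above.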