Tags: azure, azure-machine-learning-service, azureml-python-sdk

How to pass the datastore URI to azureml.fsspec.AzureMachineLearningFileSystem Python SDK?


I have registered a datastore that points to ADLS (Azure Data Lake Storage).

datastore = mlclient.datastores.get(ds_name)
from azureml.fsspec import AzureMachineLearningFileSystem

#azureml://subscriptions/<subid>/resourcegroups/<rgname>/workspaces/<workspace_name>/datastore/datastorename
ds_url = f"azureml://subscriptions/{subscriptionID}/resourcegroups/{RG}/workspaces/{ws_name}/datastore/adls/paths/iris-processed/*"
fs = AzureMachineLearningFileSystem(ds_url)
fs.ls()

I am getting the following error even if I use datastore.id:

ValueError: azureml://subscriptions/xx/resourcegroups/xx/workspaces/xx/datastore/adls/paths/iris-processed/* is not a valid datastore uri: azureml://subscriptions/([^\/]+)/resourcegroups/([^\/]+)/(?:Microsoft.MachineLearningServices/)?workspaces/([^\/]+)/datastores/([^\/]+)/paths/(.*)

Solution

  • ValueError: azureml://subscriptions/xx/resourcegroups/xx/workspaces/xx/datastore/adls/paths/iris-processed/* is not a valid datastore uri: azureml://subscriptions/([^\/]+)/resourcegroups/([^\/]+)/(?:Microsoft.MachineLearningServices/)?workspaces/([^\/]+)/datastores/([^\/]+)/paths/(.*)

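    The above error occurs when the URI does not match the pattern shown in the message. In your URI the segment after the workspace name is `datastore`, but the pattern requires `datastores` (plural), followed by the datastore name and `/paths/<path>`. Also check that the subscription ID, resource group, workspace name, datastore name and path are correct.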
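The check can be reproduced locally before creating the file system. The sketch below reuses the regular expression quoted in the ValueError; `check_datastore_uri` is a hypothetical helper, not part of azureml.fsspec, and the library may apply the pattern slightly differently (for example `match` vs. `fullmatch`).

```python
import re

# Pattern copied from the ValueError message above. It is reproduced only
# for illustration; the library's internal check may change between versions.
DATASTORE_URI_PATTERN = re.compile(
    r"azureml://subscriptions/([^\/]+)/resourcegroups/([^\/]+)/"
    r"(?:Microsoft.MachineLearningServices/)?workspaces/([^\/]+)/"
    r"datastores/([^\/]+)/paths/(.*)"
)


def check_datastore_uri(uri: str) -> dict:
    """Return the URI's components, or raise ValueError if it does not match."""
    match = DATASTORE_URI_PATTERN.match(uri)
    if match is None:
        raise ValueError(f"{uri} does not match the long-form datastore URI pattern")
    subscription, resource_group, workspace, datastore, path = match.groups()
    return {
        "subscription": subscription,
        "resource_group": resource_group,
        "workspace": workspace,
        "datastore": datastore,
        "path": path,
    }


# With "datastores" (plural) the URI from the question is accepted.
fixed = "azureml://subscriptions/xx/resourcegroups/xx/workspaces/xx/datastores/adls/paths/iris-processed/"
print(check_datastore_uri(fixed)["datastore"])  # adls
```

Running the same check on the URI from the question (with `/datastore/`) raises ValueError, which mirrors the error reported above.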

    With the correct parameters in the URI, the same code returned the expected results.

    Code:

    from azureml.fsspec import AzureMachineLearningFileSystem
    
    subscription_id = 'Subscription-id'
    resource_group = 'Your-resource-group'
    workspace_name = 'Workspacename'
    input_datastore_name = 'datastore1'
    path_on_datastore = 'folder1/'
    
    # azureml://subscriptions/<subid>/resourcegroups/<rgname>/workspaces/<workspace_name>/datastores/<datastore_name>/paths/<path>
    # Note: the segment is "datastores" (plural), followed by "/paths/".
    ds_url = f'azureml://subscriptions/{subscription_id}/resourcegroups/{resource_group}/workspaces/{workspace_name}/datastores/{input_datastore_name}/paths/{path_on_datastore}'
    fs = AzureMachineLearningFileSystem(ds_url)
    f_list = fs.ls()
    print(f_list)
    

    Output:

    ['datastore1/folder1/09-05-2023 (1).html', 'datastore1/folder1/09-05-2023.html', 'datastore1/folder1/10-05-2023.html', 'datastore1/folder1/10-05=2023.html', 'datastore1/folder1/11-05-2023.html', 'datastore1/folder1/12-05-2023 (1).html', 'datastore1/folder1/12-05-2023.html', 'datastore1/folder1/timezone.csv']
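The question's URI ends in `/*`. Since the working URI points at the folder and `fs.ls()` returns full paths, one option is to filter the listing on the client side. This is a sketch using the standard-library `fnmatch`, not an azureml.fsspec feature; `select` is a hypothetical helper and `listing` is a shortened copy of the output above. fsspec's `AbstractFileSystem` also defines a `glob` method that may serve the same purpose.

```python
from fnmatch import fnmatch

# A few paths in the shape returned by fs.ls() in the output above.
listing = [
    "datastore1/folder1/09-05-2023.html",
    "datastore1/folder1/10-05-2023.html",
    "datastore1/folder1/timezone.csv",
]


def select(paths, pattern):
    """Keep only the paths whose final component matches a shell-style pattern."""
    return [p for p in paths if fnmatch(p.rsplit("/", 1)[-1], pattern)]


print(select(listing, "*.csv"))  # ['datastore1/folder1/timezone.csv']
```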
    


    Reference: Is there a way to get list of folders from a datastore in Azure ML studio with Python SDK v2 - Stack Overflow by khemanth958.
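About `datastore.id` from the question: an `azureml://` URI is required by the pattern in the error, so any other ID format will fail the same check. Assuming `datastore.id` returns an ARM resource ID of the form `/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.MachineLearningServices/workspaces/<ws>/datastores/<name>`, the sketch below rebuilds the long-form URI from it. `datastore_uri_from_arm_id` is a hypothetical helper, and the ID shape is an assumption; print `datastore.id` first to confirm it.

```python
import re

# Assumed ARM-style resource ID shape (verify against your own datastore.id):
# /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.MachineLearningServices/workspaces/<ws>/datastores/<name>
ARM_ID_PATTERN = re.compile(
    r"^/subscriptions/([^/]+)/resourceGroups/([^/]+)/providers/"
    r"Microsoft\.MachineLearningServices/workspaces/([^/]+)/datastores/([^/]+)$",
    re.IGNORECASE,
)


def datastore_uri_from_arm_id(arm_id: str, path: str = "") -> str:
    """Build the long-form azureml:// datastore URI from an ARM resource ID."""
    match = ARM_ID_PATTERN.match(arm_id)
    if match is None:
        raise ValueError(f"unexpected datastore resource ID: {arm_id}")
    sub, rg, ws, name = match.groups()
    return (
        f"azureml://subscriptions/{sub}/resourcegroups/{rg}"
        f"/workspaces/{ws}/datastores/{name}/paths/{path.lstrip('/')}"
    )


arm_id = (
    "/subscriptions/sub-1/resourceGroups/rg-1/providers/"
    "Microsoft.MachineLearningServices/workspaces/ws-1/datastores/adls"
)
print(datastore_uri_from_arm_id(arm_id, "iris-processed/"))
# azureml://subscriptions/sub-1/resourcegroups/rg-1/workspaces/ws-1/datastores/adls/paths/iris-processed/
```

If the assumption holds, the result could be passed on as `AzureMachineLearningFileSystem(datastore_uri_from_arm_id(datastore.id, "iris-processed/"))`.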