
How to list all available versions of an Azure ML Dataset and get the version before the latest


Is there a way to list all available versions of an Azure ML Dataset using the SDK rather than the UI? Also, how can we get the version just before the latest one?

The main goal is to identify changes in the data trends.


Solution

  • Create a resource group and an Azure Machine Learning workspace. Upload the dataset several times under the same name; each upload is registered as a new version of that dataset.


    Use the following code to retrieve every registered version of the dataset, together with its metadata.

    Code block 1

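    from azureml.core import Workspace, Dataset
    ws = Workspace.from_config()  # loads the workspace described in config.json
    # get_all() maps each registered dataset name to its latest version
    latest = Dataset.get_all(workspace=ws)['Diabetes123'].version
    # Versions are numbered 1..latest, so fetch each one by number
    versions = [Dataset.get_by_name(workspace=ws, name='Diabetes123', version=v)
                for v in range(1, latest + 1)]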
    

    Code block 2

    versions
    

    Output

    [{
       "source": [
         "('workspaceblobstore', 'UI/2022-10-14_055538_UTC/')"
       ],
       "definition": [
         "GetDatastoreFiles",
         "ParseDelimited",
         "DropColumns",
         "SetColumnTypes"
       ],
       "registration": {
         "id": "Your ID",
         "name": "Diabetes123",
         "version": 1,
         "workspace": "Workspace.create(name='cancerset', subscription_id='your subscription ID', resource_group='your resource group')"
       }
     },
     {
       "source": [
         "('workspaceblobstore', 'UI/2022-10-14_055914_UTC/')"
       ],
       "definition": [
         "GetDatastoreFiles",
         "ParseDelimited",
         "DropColumns",
         "SetColumnTypes"
       ],
       "registration": {
         "id": "Your ID",
         "name": "Diabetes123",
         "version": 2,
         "workspace": "Workspace.create(name='cancerset', subscription_id='your subscription ID', resource_group='your resource group')"
       }
     },
     {
       "source": [
         "('workspaceblobstore', 'UI/2022-10-14_060011_UTC/')"
       ],
       "definition": [
         "GetDatastoreFiles",
         "ParseDelimited",
         "DropColumns",
         "SetColumnTypes"
       ],
       "registration": {
         "id": "Your ID",
         "name": "Diabetes123",
         "version": 3,
         "workspace": "Workspace.create(name='cancerset', subscription_id='your subscription ID', resource_group='your resource group')"
       }
     },
     {
       "source": [
         "('workspaceblobstore', 'UI/2022-10-14_070300_UTC/')"
       ],
       "definition": [
         "GetDatastoreFiles",
         "ParseDelimited",
         "DropColumns",
         "SetColumnTypes"
       ],
       "registration": {
         "id": "Your ID",
         "name": "Diabetes123",
         "version": 4,
         "workspace": "Workspace.create(name='cancerset', subscription_id='your subscription ID', resource_group='your resource group')"
       }
     },
     {
       "source": [
         "('workspaceblobstore', 'UI/2022-10-14_093655_UTC/')"
       ],
       "definition": [
         "GetDatastoreFiles",
         "ParseDelimited",
         "DropColumns",
         "SetColumnTypes"
       ],
       "registration": {
         "id": "Your ID",
         "name": "Diabetes123",
         "version": 5,
         "workspace": "Workspace.create(name='cancerset', subscription_id='your subscription ID', resource_group='your resource group')"
       }
     }]
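The steps in Code block 1 can be wrapped in a small reusable function. This is a sketch, not part of the original answer, assuming the azureml-core (SDK v1) Dataset API: the name list_dataset_versions and the dataset_cls parameter are hypothetical (the parameter exists only so the function can be exercised without a real workspace). It looks up the latest version with Dataset.get_by_name, which should avoid listing every dataset in the workspace the way Dataset.get_all does. Like the original code, it assumes versions are numbered contiguously from 1.

```python
from typing import Any, List


def list_dataset_versions(workspace: Any, name: str, dataset_cls: Any = None) -> List[Any]:
    """Return every registered version of dataset `name`, oldest first.

    Hypothetical helper; assumes the azureml-core (SDK v1) Dataset API and
    that versions are numbered contiguously from 1 up to the latest.
    """
    if dataset_cls is None:
        # Imported lazily so a stand-in class can be passed in for testing.
        from azureml.core import Dataset as dataset_cls
    # Omitting `version` returns the latest registration of `name`.
    latest = dataset_cls.get_by_name(workspace=workspace, name=name).version
    # Fetch each version explicitly, from 1 up to and including the latest.
    return [
        dataset_cls.get_by_name(workspace=workspace, name=name, version=v)
        for v in range(1, latest + 1)
    ]
```

Usage against a real workspace would look like `versions = list_dataset_versions(ws, 'Diabetes123')`.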
    

    To get the version just before the latest, take the second-to-last element of the list. This works because the list is built in ascending version order.

    Code block 3

    versions[-2]
    

    Output

    {
      "source": [
        "('workspaceblobstore', 'UI/2022-10-14_070300_UTC/')"
      ],
      "definition": [
        "GetDatastoreFiles",
        "ParseDelimited",
        "DropColumns",
        "SetColumnTypes"
      ],
      "registration": {
        "id": "Your ID",
        "name": "Diabetes123",
        "version": 4,
        "workspace": "Workspace.create(name='cancerset', subscription_id='your subscription ID', resource_group='your resource group')"
      }
    }
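Indexing with versions[-2] relies on the list being in ascending version order and holding at least two entries. A slightly more defensive sketch, not from the original answer (the helper name version_before_latest is hypothetical), sorts by each dataset's version attribute and fails loudly when there is no earlier version:

```python
from typing import Any, Sequence


def version_before_latest(versions: Sequence[Any]) -> Any:
    """Return the item whose `.version` is the second highest.

    Hypothetical helper: unlike versions[-2], it does not depend on list order.
    """
    if len(versions) < 2:
        raise ValueError("need at least two versions to find the one before the latest")
    # Sort newest first by version number and take the second entry.
    return sorted(versions, key=lambda d: d.version, reverse=True)[1]
```

With the list built above, `previous = version_before_latest(versions)` returns the version 4 dataset. If only that single version is needed, it can also be fetched directly with Dataset.get_by_name, passing `version` equal to the latest version number minus one, which skips retrieving the others.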