Azure DevOps ML error: We could not find config.json in: /home/vsts/work/1/s or in its parent directories


I am trying to create an Azure DevOps ML pipeline. The following code works fine in a Jupyter notebook, but when I run it in Azure DevOps I get this error:

Traceback (most recent call last):
  File "src/my_custom_package/data.py", line 26, in <module>
    ws = Workspace.from_config()
  File "/opt/hostedtoolcache/Python/3.8.7/x64/lib/python3.8/site-packages/azureml/core/workspace.py", line 258, in from_config
    raise UserErrorException('We could not find config.json in: {} or in its parent directories. '
azureml.exceptions._azureml_exception.UserErrorException: UserErrorException:
    Message: We could not find config.json in: /home/vsts/work/1/s or in its parent directories. Please provide the full path to the config file or ensure that config.json exists in the parent directories.
    InnerException None
    ErrorResponse 
{
    "error": {
        "code": "UserError",
        "message": "We could not find config.json in: /home/vsts/work/1/s or in its parent directories. Please provide the full path to the config file or ensure that config.json exists in the parent directories."
    }
}

The code is:

#import
from sklearn.model_selection import train_test_split
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.experiment import Experiment
from datetime import date
from azureml.core import Workspace, Dataset



import pandas as pd
import numpy as np
import logging

#getdata
subscription_id = 'mysubid'
resource_group = 'myrg'
workspace_name = 'mlplayground'
workspace = Workspace(subscription_id, resource_group, workspace_name)
dataset = Dataset.get_by_name(workspace, name='correctData')


#auto ml
ws = Workspace.from_config()


automl_settings = {
    "iteration_timeout_minutes": 2880,
    "experiment_timeout_hours": 48,
    "enable_early_stopping": True,
    "primary_metric": 'spearman_correlation',
    "featurization": 'auto',
    "verbosity": logging.INFO,
    "n_cross_validations": 5,
    "max_concurrent_iterations": 4,
    "max_cores_per_iteration": -1,
}



cpu_cluster_name = "computecluster"
compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)
print(compute_target)
automl_config = AutoMLConfig(task='regression',
                             compute_target = compute_target,
                             debug_log='automated_ml_errors.log',
                             training_data = dataset,
                             label_column_name="paidInDays",
                             **automl_settings)

today = date.today()
d4 = today.strftime("%b-%d-%Y")

experiment = Experiment(ws, "myexperiment"+d4)
remote_run = experiment.submit(automl_config, show_output = True)

from azureml.widgets import RunDetails
RunDetails(remote_run).show()

remote_run.wait_for_completion()

Solution

  • There is something odd in your code: you get the data from one workspace object (workspace = Workspace(subscription_id, resource_group, workspace_name)), then use the resources from a second one (ws = Workspace.from_config()). I would suggest not relying on two different workspaces in the same code, especially since an underlying datasource can be registered (linked) to multiple workspaces (documentation). Note that only the second call needs config.json, which is the file the build agent cannot find; the first one takes the subscription, resource group and workspace name directly.

    In general, instantiating a Workspace object from a config.json file without passing explicit credentials results in interactive authentication. When your code runs, the log asks you to open a specific URL and enter a code. This uses your Microsoft account to verify that you are authorized to access the Azure resource (in this case your Workspace('mysubid', 'myrg', 'mlplayground')). This approach has its limitations once you start deploying the code onto virtual machines or agents: you will not always be there to check the logs, open the URL and authenticate yourself.

    For this reason, it is strongly recommended to set up a more advanced authentication method. Personally, I would suggest using a service principal, since it is simple, convenient and secure if done properly. You can follow Azure's official documentation here.
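
As a rough illustration of the service principal suggestion, here is a minimal sketch using ServicePrincipalAuthentication from azureml-core. The environment variable names (AML_TENANT_ID, AML_CLIENT_ID, AML_CLIENT_SECRET) and the two helper functions are hypothetical and not part of the question's code. The sketch assumes a service principal already exists and has access to the workspace.

```python
import os

# Hypothetical variable names; define them as secret pipeline variables.
REQUIRED_VARS = ("AML_TENANT_ID", "AML_CLIENT_ID", "AML_CLIENT_SECRET")


def read_sp_settings(env=None):
    """Collect the service principal credentials from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "tenant_id": env["AML_TENANT_ID"],
        "service_principal_id": env["AML_CLIENT_ID"],
        "service_principal_password": env["AML_CLIENT_SECRET"],
    }


def get_workspace(subscription_id, resource_group, workspace_name, env=None):
    """Return a Workspace without config.json and without an interactive login."""
    # Imported here so the credential check above runs even without azureml installed.
    from azureml.core import Workspace
    from azureml.core.authentication import ServicePrincipalAuthentication

    auth = ServicePrincipalAuthentication(**read_sp_settings(env))
    return Workspace(subscription_id, resource_group, workspace_name, auth=auth)


# Possible usage in the pipeline script, with one workspace object for everything:
# ws = get_workspace("mysubid", "myrg", "mlplayground")
# dataset = Dataset.get_by_name(ws, name="correctData")
# compute_target = ComputeTarget(workspace=ws, name="computecluster")
```

With this approach the same ws object would serve the dataset lookup, the compute target and the experiment, so no config.json is needed on the agent. In Azure DevOps, secret variables generally have to be mapped explicitly into the script step's environment before os.environ can see them.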