The short story is, when I try to submit an azure ML pipeline run (an azure ML pipeline, not an Azure pipeline) from a jupyter notebook, I get PermissionError: [Errno 13] Permission denied: '.\NTUSER.DAT'. More details:
Relevant code:
import logging

from azureml.core import Experiment
from azureml.pipeline.core import Pipeline
from azureml.train.automl import AutoMLConfig
from azureml.train.automl.runtime import AutoMLStep

automl_settings = {
    "iteration_timeout_minutes": 20,
    "experiment_timeout_minutes": 30,
    "n_cross_validations": 3,
    "primary_metric": 'r2_score',
    "preprocess": True,
    "max_concurrent_iterations": 3,
    "max_cores_per_iteration": -1,
    "verbosity": logging.INFO,
    "enable_early_stopping": True,
    'time_column_name': "DateTime"
}

automl_config = AutoMLConfig(task='forecasting',
                             debug_log='automl_errors.log',
                             path=".",
                             compute_target=compute_target,
                             run_configuration=conda_run_config,
                             training_data=financeforecast_dataset,
                             label_column_name='TotalUSD',
                             **automl_settings)

automl_step = AutoMLStep(
    name='automl_module',
    automl_config=automl_config,
    allow_reuse=False)

training_pipeline = Pipeline(
    description="training_pipeline",
    workspace=ws,
    steps=[automl_step])

training_pipeline_run = Experiment(ws, 'test').submit(training_pipeline)
The training_pipeline step runs for approximately 20 seconds, and then I get a long traceback, ending in:
~\AppData\Local\Continuum\anaconda2\envs\forecasting\lib\site-packages\azureml\pipeline\core\_module_builder.py in _hash_from_file_paths(hash_src)
100 hasher = hashlib.md5()
101 for f in hash_src:
--> 102 with open(str(f), 'rb') as afile:
103 buf = afile.read()
104 hasher.update(buf)
PermissionError: [Errno 13] Permission denied: '.\\NTUSER.DAT'
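The traceback shows why this fails: the snapshot code MD5-hashes every file under the source directory, so a single unreadable file (here `NTUSER.DAT`, a registry hive that Windows keeps locked) aborts the whole submission. As a minimal pre-flight check, you could try opening every file yourself before submitting; `find_unreadable_files` below is a hypothetical helper of mine, not part of the azureml SDK:

```python
import os


def find_unreadable_files(src_dir):
    """Return paths under src_dir that cannot be opened for reading.

    Mimics what azureml's _hash_from_file_paths does when it builds the
    snapshot hash: it opens every file under the directory, so one locked
    file (like NTUSER.DAT in a user-profile root) fails the whole run.
    """
    unreadable = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                # Open for binary read, exactly as the hashing code does.
                with open(path, 'rb'):
                    pass
            except OSError:  # PermissionError is a subclass of OSError
                unreadable.append(path)
    return unreadable
```

Running this against the directory you pass as `path` (or against `os.getcwd()` if you pass `"."`) should list any files that would trip the snapshot upload.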
According to Azure's docs on this topic, submitting a pipeline uploads a "snapshot" of the "source directory" you specified. Initially, I hadn't specified a source directory, so, to test that out, I added:
default_source_directory="testing",
as a parameter for the training_pipeline object, but saw the same behavior when I ran it again. I'm not sure whether that is the same source directory the documentation refers to. The docs also say that if no source directory is specified, the "current local directory" is uploaded. I used print(os.getcwd()) to get the working directory and gave "Everyone" full-control permissions on it (I'm working in a Windows environment).
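One way to keep the snapshot away from a profile directory full of locked system files is to stage only the files the run actually needs into a fresh folder, then point the config's `path` at that folder. A sketch under that assumption (`stage_snapshot` is a hypothetical helper of mine, not an azureml SDK function):

```python
import shutil
import tempfile


def stage_snapshot(files):
    """Copy only the listed files into a fresh staging directory.

    Pointing AutoMLConfig's `path` at the returned directory keeps the
    snapshot hash from walking a user-profile root containing locked
    files such as NTUSER.DAT.
    """
    staging = tempfile.mkdtemp(prefix="aml_snapshot_")
    for f in files:
        shutil.copy2(f, staging)  # preserves timestamps/permissions
    return staging
```

You would then pass `path=stage_snapshot(["script.py"])` (or similar) instead of `path="."`.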
All the preceding code works fine, and I can submit an experiment if I use a ScriptRunConfig and run it on attached compute rather than using a pipeline/training cluster.
Any ideas? Thanks in advance to anyone who tries to help. P.S. There is no "azure-machine-learning-pipelines" tag, and I can't add one because I don't have enough reputation points; someone else could, though!
I resolved this issue by setting the path and data_script parameters in the AutoMLConfig object, like this (the relevant lines are marked with -->):
automl_config = AutoMLConfig(task='forecasting',
                             debug_log='automl_errors.log',
                             compute_target=compute_target,
                             run_configuration=conda_run_config,
                         --> path="c:\\users\\me",
                             data_script="script.py", <--
                             **automl_settings)
Setting the data_script variable to include the full path, as shown below, did not work.
automl_config = AutoMLConfig(task='forecasting',
                             debug_log='automl_errors.log',
                             path=".",
                         --> data_script="c:\\users\\me\\script.py", <--
                             compute_target=compute_target,
                             run_configuration=conda_run_config,
                             **automl_settings)