I've got a basic ScriptStep in my AML Pipeline that just tries to read an attached dataset. When I execute this simple example, the pipeline fails with the following in the driver log:
ImportError: azureml-dataprep is not installed. Dataset cannot be used without azureml-dataprep. Please make sure azureml-dataprep[fuse,pandas] is installed by specifying it in the conda dependencies. pandas is optional and should be only installed if you intend to create a pandas DataFrame from the dataset.
I then modified my step to include the conda package but then the driver fails with "ResolvePackageNotFound: azureml-dataprep". The entire log file can be accessed here.
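import os
from azureml.core.runconfig import RunConfiguration, DEFAULT_CPU_IMAGE
from azureml.core.conda_dependencies import CondaDependencies
from azureml.pipeline.steps import PythonScriptStep
# create a new runconfig object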
run_config = RunConfiguration()
run_config.environment.docker.enabled = True
run_config.environment.docker.base_image = DEFAULT_CPU_IMAGE
run_config.environment.python.user_managed_dependencies = False
run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['azureml-dataprep[pandas,fuse]'])
source_directory = './read-step'
print('Source directory for the step is {}.'.format(os.path.realpath(source_directory)))
step2 = PythonScriptStep(name="read_step",
                         script_name="Read.py",
                         arguments=["--dataFilePath", dataset.as_named_input('local_ds').as_mount()],
                         compute_target=aml_compute,
                         source_directory=source_directory,
                         runconfig=run_config,
                         allow_reuse=False)
I'm out of ideas, would deeply appreciate any help here!
The azureml-sdk isn't available on conda; you need to install it with pip.
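from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

myenv = Environment(name="myenv")
conda_dep = CondaDependencies()
conda_dep.add_pip_package("azureml-dataprep[pandas,fuse]")  # installed via pip, not conda
myenv.python.conda_dependencies = conda_dep
run_config.environment = myenv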
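To see why it matters whether the package is listed as a conda package or a pip package, it may help to look at the general shape of a conda environment file. The sketch below is not AzureML code: `build_env_spec` is a made-up helper, and it only mimics the usual environment.yml layout, where pip packages sit in a nested `pip` entry instead of alongside the conda packages.

```python
import json


def build_env_spec(name, conda_packages=(), pip_packages=()):
    """Return a dict shaped like a conda environment.yml (hypothetical helper)."""
    dependencies = list(conda_packages)
    if pip_packages:
        # By convention, pip is listed as a conda dependency before the pip section
        dependencies.append("pip")
        # pip packages are nested under a "pip" key, so conda never tries to resolve them
        dependencies.append({"pip": list(pip_packages)})
    return {"name": name, "dependencies": dependencies}


spec = build_env_spec("myenv", pip_packages=["azureml-dataprep[pandas,fuse]"])
print(json.dumps(spec, indent=2))
```

Passing the same string through `conda_packages` instead would place it at the top level of `dependencies`, which is where conda looks for it and fails.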
For more information about this error, the Logs tab has a log named 20_image_build_log.txt, which contains the Docker build logs. It includes the error where conda failed to find azureml-dataprep.
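If you have downloaded that build log, a quick way to spot the offending package is to scan it for the ResolvePackageNotFound error. This is only a rough sketch: `find_unresolved_packages` is a hypothetical helper, the sample text is invented for illustration, and it assumes the package names appear either on the same line as the error or as `- name` lines right after it.

```python
def find_unresolved_packages(log_text):
    """Return package names reported under a ResolvePackageNotFound error."""
    missing = []
    in_error = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if "ResolvePackageNotFound:" in stripped:
            in_error = True
            # A package may be named on the same line, after the colon
            rest = stripped.split("ResolvePackageNotFound:", 1)[1].strip()
            if rest:
                missing.append(rest.lstrip("- ").strip())
        elif in_error and stripped.startswith("- "):
            # Or as a bulleted list on the lines that follow
            missing.append(stripped[2:].strip())
        elif in_error and stripped:
            # Any other non-empty line ends the error block
            in_error = False
    return missing


# Invented sample text, not a real log excerpt
sample_log = "Solving environment: failed\n\nResolvePackageNotFound:\n  - azureml-dataprep\n"
print(find_unresolved_packages(sample_log))
```

To run it against the real file, read its contents into a string first and pass that string to the function.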
EDIT:
Soon, you won't have to specify this dependency anymore. The Azure Data4ML team says azureml-dataprep[pandas,fuse] is being added as a dependency of azureml-defaults, which is automatically installed on all images.