
AzureML: ResolvePackageNotFound azureml-dataprep


I've got a basic PythonScriptStep in my AML pipeline, and it's just trying to read an attached dataset. When I execute this simple example, the pipeline fails with the following in the driver log:

ImportError: azureml-dataprep is not installed. Dataset cannot be used without azureml-dataprep. Please make sure azureml-dataprep[fuse,pandas] is installed by specifying it in the conda dependencies. pandas is optional and should be only installed if you intend to create a pandas DataFrame from the dataset.
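That error means the package isn't importable inside the image the step runs in. As a quick diagnostic, the step script can report which version of azureml-dataprep, if any, is present before it touches the dataset. This is a minimal sketch that assumes Python 3.8+ (for `importlib.metadata`). The helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Printed near the top of the step script, this line ends up in the driver
    # log and shows whether the image actually contains azureml-dataprep.
    found = installed_version("azureml-dataprep")
    print("azureml-dataprep:", found if found else "NOT INSTALLED")
```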

I then modified my step to add azureml-dataprep as a conda dependency, but now the driver fails with "ResolvePackageNotFound: azureml-dataprep". The entire log file can be accessed here.

import os

from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import DEFAULT_CPU_IMAGE, RunConfiguration
from azureml.pipeline.steps import PythonScriptStep

# `dataset` and `aml_compute` are defined elsewhere

# create a new runconfig object
run_config = RunConfiguration()
run_config.environment.docker.enabled = True
run_config.environment.docker.base_image = DEFAULT_CPU_IMAGE
run_config.environment.python.user_managed_dependencies = False
# requests azureml-dataprep as a *conda* package; this is what triggers ResolvePackageNotFound
run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['azureml-dataprep[pandas,fuse]'])

source_directory = './read-step'
print('Source directory for the step is {}.'.format(os.path.realpath(source_directory)))
step2 = PythonScriptStep(name="read_step",
                         script_name="Read.py",
                         arguments=["--dataFilePath", dataset.as_named_input('local_ds').as_mount()],
                         compute_target=aml_compute,
                         source_directory=source_directory,
                         runconfig=run_config,
                         allow_reuse=False)
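The post doesn't show Read.py. Assuming `dataset` is a FileDataset, the `as_mount()` entry in `arguments` is replaced at run time by the local mount path, and mounting is the part that needs the `fuse` extra of azureml-dataprep. A hypothetical minimal Read.py then only has to parse `--dataFilePath` and walk that directory. The function names below are illustrative, not from the original script.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Read a mounted dataset")
    # Same flag name as in the PythonScriptStep arguments list
    parser.add_argument("--dataFilePath", type=str, default=None,
                        help="mount path substituted for as_mount() at run time")
    # parse_known_args ignores any extra arguments instead of exiting
    args, _unknown = parser.parse_known_args(argv)
    return args


def list_mounted_files(root):
    """Walk the mount point and return file paths relative to it, sorted."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(paths)


if __name__ == "__main__":
    args = parse_args()
    if args.dataFilePath is None:
        print("no --dataFilePath given; nothing to read")
    else:
        print("Dataset mounted at", args.dataFilePath)
        for rel in list_mounted_files(args.dataFilePath):
            print(rel)
```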

I'm out of ideas and would deeply appreciate any help here!


Solution

  • The azureml-sdk packages, azureml-dataprep among them, aren't available on conda, so conda can't resolve them. You need to install them with pip instead:

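    from azureml.core import Environment
    from azureml.core.conda_dependencies import CondaDependencies

    myenv = Environment(name="myenv")
    # azureml-dataprep comes from PyPI, so add it as a pip package, not a conda package
    conda_dep = CondaDependencies()
    conda_dep.add_pip_package("azureml-dataprep[pandas,fuse]")
    myenv.python.conda_dependencies = conda_dep
    # replaces the environment object configured on run_config above
    run_config.environment = myenv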
    

    For more information about this error, check the log named 20_image_build_log.txt in the Logs tab, which contains the Docker image build output. It shows the error where conda failed to find azureml-dataprep.

    EDIT:

    Soon, you won't have to specify this dependency anymore. The Azure Data4ML team says azureml-dataprep[pandas,fuse] is being added as a dependency of azureml-defaults, which is automatically installed on all images.
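To see why the fix works, it helps to look at the conda environment file these objects describe. On a real `CondaDependencies` object, `serialize_to_string()` returns the actual file. The sketch below is a hypothetical `conda_env_yaml` helper, not the SDK's serializer, that renders both variants side by side. In the question's version azureml-dataprep sits in the top-level conda list, where conda searches its channels for it and fails. In the answer's version it sits under the nested `pip:` list, so pip installs it from PyPI. The python pin is illustrative only.

```python
from typing import List


def conda_env_yaml(name: str, conda_packages: List[str], pip_packages: List[str]) -> str:
    """Render a minimal conda environment file.

    Top-level entries are resolved by conda; entries under the nested 'pip'
    key are installed by pip afterwards.
    """
    lines = [f"name: {name}", "dependencies:"]
    lines += [f"  - {pkg}" for pkg in conda_packages]
    if pip_packages:
        lines.append("  - pip")  # pip itself must come from conda
        lines.append("  - pip:")
        lines += [f"    - {pkg}" for pkg in pip_packages]
    return "\n".join(lines) + "\n"


# Roughly what CondaDependencies.create(conda_packages=[...]) in the question describes
broken = conda_env_yaml("myenv", ["python=3.8", "azureml-dataprep[pandas,fuse]"], [])

# Roughly what add_pip_package(...) in the answer describes
fixed = conda_env_yaml("myenv", ["python=3.8"], ["azureml-dataprep[pandas,fuse]"])
print(fixed)
```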