
Azure Synapse: Predict Model with Synapse ML predict


I am following the official tutorial from Microsoft: https://learn.microsoft.com/en-us/azure/synapse-analytics/machine-learning/tutorial-score-model-predict-spark-pool

But when I execute:

# Bind model within Spark session
model = pcontext.bind_model(
    return_types=RETURN_TYPES,
    runtime=RUNTIME,
    model_alias="Sales",  # This alias will be used in the PREDICT call to refer to this model
    model_uri=AML_MODEL_URI,  # In case of AML, it will be AML_MODEL_URI
    aml_workspace=ws  # This is only for AML. In case of ADLS, this parameter can be removed
).register()

I get:

NotADirectoryError: [Errno 20] Not a directory: '/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/appcache/application_1648328086462_0002/spark-3d802a7e-15b7-4eb6-88c5-f0e01f8cdb35/userFiles-fbe23a43-67d3-4e65-a879-4a497e804b40/68603955220f5f8646700d809b71be9949011a2476a34965a3d5c0f3d14de79b.pkl/MLmodel'
Traceback (most recent call last):
  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/core/_context.py", line 47, in bind_model
    udf = _create_udf(
  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/core/_udf.py", line 104, in _create_udf
    model_runtime = runtime_gen._create_runtime()
  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/core/_runtime.py", line 103, in _create_runtime
    if self._check_model_runtime_compatibility(model_runtime):
  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/core/_runtime.py", line 166, in _check_model_runtime_compatibility
    model_wrapper = self._load()
  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/core/_runtime.py", line 78, in _load
    return SynapsePredictModelCache._get_or_load(
  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/core/_cache.py", line 172, in _get_or_load
    model = load_model(runtime, model_uri, functions)
  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/utils/_model_loader.py", line 257, in load_model
    model = loader.load(model_uri, functions)
  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/utils/_model_loader.py", line 122, in load
    model = self._load(model_uri)
  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/utils/_model_loader.py", line 215, in _load
    return self._load_mlflow(model_uri)
  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/utils/_model_loader.py", line 59, in _load_mlflow
    model = mlflow.pyfunc.load_model(model_uri)
  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/mlflow/pyfunc/__init__.py", line 640, in load_model
    model_meta = Model.load(os.path.join(local_path, MLMODEL_FILE_NAME))
  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/mlflow/models/model.py", line 124, in load
    with open(path) as f:
NotADirectoryError: [Errno 20] Not a directory: '/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/appcache/application_1648328086462_0002/spark-3d802a7e-15b7-4eb6-88c5-f0e01f8cdb35/userFiles-fbe23a43-67d3-4e65-a879-4a497e804b40/68603955220f5f8646700d809b71be9949011a2476a34965a3d5c0f3d14de79b.pkl/MLmodel'

How can I fix this error?


Solution

  • (UPDATE: 29/3/2022): You will get this error message if your model does not contain all the files required for an MLflow-packaged model.

    To reproduce the issue, I created two ML models:

    sklearn_regression_model: contains only the sklearn_regression_model.pkl file.

    When I run predict against the MLflow-packaged model named sklearn_regression_model, I get the same error as shown above.

    linear_regression: contains the complete set of MLflow model files.

    When I run predict against the MLflow-packaged model named linear_regression, it works as expected.
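
The traceback shows MLflow joining the model path with the file name MLmodel and opening the result, which fails when the path is a lone .pkl file rather than a folder. Here is a minimal sketch of a pre-flight check for a local copy of the model, using only the standard library. The function name check_mlflow_model_dir and the throwaway files are hypothetical and are not part of Synapse ML or MLflow.

```python
import os
import tempfile

# Name of the metadata file MLflow tries to open, as seen in the traceback.
MLMODEL_FILE_NAME = "MLmodel"


def check_mlflow_model_dir(path):
    """Return a list of problems that would stop MLflow from loading `path`."""
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{path} is not a directory (a bare .pkl file is not an MLflow model)")
    elif not os.path.isfile(os.path.join(path, MLMODEL_FILE_NAME)):
        problems.append(f"{path} has no {MLMODEL_FILE_NAME} file")
    return problems


# Demonstration with throwaway stand-ins for the two models in this answer.
with tempfile.TemporaryDirectory() as tmp:
    # Case 1: only a pickle file, like sklearn_regression_model.
    bare_pickle = os.path.join(tmp, "sklearn_regression_model.pkl")
    with open(bare_pickle, "wb") as f:
        f.write(b"placeholder")

    # Case 2: a folder that contains an MLmodel file, like linear_regression.
    full_model = os.path.join(tmp, "linear_regression")
    os.mkdir(full_model)
    with open(os.path.join(full_model, MLMODEL_FILE_NAME), "w") as f:
        f.write("flavors: {}\n")

    bare_problems = check_mlflow_model_dir(bare_pickle)
    full_problems = check_mlflow_model_dir(full_model)
    print(bare_problems)
    print(full_problems)
```

If the check reports a problem, one likely fix is to save or log the model through MLflow (for example with mlflow.sklearn.save_model), which writes the MLmodel file next to the pickle, and then register or upload that whole folder.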


    It should be AML_MODEL_URI = "" #In the URI, ":x" is the model version, e.g. Rossman_Sales:2

    Before running this script, update it with the URI of the ADLS Gen2 data file, the return data type of the model output, and the ADLS or AML URI of the model file.

    #Set model URI
    #Set AML URI, if trained model is registered in AML
    AML_MODEL_URI = "<aml model uri>" #In URI ":x" signifies model version in AML. You can choose which model version you want to run. If ":x" is not provided then by default latest version will be picked.

    #Set ADLS URI, if trained model is uploaded in ADLS
    ADLS_MODEL_URI = "abfss://<filesystemname>@<account name>.dfs.core.windows.net/<model mlflow folder path>"
    

    Model URI from AML Workspace:

    DATA_FILE = "abfss://<filesystemname>@<account name>.dfs.core.windows.net/AML/LengthOfStay_cooked_small.csv"
    AML_MODEL_URI_SKLEARN = "aml://mlflow_sklearn:1" #Here ":1" signifies model version in AML. We can choose which version we want to run. If ":1" is not provided then by default latest version will be picked
    RETURN_TYPES = "INT"
    RUNTIME = "mlflow"
    

    Model URI uploaded to ADLS Gen2:

    DATA_FILE = "abfss://<filesystemname>@<account name>.dfs.core.windows.net/AML/LengthOfStay_cooked_small.csv"
    AML_MODEL_URI_SKLEARN = "abfss://<filesystemname>@<account name>.dfs.core.windows.net/linear_regression/linear_regression" #For ADLS, point to the MLflow model folder. The ":1" version suffix applies only to AML URIs
    RETURN_TYPES = "INT"
    RUNTIME = "mlflow"
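
The two URI styles above can be told apart before the model is bound. The following is a minimal sketch, assuming only the forms shown in this answer (aml://<name>:<version> for AML and abfss://<filesystem>@<account>.dfs.core.windows.net/<path> for ADLS Gen2). describe_model_uri is a hypothetical helper and is not part of Synapse ML, and the filesystem and account names in the usage lines are made up.

```python
from urllib.parse import urlparse


def describe_model_uri(uri):
    """Split a model URI into its parts (hypothetical helper for illustration)."""
    parsed = urlparse(uri)
    if parsed.scheme == "aml":
        # Everything after "aml://" is "<name>" or "<name>:<version>".
        name, _, version = uri[len("aml://"):].partition(":")
        # Per the answer, the latest version is picked when none is given.
        return {"source": "aml", "name": name, "version": version or "latest"}
    if parsed.scheme == "abfss":
        # netloc has the form "<filesystem>@<account>.dfs.core.windows.net".
        filesystem, _, host = parsed.netloc.partition("@")
        return {"source": "adls", "filesystem": filesystem, "host": host, "path": parsed.path}
    raise ValueError(f"Unsupported model URI scheme: {uri!r}")


# Usage with the AML URI from this answer and a made-up ADLS location.
aml_info = describe_model_uri("aml://mlflow_sklearn:1")
adls_info = describe_model_uri(
    "abfss://myfs@myaccount.dfs.core.windows.net/linear_regression/linear_regression"
)
print(aml_info)
print(adls_info)
```

For the ADLS case, the path should name the MLflow model folder rather than a single .pkl file, which is the situation that produced the NotADirectoryError in the question.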