
PySpark ALSModel load fails in deployment over Azure ML service with error java.util.NoSuchElementException: Param blockSize does not exist


I am trying to deploy an ALS model trained with PySpark to the Azure ML service. I provide a score.py entry script that loads the trained model with ALSModel.load(). Here is my score.py:

import os
import traceback

from azureml.core.model import Model
from pyspark.ml.recommendation import ALS, ALSModel
from pyspark.sql.types import StructType, StructField
from pyspark.sql.types import DoubleType, StringType
from pyspark.sql import SQLContext
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)
spark = sqlContext.sparkSession

input_schema = StructType([StructField("UserId", StringType())])
reader = spark.read
reader.schema(input_schema)


def init():
    global model
    # "recommendation-model" is the name the model is registered under in the
    # workspace; AZUREML_MODEL_DIR points at the model files on local disk
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), "recommendation-model")
    # Load the saved model back into an ALSModel
    model = ALSModel.load(model_path)


def run(data):
    try:
        input_df = reader.json(sc.parallelize([data]))
        # indexer is a StringIndexerModel mapping UserId to UserId_index,
        # loaded elsewhere in the script
        input_df = indexer.transform(input_df)

        res = model.recommendForUserSubset(input_df[['UserId_index']], 10)

        # any JSON-serializable datatype can be returned
        return res.collect()[0]['recommendations']
    except Exception as e:
        traceback.print_exc()
        return str(e)

Following is the error I get when deploying it as a LocalWebservice using the Model.deploy function of the Azure ML service:

Generating Docker build context.
Package creation Succeeded
Logging into Docker registry viennaglobal.azurecr.io
Logging into Docker registry viennaglobal.azurecr.io
Building Docker image from Dockerfile...
Step 1/5 : FROM viennaglobal.azurecr.io/azureml/azureml_43542b56c5ec3e8d0f68e1556558411f
 ---> 5b3bb174ca5f
Step 2/5 : COPY azureml-app /var/azureml-app
 ---> 8e540c0746f7
Step 3/5 : RUN mkdir -p '/var/azureml-app' && echo eyJhY2NvdW50Q29udGV4dCI6eyJzdWJzY3JpcHRpb25JZCI6IjNkN2M1ZjM4LTI1ODEtNGUxNi05NTdhLWEzOTU1OGI1ZjBiMyIsInJlc291cmNlR3JvdXBOYW1lIjoiZGV2LW9tbmljeC10ZnMtYWkiLCJhY2NvdW50TmFtZSI6ImRldi10ZnMtYWktd29ya3NwYWNlIiwid29ya3NwYWNlSWQiOiI1NjkzNGMzNC1iZmYzLTQ3OWUtODRkMy01OGI4YTc3ZTI4ZjEifSwibW9kZWxzIjp7fSwibW9kZWxzSW5mbyI6e319 | base64 --decode > /var/azureml-app/model_config_map.json
 ---> Running in 502ad8edf91e
 ---> a1bc5e0283d0
Step 4/5 : RUN mv '/var/azureml-app/tmpvxhomyin.py' /var/azureml-app/main.py
 ---> Running in eb4ec1a0b702
 ---> 6a3296fe6420
Step 5/5 : CMD ["runsvdir","/var/runit"]
 ---> Running in 834fd746afef
 ---> 5b9f8be538c0
Successfully built 5b9f8be538c0
Successfully tagged recommend-service:latest
Container (name:musing_borg, id:0f3163692f5119685eee5ed59c8e00aa96cd472f765e7db67653f1a6ce852e83) cannot be killed.
Container has been successfully cleaned up.
Image sha256:0f146f4752878bbbc0e876f4477cc2877ff12a366fca18c986f9a9c2949d028b successfully removed.
Starting Docker container...
Docker container running.
Checking container health...
ERROR - Error: Container has crashed. Did your init method fail?


Container Logs:
/bin/bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by /bin/bash)
/bin/bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by /bin/bash)
/bin/bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by /bin/bash)
/bin/bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by /bin/bash)
2020-07-30T11:57:00,312735664+00:00 - rsyslog/run 
2020-07-30T11:57:00,312768364+00:00 - gunicorn/run 
2020-07-30T11:57:00,313017966+00:00 - iot-server/run 
bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by bash)
2020-07-30T11:57:00,313969073+00:00 - nginx/run 
/usr/sbin/nginx: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
/bin/bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by /bin/bash)
2020-07-30T11:57:00,597835804+00:00 - iot-server/finish 1 0
2020-07-30T11:57:00,598826211+00:00 - Exit code 1 is normal. Not restarting iot-server.
Starting gunicorn 19.9.0
Listening at: http://127.0.0.1:31311 (10)
Using worker: sync
worker timeout is set to 300
Booting worker with pid: 41
bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by bash)
bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by bash)
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/home/mmlspark/lib/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.microsoft.ml.spark#mmlspark_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-e07358bb-d354-4f41-aa4c-f0aa73bb0156;1.0
    confs: [default]
    found com.microsoft.ml.spark#mmlspark_2.11;0.15 in spark-list
    found io.spray#spray-json_2.11;1.3.2 in central
    found com.microsoft.cntk#cntk;2.4 in central
    found org.openpnp#opencv;3.2.0-1 in central
    found com.jcraft#jsch;0.1.54 in central
    found org.apache.httpcomponents#httpclient;4.5.6 in central
    found org.apache.httpcomponents#httpcore;4.4.10 in central
    found commons-logging#commons-logging;1.2 in central
    found commons-codec#commons-codec;1.10 in central
    found com.microsoft.ml.lightgbm#lightgbmlib;2.1.250 in central
:: resolution report :: resolve 318ms :: artifacts dl 11ms
    :: modules in use:
    com.jcraft#jsch;0.1.54 from central in [default]
    com.microsoft.cntk#cntk;2.4 from central in [default]
    com.microsoft.ml.lightgbm#lightgbmlib;2.1.250 from central in [default]
    com.microsoft.ml.spark#mmlspark_2.11;0.15 from spark-list in [default]
    commons-codec#commons-codec;1.10 from central in [default]
    commons-logging#commons-logging;1.2 from central in [default]
    io.spray#spray-json_2.11;1.3.2 from central in [default]
    org.apache.httpcomponents#httpclient;4.5.6 from central in [default]
    org.apache.httpcomponents#httpcore;4.4.10 from central in [default]
    org.openpnp#opencv;3.2.0-1 from central in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   10  |   0   |   0   |   0   ||   10  |   0   |
    ---------------------------------------------------------------------

:: problems summary ::
:::: ERRORS
    unknown resolver repo-1

    unknown resolver repo-1


:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
:: retrieving :: org.apache.spark#spark-submit-parent-e07358bb-d354-4f41-aa4c-f0aa73bb0156
    confs: [default]
    0 artifacts copied, 10 already retrieved (0kB/7ms)
2020-07-30 11:57:02 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Initialized PySpark session.
Initializing logger
2020-07-30 11:57:09,464 | root | INFO | Starting up app insights client
Starting up app insights client
2020-07-30 11:57:09,464 | root | INFO | Starting up request id generator
Starting up request id generator
2020-07-30 11:57:09,464 | root | INFO | Starting up app insight hooks
Starting up app insight hooks
2020-07-30 11:57:09,464 | root | INFO | Invoking user's init function
Invoking user's init function
2020-07-30 11:57:19,652 | root | ERROR | User's init function failed
User's init function failed
2020-07-30 11:57:19,656 | root | ERROR | Encountered Exception Traceback (most recent call last):
  File "/var/azureml-server/aml_blueprint.py", line 163, in register
    main.init()
  File "/var/azureml-app/main.py", line 44, in init
    model = ALSModel.load(model_path)
  File "/home/mmlspark/lib/spark/python/pyspark/ml/util.py", line 362, in load
    return cls.read().load(path)
  File "/home/mmlspark/lib/spark/python/pyspark/ml/util.py", line 300, in load
    java_obj = self._jread.load(path)
  File "/home/mmlspark/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home/mmlspark/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/home/mmlspark/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o64.load.
: java.util.NoSuchElementException: Param blockSize does not exist.
    at org.apache.spark.ml.param.Params$$anonfun$getParam$2.apply(params.scala:729)
    at org.apache.spark.ml.param.Params$$anonfun$getParam$2.apply(params.scala:729)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.ml.param.Params$class.getParam(params.scala:728)
    at org.apache.spark.ml.PipelineStage.getParam(Pipeline.scala:42)
    at org.apache.spark.ml.util.DefaultParamsReader$Metadata$$anonfun$setParams$1.apply(ReadWrite.scala:591)
    at org.apache.spark.ml.util.DefaultParamsReader$Metadata$$anonfun$setParams$1.apply(ReadWrite.scala:589)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.apache.spark.ml.util.DefaultParamsReader$Metadata.setParams(ReadWrite.scala:589)
    at org.apache.spark.ml.util.DefaultParamsReader$Metadata.getAndSetParams(ReadWrite.scala:572)
    at org.apache.spark.ml.recommendation.ALSModel$ALSModelReader.load(ALS.scala:533)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)


Worker exiting (pid: 41)
Shutting down: Master
Reason: Worker failed to boot.
/bin/bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by /bin/bash)
2020-07-30T11:57:19,833136837+00:00 - gunicorn/finish 3 0
2020-07-30T11:57:19,834216245+00:00 - Exit code 3 is not normal. Killing image.

---------------------------------------------------------------------------
WebserviceException                       Traceback (most recent call last)
<ipython-input-43-d0992ae9d1c9> in <module>
      6 local_service = Model.deploy(workspace, "recommend-service", [register_model], inference_config, deployment_config)
      7 
----> 8 local_service.wait_for_deployment()

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/core/webservice/local.py in decorated(self, *args, **kwargs)
     69                 raise WebserviceException('Cannot call {}() when service is {}.'.format(func.__name__, self.state),
     70                                           logger=module_logger)
---> 71             return func(self, *args, **kwargs)
     72         return decorated
     73     return decorator

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/core/webservice/local.py in wait_for_deployment(self, show_output)
    601                                    self._container,
    602                                    health_url=self._internal_base_url,
--> 603                                    cleanup_if_failed=False)
    604 
    605             self.state = LocalWebservice.STATE_RUNNING

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/_model_management/_util.py in container_health_check(docker_port, container, health_url, cleanup_if_failed)
    745             # The container has started and crashed.
    746             _raise_for_container_failure(container, cleanup_if_failed,
--> 747                                          'Error: Container has crashed. Did your init method fail?')
    748 
    749         # The container hasn't crashed, so try to ping the health endpoint.

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/_model_management/_util.py in _raise_for_container_failure(container, cleanup, message)
   1258         cleanup_container(container)
   1259 
-> 1260     raise WebserviceException(message, logger=module_logger)
   1261 
   1262 

WebserviceException: WebserviceException:
    Message: Error: Container has crashed. Did your init method fail?
    InnerException None
    ErrorResponse 
{
    "error": {
        "message": "Error: Container has crashed. Did your init method fail?"
    }
}

However, ALSModel.load() works fine when executed in a Jupyter notebook.
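One way to narrow this down is to check which Spark version actually wrote the saved model: Spark ML writers record it in a JSON metadata file inside the model directory. A minimal sketch, assuming the standard Spark ML save layout (the `saved_spark_version` helper name is illustrative):

```python
import glob
import json
import os


def saved_spark_version(model_path):
    """Return the Spark version recorded in a saved ML model's metadata.

    Spark ML writers store a JSON metadata file under <model_path>/metadata/;
    its "sparkVersion" field records the version that wrote the model.
    """
    meta_file = glob.glob(os.path.join(model_path, "metadata", "part-*"))[0]
    with open(meta_file) as f:
        return json.load(f)["sparkVersion"]
```

Comparing this value against `pyspark.__version__` inside the scoring container (visible in the container logs) shows whether the notebook and the deployment image are on different Spark versions.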


Solution

  • A couple of things to check:

    1. Is your model registered in the workspace? AZUREML_MODEL_DIR only works for registered models; see the Azure ML documentation on registering a model.
    2. Is the pyspark version in your InferenceConfig environment the same as the one you trained with? This error typically points to a version mismatch: the blockSize param was added to ALS in Spark 3.0, so a model saved with Spark 3.x cannot be loaded by a Spark 2.x runtime (as used in the mmlspark-based scoring image).
    3. Have you looked at the output of print(service.get_logs())? The Azure ML troubleshooting and debugging documentation covers other things you can try.
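If the versions differ, one way to align them is to pin pyspark in the conda dependencies of the environment passed to InferenceConfig. A sketch, assuming the model was trained with PySpark 2.4.5 (substitute your actual training version; the file name myenv.yml is illustrative):

```yaml
# myenv.yml - conda dependencies for the scoring environment;
# pin pyspark to the version used for training
name: project_environment
dependencies:
  - python=3.6.2
  - pip:
    - azureml-defaults
    - pyspark==2.4.5
```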