python pyspark svm

Problem loading a saved SVM model in PySpark


I'm new to PySpark. I just saved my LinearSVC model to a folder called "svm.model", which contains two subfolders: data and metadata.

Now I'm trying to load the model. This is my code to load the model:

# Spark environment
from pyspark.sql import SparkSession
from pyspark.ml.classification import LinearSVC

spark = SparkSession.builder.getOrCreate()
# read model
lsvc = LinearSVC(maxIter=10, regParam=0.1)
samemodel = lsvc.load("svm.model/")

But when loading the model I get this error:

File "C:/Users/Ayoub/PycharmProjects/sparkdemo/validation.py", line 9, in <module>
    samemodel = lsvc.load("svm.model/")
  File "E:\spark-3.0.1-bin-hadoop2.7\python\pyspark\ml\util.py", line 330, in load
    return cls.read().load(path)
  File "E:\spark-3.0.1-bin-hadoop2.7\python\pyspark\ml\util.py", line 280, in load
    java_obj = self._jread.load(path)
  File "E:\spark-3.0.1-bin-hadoop2.7\python\lib\py4j-0.10.9-src.zip\py4j\java_gateway.py", line 1305, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "E:\spark-3.0.1-bin-hadoop2.7\python\pyspark\sql\utils.py", line 128, in deco
    return f(*a, **kw)
  File "E:\spark-3.0.1-bin-hadoop2.7\python\lib\py4j-0.10.9-src.zip\py4j\protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o24.load.
: java.lang.NoSuchMethodException: org.apache.spark.ml.classification.LinearSVCModel.<init>(java.lang.String)
    at java.lang.Class.getConstructor0(Unknown Source)
    at java.lang.Class.getConstructor(Unknown Source)
    at org.apache.spark.ml.util.DefaultParamsReader.load(ReadWrite.scala:468)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Unknown Source)
20/11/19 13:22:31 WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped

I'm not sure what this means; this is the first time I've tried to save and load a model with PySpark. Is something wrong with my model folder "svm.model", or with the way I load it?


Solution

  • I was using the wrong class to load the model. The following code works:

    from pyspark.ml.classification import LinearSVCModel
    
    model_path = "svm.model"  # folder the model was saved to
    samemodel = LinearSVCModel.load(model_path)
    

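    So to train the model we use LinearSVC, and to load it we use LinearSVCModel. LinearSVC is the estimator; fitting it returns a LinearSVCModel, and that fitted model is what was saved. This matches the stack trace, where the reader fails while looking for a LinearSVCModel constructor that takes a single String.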
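
If it is unclear which class a saved folder needs, the metadata folder can be inspected directly. The sketch below is a minimal helper. It assumes the layout Spark ML's default writer normally produces: a metadata/part-00000 text file holding one JSON object with a "class" field. The name saved_class_name is hypothetical and not part of PySpark, and the layout may differ for other Spark versions or custom writers.

```python
import json
from pathlib import Path


def saved_class_name(model_dir):
    """Return the class name recorded in a saved Spark ML model folder.

    Assumption: <model_dir>/metadata/part-* is a text file whose first
    line is a JSON object with a "class" field.
    """
    metadata_dir = Path(model_dir) / "metadata"
    # "part-*" does not match hidden checksum files such as .part-00000.crc
    part_files = sorted(p for p in metadata_dir.glob("part-*") if p.is_file())
    if not part_files:
        raise FileNotFoundError(f"no part-* file found in {metadata_dir}")
    lines = part_files[0].read_text(encoding="utf-8").splitlines()
    if not lines:
        raise ValueError(f"{part_files[0]} is empty")
    # Only the first line is parsed; the rest of the file is ignored
    return json.loads(lines[0])["class"]
```

For the folder in the question, saved_class_name("svm.model") would be expected to return a name ending in LinearSVCModel (the class named in the stack trace), which is the class to call load on.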