
NoSuchMethodException: org.apache.spark.ml.classification.GBTClassificationModel in Pyspark model load


I have trained a model in PySpark:

# Model
from pyspark.ml.classification import GBTClassifier

gbt = GBTClassifier(maxIter=10)
gbtModel = gbt.fit(train)  # fit() returns a GBTClassificationModel
predictions = gbtModel.transform(test)

Here I save the pipeline and the model:

#Save pipeline
pipelineModel.write().overwrite().save("s3://data-production/pipelineModel_v1")

#Save Model
gbtModel.save("s3://data-production/first_trade.model_v0")

Now, in production, I load the pipeline and the model to score new datasets:

pipelineModel = PipelineModel.load("s3://data-production/pipelineModel_v1")

new_test= pipelineModel.transform(new_df1)

model = GBTClassifier.load("s3://data-production/first_trade.model_v0")

I get this error when loading the model:

Py4JJavaError: An error occurred while calling o4701.load.
: java.lang.NoSuchMethodException: org.apache.spark.ml.classification.GBTClassificationModel.<init>(java.lang.String)
    at java.lang.Class.getConstructor0(Class.java:3082)
    at java.lang.Class.getConstructor(Class.java:1825)
    at org.apache.spark.ml.util.DefaultParamsReader.load(ReadWrite.scala:496)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

(<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError('An error occurred while calling o4701.load.\n', JavaObject id=o4702), <traceback object at 0x7f247a3eb9c8>)

Solution

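  • The saved directory holds the fitted model, a GBTClassificationModel, not the GBTClassifier estimator that produced it. GBTClassifier.load reads the class name from the saved metadata and tries to construct a GBTClassificationModel from a uid string alone, as it would an estimator. No such constructor exists, hence the NoSuchMethodException. Import the model class in the production code and load with it instead of GBTClassifier:

    from pyspark.ml.classification import GBTClassifier, GBTClassificationModel
    model = GBTClassificationModel.load("s3://data-production/first_trade.model_v0")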
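
If it is unclear which class a saved directory expects, one option is to look at the metadata Spark ML writes next to the model before calling load. The following is a minimal sketch, not part of the original answer. It assumes the metadata is a single JSON line with a "class" field, and that the JVM class has a PySpark counterpart under the same module path. The helper names, the sample metadata, and the uid are made up for illustration.

```python
import json


def saved_java_class(metadata_json: str) -> str:
    """Return the JVM class name recorded in a Spark ML metadata line."""
    return json.loads(metadata_json)["class"]


def pyspark_class_path(java_class: str) -> str:
    """Map a JVM class name to the PySpark class that mirrors it.

    Assumes the usual layout where org.apache.spark.ml.* is mirrored
    under pyspark.ml.*.
    """
    return java_class.replace("org.apache.spark", "pyspark", 1)


# Hypothetical metadata line. With a live session it could be read with
# spark.sparkContext.textFile(model_path + "/metadata").first()
sample = json.dumps({
    "class": "org.apache.spark.ml.classification.GBTClassificationModel",
    "uid": "GBTClassifier_example",  # made-up uid
    "paramMap": {},
})

java_class = saved_java_class(sample)
print(pyspark_class_path(java_class))
# prints: pyspark.ml.classification.GBTClassificationModel
```

For the model in the question, the recorded class would be the model class rather than the estimator, which is why the load call has to go through GBTClassificationModel.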