Tags: apache-spark, pyspark, svm, apache-spark-mllib, apache-spark-ml

PySpark MLlib: AssertionError: Classifier doesn't extend from HasRawPredictionCol


I am a newbie in Spark. I want to do multiclass classification with SVM in PySpark MLlib. I installed Spark 2.3.0 on Windows.

However, after searching I found that Spark implements SVM for binary classification only, so I have to use the one-vs-all strategy. When I tried to use one-vs-all with SVM, I got an error. I searched for the error but did not find a solution.

I used the code of one-vs-all from this link https://spark.apache.org/docs/2.1.0/ml-classification-regression.html#one-vs-rest-classifier-aka-one-vs-all

Here is my code:

        from pyspark.mllib.classification import SVMWithSGD, SVMModel
        from pyspark.ml.classification import OneVsRest
        # instantiate the One Vs Rest Classifier.
        svm_model = SVMWithSGD()
        ovr = OneVsRest(classifier=svm_model)
        # train the multiclass model.
        ovrModel = ovr.fit(rdd_train)
        # score the model on test data.
        predictions = ovrModel.transform(rdd_test)

The error is raised at the line "ovr.fit(rdd_train)". Here is the traceback:

        File "D:/Mycode-newtrials - Copy/stance_detection -norelieff-lgbm - randomizedsearch - modified - spark.py", line 1460, in computescores
          ovrModel = ovr.fit(rdd_train)
        File "D:\python27\lib\site-packages\pyspark\ml\base.py", line 132, in fit
          return self._fit(dataset)
        File "D:\python27\lib\site-packages\pyspark\ml\classification.py", line 1758, in _fit
          "Classifier %s doesn't extend from HasRawPredictionCol." % type(classifier)
        AssertionError: Classifier <class 'pyspark.mllib.classification.SVMWithSGD'> doesn't extend from HasRawPredictionCol.

Solution

  • You get the error because you are trying to use a model from Spark ML (OneVsRest) with a base binary classifier from Spark MLlib (SVMWithSGD).

    Spark MLlib (the old, RDD-based API) and Spark ML (the newer, DataFrame-based API) are not just different libraries; they are also incompatible, so you cannot mix estimators and models between them. If you look closely at the examples in the documentation, you will see that they import the base classifier from pyspark.ml, not from pyspark.mllib as you do here.

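    Note, however, that Spark ML does include a linear SVM: pyspark.ml.classification.LinearSVC, available since Spark 2.2.0. Unlike SVMWithSGD, it extends HasRawPredictionCol, so it can be passed as the base classifier to OneVsRest. To use it, you also have to switch from RDDs to DataFrames with "label" and "features" columns, because fit() in Spark ML expects a DataFrame, not an RDD.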
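Why does OneVsRest insist on HasRawPredictionCol? In Spark ML (2.x), OneVsRest trains one binary model per class, and at prediction time its model picks the class whose binary model reports the largest raw prediction score, so every base classifier must expose a raw prediction column. The toy pure-Python sketch below mimics only that combination step. The helper names (`one_vs_rest_predict`, `make_linear_scorer`) and the hand-picked weights are hypothetical stand-ins for trained binary models, not Spark APIs.

```python
from typing import Callable, Dict, Sequence

# Hypothetical stand-in for a trained binary model: maps a feature vector
# to a raw confidence score for the positive class (higher = more confident).
BinaryScorer = Callable[[Sequence[float]], float]


def make_linear_scorer(weights: Sequence[float], bias: float) -> BinaryScorer:
    """Hypothetical linear decision function w.x + b, like a linear SVM margin."""
    def score(x: Sequence[float]) -> float:
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return score


def one_vs_rest_predict(models: Dict[int, BinaryScorer],
                        features: Sequence[float]) -> int:
    """Return the label whose 'this class vs. the rest' model scores highest."""
    # Raw scores are compared across models; plain 0/1 predictions could not
    # decide between classes when several models answer "positive" at once.
    return max(models, key=lambda label: models[label](features))


# Three toy one-vs-rest models for labels 0, 1 and 2.
models = {
    0: make_linear_scorer([1.0, 0.0], -0.5),
    1: make_linear_scorer([0.0, 1.0], -0.5),
    2: make_linear_scorer([-1.0, -1.0], 0.5),
}

print(one_vs_rest_predict(models, [2.0, 0.0]))    # 0
print(one_vs_rest_predict(models, [0.0, 3.0]))    # 1
print(one_vs_rest_predict(models, [-1.0, -1.0]))  # 2
# Models 0 and 1 both score > 0 here; the larger raw score (2.5) wins.
print(one_vs_rest_predict(models, [2.0, 3.0]))    # 1
```

The last call is the case that motivates the check: a classifier that only outputs hard 0/1 labels (as the RDD-based SVMWithSGD does by default) gives OneVsRest nothing to rank, which is why it is rejected up front.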