I am new to Spark. I want to do multiclass classification with an SVM in PySpark MLlib. I installed Spark 2.3.0 on Windows.
I searched and found that SVM in Spark is implemented for binary classification only, so a one-vs-all strategy is needed. I got an error when I tried to use one-vs-all with SVM. I searched for the error but could not find a solution.
I used the one-vs-all code from this link: https://spark.apache.org/docs/2.1.0/ml-classification-regression.html#one-vs-rest-classifier-aka-one-vs-all
Here is my code:
from pyspark.mllib.classification import SVMWithSGD , SVMModel
from pyspark.ml.classification import OneVsRest
# instantiate the One Vs Rest Classifier.
svm_model = SVMWithSGD()
ovr = OneVsRest(classifier=svm_model)
# train the multiclass model.
ovrModel = ovr.fit(rdd_train)
# score the model on test data.
predictions = ovrModel.transform(rdd_test)
The error is raised at the line "ovr.fit(rdd_train)". Here is the traceback:
File "D:/Mycode-newtrials - Copy/stance_detection -norelieff-lgbm - randomizedsearch - modified - spark.py", line 1460, in computescores
ovrModel = ovr.fit(rdd_train)
File "D:\python27\lib\site-packages\pyspark\ml\base.py", line 132, in fit
return self._fit(dataset)
File "D:\python27\lib\site-packages\pyspark\ml\classification.py", line 1758, in _fit
"Classifier %s doesn't extend from HasRawPredictionCol." % type(classifier)
AssertionError: Classifier <class 'pyspark.mllib.classification.SVMWithSGD'> doesn't extend from HasRawPredictionCol.
You get the error because you are trying to use a model from Spark ML (OneVsRest) with a base binary classifier from Spark MLlib (SVMWithSGD).
Spark MLlib (the old, RDD-based API) and Spark ML (the new, DataFrame-based API) are not only different libraries, they are also incompatible: you cannot mix models between them. Looking closer at the examples, you'll see that they import the base classifier from pyspark.ml, and not from pyspark.mllib, as you are trying to do here.
Note that Spark ML has included a linear SVM, LinearSVC (in pyspark.ml.classification), since Spark 2.2, so with Spark 2.3 that is the SVM to use as a base classifier with OneVsRest; the RDD-based SVMWithSGD cannot be used there...