I am trying to implement multiclass classification using PySpark. I have spent a lot of time searching the web, and I have read that it is now possible with Spark 2.1.0.
I have generated my own dataset with all-numerical features and created a DataFrame as shown below:
The label column 'Service_Level' has three classes: 0, 1 or 2.
Questions:
Thanks.
Since there was no answer, I will share what I observed during my research. Using LabeledPoint is fine with the RDD-based Spark MLlib API, which is in maintenance mode in Spark 2.1.0. However, my features were categorical, so with the DataFrame-based API (Spark ML) I had to convert them to vectors using StringIndexer and OneHotEncoder, and use a Pipeline to select my features and labels.
Answering the question
Yes, LabeledPoint can be used with those features, but only with the RDD-based Spark MLlib API. I was not able to implement the Multilayer Perceptron because it seemed to require libsvm-formatted data, which I did not have, and I could not convert my CSV into that format.
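For reference, the libsvm text format is simple enough to produce by hand: each line is a label followed by `index:value` pairs, with 1-based ascending indices and zero-valued features left out. The sketch below is one way to do the conversion in plain Python; the function name `csv_to_libsvm` and the column names `f1`, `f2`, `f3` are hypothetical, and it assumes every feature column is numeric and the CSV has a header row.

```python
import csv
import io


def csv_to_libsvm(csv_text, label_col="Service_Level"):
    """Convert CSV text (with a header row) into a list of LIBSVM lines
    of the form '<label> <index>:<value> ...'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Every column except the label is treated as a numeric feature.
    feature_cols = [name for name in reader.fieldnames if name != label_col]
    lines = []
    for row in reader:
        parts = [str(int(float(row[label_col])))]
        # LIBSVM indices are 1-based and ascending; zeros are omitted.
        for index, name in enumerate(feature_cols, start=1):
            value = float(row[name])
            if value != 0.0:
                parts.append("{}:{!r}".format(index, value))
        lines.append(" ".join(parts))
    return lines


if __name__ == "__main__":
    sample = "f1,f2,f3,Service_Level\n1.5,0,3,2\n0,2,0,0\n"
    for line in csv_to_libsvm(sample):
        print(line)
    # prints:
    # 2 1:1.5 3:3.0
    # 0 2:2.0
```

The resulting lines can be written to a file and read back with `spark.read.format("libsvm").load(path)`. Because trailing zero features are dropped, setting the reader's `numFeatures` option may be needed so that every row gets the same vector size.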
In the final implementation, I had to use the DataFrame-based API (Spark ML).