java apache-spark machine-learning data-science apache-spark-ml

Retrieve category names from the prediction column in the result table of an ML model


I have developed an ML model (logistic regression) with Spark 2.4.3 and Java that predicts the WorkType (label) of an email from the keywords in its subject (input). I trained the model on training data and applied it to test data as follows:

    LogisticRegressionModel lrModel = lr.fit(training);

    Dataset<Row> result = lrModel.transform(testing);

    result.select("WorkType", "Subject", "probability", "label", "prediction")
            .orderBy(org.apache.spark.sql.functions.col("probability").desc())
            .show(100, 30);

The results were as follows:

+------------------------+------------------------------+------------------------------+-----+----------+
|                WorkType|                       Subject|                   probability|label|prediction|
+------------------------+------------------------------+------------------------------+-----+----------+
|            Cancellation|Automatic reply: Ticket #72...|[0.8562867173211978,0.02423...|  0.0|       0.0|
|            Cancellation|Ticket #72827 Cancelling Po...|[0.8244896056944511,0.03953...|  0.0|       0.0|
|            Cancellation|Ticket #72827 Cancelling Po...|[0.8127553003889683,0.04411...|  0.0|       0.0|
|            Cancellation|Ticket #72616 Daily Cancell...|[0.8115900852592474,0.03392...|  0.0|       0.0|

To train the model, WorkType was converted to numeric labels. Can the prediction column in the result be converted back so that it shows the WorkType string instead of the index? Any help is appreciated. Thanks!


Solution

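  • `LabelEncoder` is a scikit-learn (Python) class: if the labels were produced with it, `le.inverse_transform([0])` returns the original strings (the indices passed in should be integers).
  • In Spark ML (Java), the equivalent is to keep the `StringIndexerModel` that created the `label` column and pass its `labels()` array to an `IndexToString` transformer applied to the `prediction` column.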
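A minimal sketch of the Spark approach, assuming the `label` column was created by a `StringIndexer` on `WorkType` (the question does not say how the conversion was done). The class name `PredictionLabels`, its method names, and the output column `predictedWorkType` are hypothetical. Ideally, reuse the `StringIndexerModel` that actually produced `label`; refitting on different data can produce a different index order.

```java
import org.apache.spark.ml.feature.IndexToString;
import org.apache.spark.ml.feature.StringIndexer;
import org.apache.spark.ml.feature.StringIndexerModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Hypothetical helper class; assumes labels come from a StringIndexer on "WorkType".
public class PredictionLabels {

    // Fit the indexer that turns WorkType strings into numeric labels.
    // Keep the returned model: its labels() array is the index -> string mapping.
    public static StringIndexerModel fitWorkTypeIndexer(Dataset<Row> training) {
        return new StringIndexer()
                .setInputCol("WorkType")
                .setOutputCol("label")
                .fit(training);
    }

    // Add a column "predictedWorkType" that maps the numeric "prediction"
    // back to the WorkType string, using the indexer's label array.
    public static Dataset<Row> withPredictedWorkType(Dataset<Row> result,
                                                     StringIndexerModel indexerModel) {
        IndexToString converter = new IndexToString()
                .setInputCol("prediction")
                .setOutputCol("predictedWorkType")
                .setLabels(indexerModel.labels());
        return converter.transform(result);
    }

    // Plain-Java view of the same mapping for a single value:
    // prediction i corresponds to labels[i].
    public static String labelFor(String[] labels, double prediction) {
        return labels[(int) prediction];
    }
}
```

With the question's variables, this would be used as `PredictionLabels.withPredictedWorkType(result, indexerModel).select("WorkType", "Subject", "prediction", "predictedWorkType").show(100, 30);`. If the indexer was part of a `Pipeline`, the fitted `StringIndexerModel` can be taken from the `PipelineModel`'s `stages()` array instead of refitting it.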