I have developed an ML model (logistic regression) using Spark 2.4.3 and Java that predicts the WorkType (the label) of an email from the keywords in its subject (the input). I trained the model on the training data and applied it to the testing data as follows:
LogisticRegressionModel lrModel = lr.fit(training);
Dataset<Row> result = lrModel.transform(testing);
result.select("WorkType","Subject","probability","label","prediction")
.orderBy(org.apache.spark.sql.functions.col("probability").desc())
.show(100, 30);
The results I got were as follows:
+------------------------+------------------------------+------------------------------+-----+----------+
|                WorkType|                       Subject|                   probability|label|prediction|
+------------------------+------------------------------+------------------------------+-----+----------+
|            Cancellation|Automatic reply: Ticket #72...|[0.8562867173211978,0.02423...|  0.0|       0.0|
|            Cancellation|Ticket #72827 Cancelling Po...|[0.8244896056944511,0.03953...|  0.0|       0.0|
|            Cancellation|Ticket #72827 Cancelling Po...|[0.8127553003889683,0.04411...|  0.0|       0.0|
|            Cancellation|Ticket #72616 Daily Cancell...|[0.8115900852592474,0.03392...|  0.0|       0.0|
To train the model, the WorkType strings were converted to numeric labels. Can the prediction column in the results be converted back so that it shows the WorkType string instead of the number? Please help. Thanks!
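If you used scikit-learn's LabelEncoder (Python) to convert the labels, you can get the strings back with le.inverse_transform([0]). Cast the 0.0 prediction to an integer index first, since inverse_transform expects integer class indices.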
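Since the question is about Java, here is a minimal plain-Java sketch of the same inverse lookup: keep the label strings in index order and use the numeric prediction as an array index. The class name LabelDecoder and the label "Other" are hypothetical; only the mapping Cancellation -> 0.0 comes from the output in the question. In Spark ML the built-in counterpart is the IndexToString transformer, fed with the array returned by StringIndexerModel.labels(), assuming the labels were produced by a StringIndexer in the first place.

```java
public class LabelDecoder {
    // labels[i] holds the original string for numeric label i.
    private final String[] labels;

    public LabelDecoder(String[] labels) {
        this.labels = labels.clone();
    }

    // Maps a numeric prediction such as 0.0 back to its original string.
    public String decode(double prediction) {
        int index = (int) prediction;
        if (index != prediction || index < 0 || index >= labels.length) {
            throw new IllegalArgumentException("No label for prediction " + prediction);
        }
        return labels[index];
    }

    public static void main(String[] args) {
        // Hypothetical label order; "Other" is a placeholder for the remaining work types.
        LabelDecoder decoder = new LabelDecoder(new String[] {"Cancellation", "Other"});
        System.out.println(decoder.decode(0.0));
    }
}
```

The label array has to be in exactly the order that was used to create the numeric labels, otherwise the decoded strings will not match the predictions.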