I'm using Spark 2 with Scala to train a binary classification model based on LogisticRegression, imported from org.apache.spark.ml.classification.LogisticRegression, which is the new ml API in Spark 2. However, when evaluating the model by AUROC, I could not find a way to use the probability (a double between 0 and 1) instead of the binary classification (0/1). This was previously achieved by removeThreshold(), but I did not find a similar method in ml.LogisticRegression. Is there a way to do that?
The evaluator I'm using is:
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator

val evaluator = new BinaryClassificationEvaluator()
  .setLabelCol("label")
  .setRawPredictionCol("rawPrediction")
  .setMetricName("areaUnderROC")
val auroc = evaluator.evaluate(predictions)
If you want probability output rather than 0/1 output, try this:
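import org.apache.spark.ml.classification.{BinaryLogisticRegressionSummary, LogisticRegression}

val lr = new LogisticRegression()
  .setMaxIter(100)
  .setRegParam(0.3)
val lrModel = lr.fit(trainData)

// The training summary holds the model's predictions on trainData
val summary = lrModel.summary
// "probability" is a vector of per-class probabilities, not a 0/1 label
summary.predictions.select("probability").show()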
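If the goal is to compute AUROC from the probabilities rather than just display them, one option that should work is to point the evaluator at the probability column. Per the Spark docs, BinaryClassificationEvaluator accepts a raw prediction column that is either a double or a length-2 vector of scores or probabilities. Below is a minimal sketch under some assumptions: the object name ProbabilityAuroc, the helper run, and the toy rows are made up for illustration, and a local SparkSession stands in for your real one. Note that for binary logistic regression the probability is a monotonic function of rawPrediction, so both columns should give the same AUROC.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object, only here to keep the sketch self-contained
object ProbabilityAuroc {

  // Trains on toy data and returns AUROC computed from the "probability" column
  def run(spark: SparkSession): Double = {
    // Toy rows invented for illustration; replace with your own DataFrame
    val data = spark.createDataFrame(Seq(
      (0.0, Vectors.dense(0.1, 1.2)),
      (0.0, Vectors.dense(0.4, 0.9)),
      (0.0, Vectors.dense(1.1, 0.8)),
      (1.0, Vectors.dense(0.9, 0.3)),
      (1.0, Vectors.dense(1.5, 0.2)),
      (1.0, Vectors.dense(1.8, 0.1))
    )).toDF("label", "features")

    val lrModel = new LogisticRegression()
      .setMaxIter(100)
      .setRegParam(0.3)
      .fit(data)

    // transform() appends rawPrediction, probability and prediction columns
    val predictions = lrModel.transform(data)
    predictions.select("label", "probability", "prediction").show(false)

    // Score by the probability vector instead of the thresholded prediction
    val evaluator = new BinaryClassificationEvaluator()
      .setLabelCol("label")
      .setRawPredictionCol("probability")
      .setMetricName("areaUnderROC")

    evaluator.evaluate(predictions)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ProbabilityAuroc")
      .getOrCreate()
    println(s"AUROC = ${run(spark)}")
    spark.stop()
  }
}
```

In practice you would call lrModel.transform on a held-out test set rather than on the training data, since AUROC on the training rows tends to be optimistic.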