
Spark 2 LogisticRegression: remove threshold


I'm using Spark 2 with Scala to train a LogisticRegression-based binary classification model. I import org.apache.spark.ml.classification.LogisticRegression, which is part of the new ml API in Spark 2. However, when evaluating the model by AUROC, I could not find a way to use the probability (a double between 0 and 1) instead of the binary classification (0/1). In the older mllib API this was achieved by calling clearThreshold() on the model, but I did not find a similar method in ml.LogisticRegression. Is there a way to do that?

The evaluator I'm using is

import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator

val evaluator = new BinaryClassificationEvaluator()
  .setLabelCol("label")
  .setRawPredictionCol("rawPrediction")
  .setMetricName("areaUnderROC")
val auroc = evaluator.evaluate(predictions)

Solution

  • If you want the probability output rather than the 0/1 prediction, try this:

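    import org.apache.spark.ml.classification.LogisticRegression

    // trainData: a DataFrame with "label" and "features" columns
    val lr = new LogisticRegression()
      .setMaxIter(100)
      .setRegParam(0.3)
    val lrModel = lr.fit(trainData)

    // The training summary keeps the model's predictions on the training data,
    // including the per-class "probability" vector.
    val summary = lrModel.summary
    summary.predictions.select("probability").show()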
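The summary above only covers the training data. For a held-out set, the same "probability" column should be available from lrModel.transform(...), and as far as I know BinaryClassificationEvaluator accepts either a vector column or a double column as its raw prediction column, so no threshold has to be cleared. Below is a minimal sketch under those assumptions. The object name, the helper positiveClassProbability, the column name "p1" and the toy data are all made up for illustration and are not from the original post.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object ProbabilitySketch {
  // Hypothetical helper: index 1 of the probability vector is P(label = 1).
  def positiveClassProbability(v: Vector): Double = v(1)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("lr-probability-sketch")
      .getOrCreate()
    import spark.implicits._

    // Toy data, invented for this example only.
    val data = Seq(
      (0.0, Vectors.dense(0.1, 1.2)),
      (0.0, Vectors.dense(0.4, 0.9)),
      (0.0, Vectors.dense(1.1, 1.0)),
      (1.0, Vectors.dense(0.9, 0.3)),
      (1.0, Vectors.dense(1.5, 0.2)),
      (1.0, Vectors.dense(1.8, 0.8))
    ).toDF("label", "features")

    val lrModel = new LogisticRegression()
      .setMaxIter(100)
      .setRegParam(0.3)
      .fit(data)

    // transform() adds "rawPrediction", "probability" and "prediction" columns.
    // In practice this would be called on a held-out test DataFrame.
    val predictions = lrModel.transform(data)

    // Extract the positive-class probability as a plain Double column.
    val toP1 = udf(positiveClassProbability _)
    val scored = predictions.withColumn("p1", toP1(col("probability")))
    scored.select("label", "p1", "prediction").show(false)

    val evaluator = new BinaryClassificationEvaluator()
      .setLabelCol("label")
      .setMetricName("areaUnderROC")

    // Score by the raw margin vector and by the probability vector.
    val aurocRaw = evaluator.setRawPredictionCol("rawPrediction").evaluate(predictions)
    val aurocProb = evaluator.setRawPredictionCol("probability").evaluate(predictions)
    println(s"AUROC from rawPrediction: $aurocRaw, from probability: $aurocProb")

    spark.stop()
  }
}
```

The two AUROC values should agree up to floating-point effects, because the probability is a monotonic (sigmoid) transform of the raw margin and AUROC only depends on how the rows are ranked. In other words, the evaluator in the question is likely already scoring by a continuous value rather than by the thresholded 0/1 prediction.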