Search code examples
scalaapache-sparkapache-spark-mlliblogistic-regression

Non working Spark example in Scala, LogisticRegressionTrainingSummary


I tried to implement this example for multinomial logistic regression, but it doesn't recognize features that are being used. Probably some version mismatch. This part of code:

trainingSummary.falsePositiveRateByLabel.zipWithIndex.foreach { case (rate, label) =>
  println(s"label $label: $rate")
}

None of the members of LogisticRegressionTrainingSummary are being recognized, falsePositiveRateByLabel particulary in the given example. As well as other members later in code: truePositiveRateByLabel, precisionByLabel,...

When I go to implementation I can't find any similar members that I could use instead, I use mllib 2.11. What am I missing?


Solution

  • You are correct, this is a versioning issue. The github code example you have given is for the current master branch of Spark where there has been some major changes in this part of the API.

    What you have been following is what code in Spark 2.3 will look like. However, at this time, this version is not yet stable and available for download. This is what the version 2.2 branch of the same code example looks like:

    val training = spark
      .read
      .format("libsvm")
      .load("data/mllib/sample_multiclass_classification_data.txt")
    
    val lr = new LogisticRegression()
      .setMaxIter(10)
      .setRegParam(0.3)
      .setElasticNetParam(0.8)
    
    // Fit the model
    val lrModel = lr.fit(training)
    
    // Print the coefficients and intercept for multinomial logistic regression
    println(s"Coefficients: \n${lrModel.coefficientMatrix}")
    println(s"Intercepts: ${lrModel.interceptVector}")
    // $example off$
    
    spark.stop()
    

    In other words, the methods you are trying to use are not yet implemented in your Spark version.