Tags: python, scala, apache-spark, apache-spark-ml

Converting Python to Scala in Spark ML?


The question is about logistic regression with Spark ML (DataFrames).

I am trying to convert the following Python code to Scala.

Python:

[stage.coefficients for stage in model.stages
    if isinstance(stage, LogisticRegressionModel)]

Scala (my attempt):

    for (stage <- model.stages) {
      if (stage.isInstanceOf[LogisticRegressionModel]) {
        val a = Array(stage.coefficients)  // error: value coefficients is not a member of Transformer
      }
    }

I have already checked stage.isInstanceOf[LogisticRegressionModel], and it returns true. However, stage.coefficients fails to compile with the error "value coefficients is not a member of org.apache.spark.ml.Transformer".

If I just inspect stage, it returns

org.apache.spark.ml.Transformer= logreg 382456482

Why is the type different when isInstanceOf returns true? What should I do? Thanks.


Solution

  • Why is the type different when isInstanceOf returns true?

    Well, Scala is a statically typed language and stages is an Array[Transformer], so each element you access has the static type Transformer, whatever its runtime class is. isInstanceOf checks the runtime class, but the compiler only knows the declared type, and Transformer has no coefficients member, hence the error.

    What should I do?

    Be specific about the types.

    import org.apache.spark.ml.classification.LogisticRegressionModel
    
    model.stages.collect {
      // keep only the stages that actually are a LogisticRegressionModel
      case lr: LogisticRegressionModel => lr.coefficients
    }.headOption
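
    For context, here is a minimal sketch of how this extraction could be used end to end. The column names ("features", "label") and the training DataFrame are assumptions for illustration, not taken from the question:

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}
    import org.apache.spark.ml.linalg.Vector

    // Assumed setup: a DataFrame `training` with "features" and "label" columns.
    val lr = new LogisticRegression()
      .setFeaturesCol("features")
      .setLabelCol("label")

    val model = new Pipeline().setStages(Array(lr)).fit(training)

    // collect applies the partial function only to stages whose runtime class
    // matches, so the matched value has the concrete model type and exposes coefficients.
    val coefficients: Option[Vector] = model.stages.collect {
      case m: LogisticRegressionModel => m.coefficients
    }.headOption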