When trying to zip the feature importance vector from LightGBM's getFeatureImportances to the column names array, I ran into the error below:
import com.microsoft.ml.spark.LightGBMClassificationModel
import org.apache.spark.ml.classification.RandomForestClassificationModel

def getFeatureImportances(inputContainer: PipelineModelContainer): (String, String) = {
  val transformer = inputContainer.pipelineModel.stages.last
  val featureImportancesVector = inputContainer.params match {
    case RandomForestParameters(numTrees, treeDepth, featureTransformer) =>
      transformer.asInstanceOf[RandomForestClassificationModel].featureImportances
    case LightGBMParameters(treeDepth, numLeaves, iterations, featureTransformer) =>
      transformer.asInstanceOf[LightGBMClassificationModel].getFeatureImportances("split")
  }
  val colNames = inputContainer.featureColNames
  val sortedFeatures = (colNames zip featureImportancesVector.toArray).sortWith(_._2 > _._2).zipWithIndex
}
I am getting this error, pointing at the last line of my code:

value toArray is not a member of java.io.Serializable

It seems the LightGBM feature importances cannot be converted to an array. This code works fine when it's just the RandomForestClassifier feature importance. What else can I do?
In the two branches of the match block, one returns Array[Double] and the other returns Vector. The common supertype of those two types is java.io.Serializable, so Scala infers that type for featureImportancesVector. toArray is not a member of that type, even though the method exists on both concrete types.
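The inference can be reproduced in isolation with a minimal sketch that mirrors the match expression (assuming Spark's ml.linalg is on the classpath; the inferred type can be confirmed in a REPL):

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}

// One branch yields a Spark ml Vector, the other an Array[Double].
// Their least upper bound is java.io.Serializable, which is what
// Scala infers when no explicit type annotation is given.
def importances(useLightGBM: Boolean): java.io.Serializable =
  if (useLightGBM) Array(0.4, 0.6)   // Array[Double]
  else Vectors.dense(0.4, 0.6)       // org.apache.spark.ml.linalg.Vector

// importances(true).toArray  // won't compile: toArray is not a member of Serializable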
The fix is easy: as suggested in the comment, move the .toArray onto featureImportances, so that both branches, and therefore the variable, have type Array[Double].
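Applied to the question's code, the corrected function would look roughly like this (a sketch reusing the question's own names such as PipelineModelContainer; note the original (String, String) return type did not match the zipped result, so this sketch returns the sorted, indexed pairs instead):

```scala
import com.microsoft.ml.spark.LightGBMClassificationModel
import org.apache.spark.ml.classification.RandomForestClassificationModel

def getFeatureImportances(inputContainer: PipelineModelContainer): Seq[((String, Double), Int)] = {
  val transformer = inputContainer.pipelineModel.stages.last
  // Both branches now return Array[Double], so the variable's inferred
  // type is Array[Double] and the downstream zip compiles.
  val featureImportances: Array[Double] = inputContainer.params match {
    case RandomForestParameters(numTrees, treeDepth, featureTransformer) =>
      transformer.asInstanceOf[RandomForestClassificationModel].featureImportances.toArray
    case LightGBMParameters(treeDepth, numLeaves, iterations, featureTransformer) =>
      transformer.asInstanceOf[LightGBMClassificationModel].getFeatureImportances("split")
  }
  val colNames = inputContainer.featureColNames
  // Pair each column name with its importance, sort by importance
  // descending, and attach each feature's rank.
  (colNames zip featureImportances).sortWith(_._2 > _._2).zipWithIndex
}
```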