After training all the models, I am trying to rename each model's prediction column so that every model's prediction can be uniquely identified in the dataset. I am getting the type mismatch error shown below:
import org.apache.spark.ml.PredictionModel
import org.apache.spark.sql.DataFrame
val models = Seq(("NB", nbModel), ("DT", dtModel), ("RF", rfModel), ("GBM",gbmModel))
Its output is given below:
models: Seq[(String, Any)] = List((NB,NaiveBayesModel (uid=nb_699528805899)
with 2 classes), (DT,()), (RF,RandomForestClassificationModel
(uid=rfc_403e93000cb6) with 10 trees), (GBM,GBTClassificationModel
(uid=gbtc_e778e2781d0b) with 20 trees))
def mlData(inputData: DataFrame, responseColumn: String,
           baseModels: Seq[(String, PredictionModel[_, _])]): DataFrame = {
  baseModels.map { case (name, model) =>
    model.transform(inputData)
      .select("row_id", model.getPredictionCol)
      .withColumnRenamed("prediction", s"${name}_prediction")
  }.reduceLeft((a, b) => a.join(b, Seq("row_id"), "inner"))
    .join(inputData.select("row_id", responseColumn), Seq("row_id"), "inner")
}
Its output is given below:
mlData: (inputData: org.apache.spark.sql.DataFrame, responseColumn: String, baseModels: Seq[(String, org.apache.spark.ml.PredictionModel[_, _])])
org.apache.spark.sql.DataFrame
val mlTrainData= mlData(transferData, "value", models).drop("row_id")
I am getting a type mismatch error, which should not have occurred:
<console>:102: error: type mismatch;
found : Seq[(String, Any)]
required: Seq[(String, org.apache.spark.ml.PredictionModel[_, _])]
val mlTrainData= mlData(transferData, "value", models).drop("row_id")
Based on the output alone, it is clear that the second element of the DT tuple is Unit, not a PredictionModel. That is why the whole object is inferred as Seq[(_, Any)] and your code fails. Since you don't provide context, it is not clear how you got there.
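To illustrate the inference, here is a minimal sketch in plain Scala that runs without Spark. `Model` and `FakeModel` are hypothetical stand-ins for `PredictionModel`, and the way `dtModel` ends up as Unit (a block whose last expression is a statement) is only an assumed cause, since the question does not show how `dtModel` was defined.

```scala
// Hypothetical stand-ins for Spark's PredictionModel, so the sketch runs without Spark.
trait Model { def uid: String }
final case class FakeModel(uid: String) extends Model

object UnitInSeqDemo {
  val nbModel: Model = FakeModel("nb")

  // Assumed cause: the block ends in a statement (here println), so dtModel has type Unit.
  val dtModel = {
    val fitted = FakeModel("dt")
    println(s"trained ${fitted.uid}")
  }

  // Model and Unit share no supertype below Any, so this is inferred as Seq[(String, Any)].
  val models = Seq(("NB", nbModel), ("DT", dtModel))

  // Names of the entries whose second element is not a Model.
  val offenders: Seq[String] = models.collect {
    case (name, m) if !m.isInstanceOf[Model] => name
  }

  def main(args: Array[String]): Unit = {
    println(offenders)
    // Uncommenting the next line should fail to compile at the ("DT", dtModel) entry:
    // val typed: Seq[(String, Model)] = Seq(("NB", nbModel), ("DT", dtModel))
  }
}
```

In the question's code, the analogous step would be to annotate `models` as `Seq[(String, PredictionModel[_, _])]` when defining it. The compiler should then report the mismatch at the definition, pointing at `dtModel`, rather than later at the call to `mlData`.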