I am trying to explain the predictions made by my XGBoost model using MMLSpark's LIME package for Scala.
This is my first time using the LIME library. I am able to fit LIME on the dataset, but when I run the transform step the program stops with an exception:
Caused by: java.lang.ClassCastException: org.apache.spark.ml.linalg.SparseVector cannot be cast to org.apache.spark.ml.linalg.DenseVector
I have around 200 features, and many of them contain zero as their value.
You are likely using VectorAssembler to create your feature vector column. When a row contains many zeros, its transform outputs a SparseVector to save space, and MMLSpark's LIME fails when it tries to cast that SparseVector to the DenseVector it expects.
More info on the VectorAssembler output: Spark ML VectorAssembler returns strange output
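For illustration, here is a minimal sketch of that behaviour (the column names and values are made up, and spark is an active SparkSession): with mostly-zero inputs the assembled column ends up holding a SparseVector.

import org.apache.spark.ml.feature.VectorAssembler

// Hypothetical row where most feature values are zero
val df = spark.createDataFrame(Seq((0.0, 0.0, 0.0, 1.5))).toDF("f1", "f2", "f3", "f4")

val assembler = new VectorAssembler()
  .setInputCols(Array("f1", "f2", "f3", "f4"))
  .setOutputCol("features")

// Prints: class org.apache.spark.ml.linalg.SparseVector
println(assembler.transform(df).select("features").head.get(0).getClass)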
The solution is to convert the column back to a dense vector so that MMLSpark's LIME can interpret it:
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.ml.linalg.Vector

// UDF that converts an ml Vector (sparse or dense) to its dense representation
val asDense = udf((v: Vector) => v.toDense)

// Replace the features column with its dense version
val denseDF = featuresDF.withColumn("features", asDense(col("features")))
Then you can fit your LIME model on denseDF and run transform without the cast error.
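As an optional sanity check (using the denseDF name from the snippet above), you can confirm the column now holds DenseVector values before handing it to LIME:

// Prints: class org.apache.spark.ml.linalg.DenseVector
println(denseDF.select("features").head.get(0).getClass)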