scala apache-spark machine-learning artificial-intelligence

Exception while trying to explain a model with MMLSpark's Scala LIME library


I am trying to explain the predictions made by my XGBoost model using MMLSpark's LIME package for Scala.

This is my first time using the LIME library. I am able to perform a fit operation on the dataset, but when I try to perform the transform operation, the program stops with an exception:

Caused by: java.lang.ClassCastException: org.apache.spark.ml.linalg.SparseVector cannot be cast to org.apache.spark.ml.linalg.DenseVector

I have around 200 features, and many of them have zero as their value.


Solution

  • You are likely using VectorAssembler to create your feature vector column. To save memory, its transform function outputs a sparse vector when a row contains many zeros. Judging by the exception, MMLSpark's LIME casts the feature value to DenseVector, so a SparseVector in that column triggers the ClassCastException.

    More info on VectorAssembler output - Spark ML VectorAssembler returns strange output

    The solution is to convert the column to dense vectors so that MMLSpark's LIME can interpret it.

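    import org.apache.spark.ml.linalg.Vector
    import org.apache.spark.sql.functions.{col, udf}

    // UDF that converts any ML Vector (sparse or dense) to a DenseVector
    val asDense = udf((v: Vector) => v.toDense)

    // Overwrite "features" with its dense form; keep the result, since DataFrames are immutable
    val denseDF = featuresDF.withColumn("features", asDense(col("features")))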
    

    Then you can fit your model and run transform on the DataFrame that holds the dense column.
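
To see why the cast fails and why toDense fixes it, here is a minimal sketch that needs only Spark's ML linalg classes on the classpath, with no SparkSession. The object name SparseToDenseSketch and its helpers are hypothetical, made up for illustration. It assumes, based on the exception message alone, that the failing code casts the feature value directly to DenseVector.

```scala
import org.apache.spark.ml.linalg.{DenseVector, SparseVector, Vector, Vectors}
import scala.util.Try

// Hypothetical helper object for illustration; not part of MMLSpark.
object SparseToDenseSketch {
  // A mostly-zero row of size 6, stored sparsely as (size, indices, values).
  val sparseRow: Vector = Vectors.sparse(6, Array(1, 4), Array(2.0, 5.0))

  // A direct cast like the one reported in the stack trace.
  // For a SparseVector this throws ClassCastException, captured here as a Failure.
  def castToDense(v: Vector): Try[DenseVector] =
    Try(v.asInstanceOf[DenseVector])

  // The conversion used in the UDF: every zero is stored explicitly.
  def densify(v: Vector): DenseVector = v.toDense

  def main(args: Array[String]): Unit = {
    println(s"stored as SparseVector: ${sparseRow.isInstanceOf[SparseVector]}")
    println(s"direct cast fails:      ${castToDense(sparseRow).isFailure}")
    println(s"cast after toDense ok:  ${castToDense(densify(sparseRow)).isSuccess}")
  }
}
```

The UDF in the answer applies the same toDense call to every row of the features column, so a DenseVector cast downstream no longer fails. The trade-off is memory: a dense vector stores all of its entries, zeros included.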