Tags: apache-spark, apache-spark-sql, apache-spark-mllib, mleap

Unable to serialize an Apache Spark transformer in MLeap


I use Spark 2.1.0 and Scala 2.11.8.

I am trying to build a Twitter sentiment analysis model in Apache Spark and serve it using MLeap.

When I run the model without MLeap, everything works. The problem appears only when I try to save the model in MLeap's serialization format so that I can serve it later with MLeap.

Here is the code that throws the error:

// Target location for the MLeap bundle
val modelSavePath = "/tmp/sampleapp/model-mleap/"

// Pipeline and loader settings read from the parsed configuration
val pipelineConfig = json.get("PipelineConfig").get.asInstanceOf[Map[String, Any]]
val loaderConfig = json.get("LoaderConfig").get.asInstanceOf[Map[String, Any]]
val loaderPath = loaderConfig
    .get("DataLocation")
    .get
    .asInstanceOf[String]

// Load the tab-separated training data
val data = sqlContext.read.format("com.databricks.spark.csv")
    .option("header", "true")
    .option("delimiter", "\t")
    .option("inferSchema", "true")
    .load(loaderPath)

val pipeline = Pipeline(pipelineConfig)

val model = pipeline.fit(data)
val mleapPipeline: Transformer = model  // this line throws the exception

The last line throws java.util.NoSuchElementException: key not found: org.apache.spark.ml.feature.Tokenizer.

A quick search suggested that MLeap does not support every transformer, but I could not find an exhaustive list.

How do I find out whether the transformers I am using are actually unsupported, or whether there is some other error?


Solution

  • I am one of the creators of MLeap, and we do support Tokenizer! Which version of MLeap are you trying to use? I think you may be looking at an outdated codebase from TrueCar. Check out our new codebase here:

    https://github.com/combust/mleap

    We also have fairly complete documentation here, including a full list of supported transformers:

    Documentation: http://mleap-docs.combust.ml/

    Transformer List: http://mleap-docs.combust.ml/core-concepts/transformers/support.html

    I hope this helps. If things still aren't working, file an issue on GitHub and we can help you debug it from there.
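
For reference, here is a minimal sketch of what serializing a fitted pipeline containing a Tokenizer can look like with the combust/mleap codebase. It follows the pattern shown in the MLeap documentation for its Spark integration, but the exact calls differ between MLeap releases, so treat it as an illustration and check it against the documentation for the version you install. The sample sentences, column names, object name, and output path are hypothetical placeholders, not values from the question. It also assumes the mleap-spark artifact, which brings in the `resource` package used for `managed`, is on the classpath alongside Spark.

```scala
import ml.combust.bundle.BundleFile
import ml.combust.mleap.spark.SparkSupport._
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.bundle.SparkBundleContext
import org.apache.spark.ml.feature.Tokenizer
import org.apache.spark.sql.SparkSession
import resource._

object SaveTokenizerBundle {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("mleap-tokenizer-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the tweet data loaded in the question
    val data = Seq(
      "mleap supports the tokenizer",
      "spark pipelines can be exported"
    ).toDF("text")

    // A one-stage pipeline using the transformer named in the error message
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val model = new Pipeline().setStages(Array(tokenizer)).fit(data)

    // Give the serializer access to the transformed DataFrame and its schema
    val sbc = SparkBundleContext().withDataset(model.transform(data))

    // Hypothetical output path; the bundle is written as a zip archive
    for (bundle <- managed(BundleFile("jar:file:/tmp/tokenizer-pipeline.zip"))) {
      model.writeBundle.save(bundle)(sbc).get
    }

    spark.stop()
  }
}
```

If the save call still fails with a "key not found" error for some class, compare that class name with the transformer list linked above before concluding that something else is wrong.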