I use Spark 2.1.0 and Scala 2.11.8.
I am trying to build a Twitter sentiment analysis model in Apache Spark and serve it using MLeap.
When I run the model without MLeap, everything works smoothly. The problem happens only when I try to save the model in MLeap's serialization format so that I can serve it later with MLeap.
Here is the code that throws the error:
val modelSavePath = "/tmp/sampleapp/model-mleap/"

val pipelineConfig = json.get("PipelineConfig").get.asInstanceOf[Map[String, Any]]
val loaderConfig = json.get("LoaderConfig").get.asInstanceOf[Map[String, Any]]
val loaderPath = loaderConfig.get("DataLocation").get.asInstanceOf[String]

var data = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("delimiter", "\t")
  .option("inferSchema", "true")
  .load(loaderPath)

val pipeline = Pipeline(pipelineConfig)
val model = pipeline.fit(data)
val mleapPipeline: Transformer = model
The last line throws java.util.NoSuchElementException: key not found: org.apache.spark.ml.feature.Tokenizer.
A quick search suggested that MLeap does not support all Spark transformers, but I was not able to find an exhaustive list.
How do I find out whether the transformers I am using are actually unsupported, or whether something else is causing the error?
I am one of the creators of MLeap, and we do support Tokenizer! I am curious which version of MLeap you are trying to use. I think you may be looking at an outdated codebase from TrueCar; check out our new codebase here:
https://github.com/combust/mleap
We also have fairly complete documentation here, including a full list of supported transformers:
Documentation: http://mleap-docs.combust.ml/
Transformer List: http://mleap-docs.combust.ml/core-concepts/transformers/support.html
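For reference, once you are on the combust/mleap codebase, serializing a fitted Spark pipeline to an MLeap bundle looks roughly like the sketch below. This is not tested against your setup: it assumes model is an ordinary Spark PipelineModel, that the mleap-spark module is on your classpath, and the model.zip file name is just illustrative; the exact imports, and whether SparkBundleContext needs the transformed dataset, can vary a bit between MLeap versions.

import ml.combust.bundle.BundleFile
import ml.combust.mleap.spark.SparkSupport._
import org.apache.spark.ml.bundle.SparkBundleContext
import resource._

// Give the bundle context a dataset transformed by the fitted model so the
// output schema of every stage (including the Tokenizer) can be recorded.
val sbc = SparkBundleContext().withDataset(model.transform(data))

// Write the whole pipeline out as an MLeap bundle zip for later serving.
for (bundleFile <- managed(BundleFile("jar:file:" + modelSavePath + "model.zip"))) {
  model.writeBundle.save(bundleFile)(sbc).get
}

Since your end goal is serving, loading that bundle back outside of Spark with the MLeap runtime looks roughly like this (again only a sketch; the runtime and LeapFrame APIs have shifted between releases):

import ml.combust.bundle.BundleFile
import ml.combust.mleap.runtime.MleapSupport._
import resource._

// Load the bundle written above; no SparkContext is needed here.
val bundle = (for (bundleFile <- managed(BundleFile("jar:file:/tmp/sampleapp/model-mleap/model.zip"))) yield {
  bundleFile.loadMleapBundle().get
}).opt.get

// bundle.root is the MLeap transformer you feed LeapFrames to at serving time.
val mleapPipeline = bundle.root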
I hope this helps, and if things still aren't working, file an issue on GitHub and we can help you debug it from there.