python, apache-spark, pyspark, pmml

How can I use a PMML model in a PySpark script?


I have an XGBoost model that was trained in pure Python and converted to PMML format. Now I need to use this model in a PySpark script, but I am out of ideas on how to do it. Are there any methods that allow importing a PMML model in Python and using it for prediction? Thanks for any suggestions.

BR,
Vladimir


Solution

  • Spark does not support importing PMML models directly. I have not come across a PMML importer for PySpark, but there is one for Java: JPMML-Evaluator-Spark (https://github.com/jpmml/jpmml-evaluator-spark). What you can do is wrap the Java (or Scala) code so that you can call it from Python (see, for example, http://aseigneurin.github.io/2016/09/01/spark-calling-scala-code-from-pyspark.html).
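
The wrapping approach could look roughly like the sketch below. It is not a tested recipe. It assumes the JPMML-Evaluator-Spark JAR is already on the driver and executor classpath (for example via `spark-submit --jars`), and that the library version in use exposes `EvaluatorUtil.createEvaluator` and `TransformerBuilder` in the `org.jpmml.evaluator.spark` package, as older releases of the project did. Newer releases may use different class names, so check the README of the version you install. The helper names `load_pmml_transformer` and `score` are made up for this example.

```python
def load_pmml_transformer(jvm, pmml_path):
    """Build a JPMML Spark Transformer (a Java object) through Py4J.

    jvm       -- the Py4J JVM view, e.g. spark.sparkContext._jvm
    pmml_path -- path to the PMML file, readable by the driver

    ASSUMPTION: the class and method names below match the installed
    JPMML-Evaluator-Spark version; they differ between releases.
    """
    pkg = jvm.org.jpmml.evaluator.spark
    pmml_file = jvm.java.io.File(pmml_path)
    evaluator = pkg.EvaluatorUtil.createEvaluator(pmml_file)
    return (
        pkg.TransformerBuilder(evaluator)
        .withTargetCols()   # add the model's target field(s) as columns
        .withOutputCols()   # add the model's output field(s) as columns
        .exploded(True)     # one flat column per field instead of a struct
        .build()
    )


def score(java_transformer, df):
    """Apply the Java transformer to a PySpark DataFrame.

    df._jdf is the underlying Java DataFrame; the result is wrapped back
    into a Python DataFrame. Both are internal PySpark details, so this
    may need adjusting for your PySpark version.
    """
    from pyspark.sql import DataFrame  # imported lazily; needs PySpark
    return DataFrame(java_transformer.transform(df._jdf), df.sql_ctx)


# Usage inside a PySpark script (requires a running SparkSession `spark`
# and an input DataFrame `df` whose columns match the model's inputs):
#
#   transformer = load_pmml_transformer(spark.sparkContext._jvm, "model.pmml")
#   predictions = score(transformer, df)
#   predictions.show()
```

The Java transformer is built once on the driver and then applied to the whole DataFrame, so the scoring itself runs inside the JVM on the executors rather than in Python.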