Export models as PMML using PySpark


Is it possible to export models as PMML using PySpark? I know this is possible in Spark, but I could not find any reference to it in the PySpark docs. Does this mean that I would need to write custom code using a third-party Python PMML library?


Solution

  • Yes. Apache Spark pipelines can be exported to PMML using the JPMML-SparkML library. The JPMML-SparkML-Package project makes this library available to end users as a "Spark Package".

    Example PySpark code:

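    from jpmml_sparkml import toPMMLBytes
    # sc: the active SparkContext; df: the DataFrame used to fit the pipeline;
    # pipelineModel: the fitted PipelineModel to export
    pmmlBytes = toPMMLBytes(sc, df, pipelineModel)
    # The result holds the PMML XML as bytes; decode it for readable output
    print(pmmlBytes.decode("UTF-8"))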
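
Since PMML is an XML format, the exported bytes can be written to a file and sanity-checked with the standard library alone. The following is a minimal sketch, not part of JPMML-SparkML: the helper name `save_pmml`, the file name `model.pmml`, and the placeholder `sample_pmml` document (standing in for the real output of `toPMMLBytes`, including its assumed PMML 4.3 namespace) are all hypothetical and chosen only so the example runs without a Spark session.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def save_pmml(pmml_bytes, path):
    """Write PMML bytes to `path` and return the root element's local tag name."""
    with open(path, "wb") as f:
        f.write(bytes(pmml_bytes))
    # Re-parse the file to confirm it is well-formed XML.
    root = ET.parse(path).getroot()
    # Tags come back as "{namespace}PMML"; strip the namespace part if present.
    return root.tag.rsplit("}", 1)[-1]


# Hypothetical placeholder for the output of toPMMLBytes(sc, df, pipelineModel);
# a real export would also contain a data dictionary and a model element.
sample_pmml = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">'
    b"<Header/>"
    b"</PMML>"
)

out_path = os.path.join(tempfile.mkdtemp(), "model.pmml")
root_name = save_pmml(sample_pmml, out_path)
print(root_name)
```

In a real session, `pmmlBytes` from the snippet above would be passed in place of `sample_pmml`. Checking that the root element is named `PMML` is only a cheap guard against writing an empty or truncated file; it does not validate the document against the PMML schema.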