apache-spark, pyspark, apache-spark-ml

How can I operationalize a SparkML model as a real-time webservice?


Once a SparkML model has been trained on a Spark cluster, how can I take the trained model and make it available for scoring through a RESTful API?

The problem is that loading the model requires a SparkContext. Is there a way to 'fake' it, since it does not seem strictly necessary, or what is the minimum required to create a SparkContext?


Solution

  • In some cases, yes.

    Many Spark ML models can be exported to PMML, a standardized format for ML models, for example with https://github.com/jpmml/jpmml-sparkml. The exported file can then be scored outside Spark with another Java library such as JPMML-Evaluator.

    For how to do the export, see the question "Spark ml and PMML export".

    You can also use Spark Streaming to compute values, but it will have higher latency until Continuous Processing mode becomes available.

    For very time-consuming calculations, such as recommendation algorithms, I think it is quite normal to pre-calculate the values and store them in a database such as Cassandra.
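
To illustrate the pre-calculation approach, here is a minimal sketch of the serving side, under stated assumptions: an in-memory SQLite table stands in for a database such as Cassandra, and the `recommendations` table, the `build_store` and `recommend` helpers, and the sample values are all hypothetical. The point is only that the request path becomes a key lookup and needs no SparkContext; the batch job that fills the table is not shown.

```python
import sqlite3


def build_store(precomputed):
    """Load batch results into a lookup table.

    Assumption: in-memory SQLite is only a stand-in for an external
    store such as Cassandra; the schema is made up for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE recommendations ("
        "user_id INTEGER, rank INTEGER, item_id INTEGER, "
        "PRIMARY KEY (user_id, rank))"
    )
    # Flatten {user_id: [item_id, ...]} into (user_id, rank, item_id) rows,
    # where rank is the position of the item in the precomputed list.
    rows = [
        (user_id, rank, item_id)
        for user_id, items in precomputed.items()
        for rank, item_id in enumerate(items)
    ]
    conn.executemany("INSERT INTO recommendations VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def recommend(conn, user_id, limit=10):
    """Return up to `limit` precomputed item ids for a user, best first."""
    cur = conn.execute(
        "SELECT item_id FROM recommendations "
        "WHERE user_id = ? ORDER BY rank LIMIT ?",
        (user_id, limit),
    )
    return [row[0] for row in cur.fetchall()]


# Made-up sample of what a periodic batch job might have written.
precomputed = {1: [42, 7, 13], 2: [99, 42]}
store = build_store(precomputed)
```

A REST handler would then call something like `recommend(store, user_id)` and return the list as JSON. Users with no precomputed rows get an empty list, so a fallback (for example, popular items) would have to be handled separately.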