
Save PySpark Pipeline to PMML and Deploy It Using Flask (ERROR in app upon request)


I have been trying to find a way to deploy a trained PySpark pipeline as an API, and I ended up landing on both Flask and PMML as possible solutions.

As far as I am aware, the generation of the PMML file is working: I train the pipeline using ParamGridBuilder, obtain the best model, and write it out as a .pmml file.

A problem arises, though, when I load the resulting file into Flask. The API starts up just fine; however, when I send it a request, instead of the expected result (the sentiment contained in the text) I get the following error:

[2020-03-02 17:05:15,831] ERROR in app: Exception on /sentiment_analysis [GET]
Traceback (most recent call last):
  File "/home/users/anaconda3/lib/python3.6/site-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/users/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/users/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/users/anaconda3/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/home/users/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/users/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/users/sentiment_analysis.py", line 59, in hello_world
    resultado = evaluator.evaluate(df)
  File "/home/users/.local/lib/python3.6/site-packages/jpmml_evaluator/__init__.py", line 80, in evaluate
    javaArguments = self.backend.dict2map(arguments)
  File "/home/users/.local/lib/python3.6/site-packages/jpmml_evaluator/pyjnius.py", line 31, in dict2map
    raise ValueError()
ValueError
127.0.0.1 - - [02/Mar/2020 17:05:15] "GET /sentiment_analysis?text=test HTTP/1.1" 500 -

Here are the versions of the involved software and packages:

  • Python 3.6.4
  • Spark 2.4.4
  • pyspark2pmml 0.5.1
  • jpmml_evaluator 0.2.3
  • Flask 1.1.1
  • pyspark 2.4.4

Also, below is the Python code I am using to load the model into Flask.

from flask import Flask, request
import pandas as pd
from jpmml_evaluator import make_evaluator, pyjnius

app = Flask('sentiment_analysis')

@app.route("/sentiment_analysis")
def hello_world():

    text = request.args.get('text')

    pyjnius.jnius_configure_classpath()

    backend = pyjnius.PyJNIusBackend()

    evaluator = make_evaluator(backend, "test.pmml") \
        .verify()

    df = pd.DataFrame(columns=["TWEET"], data=[[text]])

    result = evaluator.evaluate(df)

    sentiment = result.collect()[0]['prediction']

    if int(sentiment) == 0:
        sentiment = 'negative'
    else:
        sentiment = 'positive'

    return 'The sentiment is: ' + sentiment, 200

app.run(host='0.0.0.0', port=5001)

Does anyone know what's wrong here?


Solution

  • Your arguments DataFrame contains a complex column type, and the Java backend that you have chosen (PyJNIus) does not know how to map this Python value to a Java value.

    Things you can try if you want to keep going down this roll-your-own Flask API way:

    • Update the jpmml_evaluator package to the latest version. New value conversions were added after 0.2.3, and a newer package version should tell you exactly which column type is the problem. See the source code of the jpmml_evaluator.pyjnius.dict2map method.
    • Choose a different Java backend. Specifically, try replacing PyJNIus with Py4J.
    • Replace the complex column type with something simpler in your Python code.

    All things considered, you would be much better off serving your PySpark models using the Openscoring REST web service. There is an up-to-date tutorial available about deploying Apache Spark ML pipeline models as a REST web service.
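
To illustrate the third suggestion (replacing the complex type with something simpler), here is a minimal sketch that builds the arguments as a plain dict of scalar values and names the offending field if a value is not a scalar. The helper names `check_arguments` and `build_arguments` and the `SIMPLE_TYPES` tuple are hypothetical and are not part of jpmml_evaluator; the set of accepted types is an assumption for illustration, not the list the library actually uses.

```python
# Types assumed here to be safe to hand to a Java backend.
# This tuple is an illustrative assumption, not taken from jpmml_evaluator.
SIMPLE_TYPES = (str, bool, int, float)


def check_arguments(arguments):
    """Return a plain-dict copy of `arguments`, or raise a ValueError
    that names the first field whose value is not a simple scalar."""
    plain = {}
    for name, value in arguments.items():
        if value is not None and not isinstance(value, SIMPLE_TYPES):
            raise ValueError(
                "Field {!r} has unsupported type {}".format(
                    name, type(value).__name__
                )
            )
        plain[name] = value
    return plain


def build_arguments(text):
    """Build the arguments for one request as {field name: scalar}.
    "TWEET" is the input column name used in the question."""
    return check_arguments({"TWEET": text})
```

In the Flask view, the result of `build_arguments(text)` would then be passed to `evaluator.evaluate(...)` in place of the pandas DataFrame. That assumes `evaluate` accepts a dict describing a single record, which is worth confirming against the README of the jpmml_evaluator version you have installed.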