I'm trying to build a prediction tool using scikit-learn's Naive Bayes classifier and the Python Flask micro-framework. From what I have googled, I can pickle the model and then unpickle it when the app loads, but how exactly do I do that?
My app should receive user input values, then pass these values to the model, and then display the predicted values back to the users (as d3 graphs, thus the need to convert the predicted values into JSON format).
This is what I've tried so far:
Pickling the model
from sklearn.naive_bayes import GaussianNB
import numpy as np
import csv
def loadCsv(filename):
    lines = csv.reader(open(filename,"rb"))
    dataset = list(lines)
    for i in range(len(dataset)):
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset
datasetX = loadCsv("pollutants.csv")
datasetY = loadCsv("acute_bronchitis.csv")
X = np.array(datasetX)
Y = np.array(datasetY).ravel()
model = GaussianNB()
model.fit(X,Y)
#import pickle
from sklearn.externals import joblib
joblib.dump(model,'acute_bronchitis.pkl')
The HTML form to collect user input:
<form class = "prediction-options" method = "post" action = "/prediction/results">
    <input type = "range" class = "prediction-option" name = "aqi" min = 0 max = 100 value = 0></input>
    <label class = "prediction-option-label">AQI</label>
    <input type = "range" class = "prediction-option" name = "pm2_5" min = 0 max = 100 value = 0></input>
    <label class = "prediction-option-label">PM2.5</label>
    <input type = "range" class = "prediction-option" name = "pm10" min = 0 max = 100 value = 0></input>
    <label class = "prediction-option-label">PM10</label>
    <input type = "range" class = "prediction-option" name = "so2" min = 0 max = 100 value = 0></input>
    <label class = "prediction-option-label">SO2</label>
    <input type = "range" class = "prediction-option" name = "no2" min = 0 max = 100 value = 0></input>
    <label class = "prediction-option-label">NO2</label>
    <input type = "range" class = "prediction-option" name = "co" min = 0 max = 100 value = 0></input>
    <label class = "prediction-option-label">CO</label>
    <input type = "range" class = "prediction-option" name = "o3" min = 0 max = 100 value = 0></input>
    <label class = "prediction-option-label">O3</label>
    <input type = "submit" class = "submit-prediction-options" value = "Get Patient Estimates" />
</form>
The Python Flask app.py:
from flask import Flask, render_template, request
import json
from sklearn.naive_bayes import GaussianNB
import numpy as np
import pickle as pkl
from sklearn.externals import joblib
model_acute_bronchitis = pkl.load(open('data/acute_bronchitis.pkl'))
@app.route("/prediction/results", methods = ['POST'])
def predict():
    input_aqi = request.form['aqi']
    input_pm2_5 = request.form['pm2_5']
    input_pm10 = request.form['pm10']
    input_so2 = request.form['so2']
    input_no2 = request.form['no2']
    input_co = request.form['co']
    input_o3 = request.form['o3']
    input_list = [[input_aqi,input_pm2_5,input_pm10,input_so2,input_no2,input_co,input_o3]]
    output_acute_bronchitis = model_acute_bronchitis.predict(input_list)
    prediction = json.dumps(output_acute_bronchitis)
    return prediction
However, I got the following error message:
TypeError: 'NDArrayWrapper' object does not support indexing
which I found might be caused by using scikit-learn's joblib to pickle the model.
So, I tried to see if I could use joblib's load function to load the model in Flask instead, and I got this error message:
/Users/Vanessa/Desktop/User/lib/python2.7/site-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
DeprecationWarning)
[2016-07-27 12:45:30,747] ERROR in app: Exception on /prediction/results [POST]
Traceback (most recent call last):
File "/Users/Vanessa/Desktop/User/lib/python2.7/site-packages/flask/app.py", line 1988, in wsgi_app
response = self.full_dispatch_request()
File "/Users/Vanessa/Desktop/User/lib/python2.7/site-packages/flask/app.py", line 1641, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/Users/Vanessa/Desktop/User/lib/python2.7/site-packages/flask/app.py", line 1544, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/Users/Vanessa/Desktop/User/lib/python2.7/site-packages/flask/app.py", line 1639, in full_dispatch_request
rv = self.dispatch_request()
File "/Users/Vanessa/Desktop/User/lib/python2.7/site-packages/flask/app.py", line 1625, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "app.py", line 95, in predict
output_acute_bronchitis = model_acute_bronchitis.predict(input_list)
File "/Users/Vanessa/Desktop/User/lib/python2.7/site-packages/sklearn/naive_bayes.py", line 65, in predict
jll = self._joint_log_likelihood(X)
File "/Users/Vanessa/Desktop/User/lib/python2.7/site-packages/sklearn/naive_bayes.py", line 394, in _joint_log_likelihood
n_ij -= 0.5 * np.sum(((X - self.theta_[i, :]) ** 2) /
TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('<U32') dtype('<U32') dtype('<U32')
127.0.0.1 - - [27/Jul/2016 12:45:30] "POST /prediction/results HTTP/1.1" 500 -
What am I doing wrong? Are there simpler alternatives to do what I hope to achieve?
I think the problem with your code is that the data from your form is read as strings. For example, after input_aqi = request.form['aqi'], input_aqi holds a string. As a result, output_acute_bronchitis = model_acute_bronchitis.predict(input_list) passes predict an array of strings, which causes the TypeError you see (the dtype('<U32') in the message is NumPy's Unicode string type). You can fix this by converting each form input to a float:
input_aqi = float(request.form['aqi'])
You will have to do this for every form input that goes into input_list.
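To avoid repeating the conversion seven times, the idea can be wrapped in a small helper. This is only a sketch: FEATURE_NAMES, form_to_features and prediction_to_json are hypothetical names I made up for illustration, and I am assuming the field order below matches the column order the model was trained on. The second helper is there because, as far as I know, json.dumps cannot serialize a NumPy array directly, so the output of predict probably needs to be turned into a plain list first.

```python
import json

# Hypothetical constant: field names from the form in the question,
# assumed to be in the same order as the training columns.
FEATURE_NAMES = ["aqi", "pm2_5", "pm10", "so2", "no2", "co", "o3"]


def form_to_features(form):
    """Turn a mapping of form values (strings) into one row of floats.

    Returns a 2D list holding a single sample, which is the shape
    that predict() expects.
    """
    return [[float(form[name]) for name in FEATURE_NAMES]]


def prediction_to_json(prediction):
    """Serialize predictions to a JSON string.

    NumPy arrays expose tolist(); anything else is treated as a
    plain iterable of numbers.
    """
    if hasattr(prediction, "tolist"):
        values = prediction.tolist()
    else:
        values = list(prediction)
    return json.dumps(values)
```

Inside the view you would then call something like model_acute_bronchitis.predict(form_to_features(request.form)) and return prediction_to_json on the result. Note that float() raises a ValueError on non-numeric input, so you may want to handle that case for real users.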
Hope that helps.