Tags: python-3.x, nginx, flask, docker-compose, joblib

Docker uwsgi-nginx-flask with joblib, unable to find local function, but works in standalone flask


When I try to load a pre-trained model via joblib inside a Docker container, I get the following error.

web_1  | 2018-02-06 15:11:50,826 INFO success: nginx entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
web_1  | 2018-02-06 15:11:50,828 INFO success: uwsgi entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
web_1  | Traceback (most recent call last):
web_1  |   File "./app/main.py", line 23, in <module>
web_1  |     svm_detector_reloaded=joblib.load(filename);
web_1  |   File "/usr/local/lib/python3.6/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 578, in load
web_1  |     obj = _unpickle(fobj, filename, mmap_mode)
web_1  |   File "/usr/local/lib/python3.6/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 508, in _unpickle
web_1  |     obj = unpickler.load()
web_1  |   File "/usr/local/lib/python3.6/pickle.py", line 1050, in load
web_1  |     dispatch[key[0]](self)
web_1  |   File "/usr/local/lib/python3.6/pickle.py", line 1338, in load_global
web_1  |     klass = self.find_class(module, name)
web_1  |   File "/usr/local/lib/python3.6/pickle.py", line 1392, in find_class
web_1  |     return getattr(sys.modules[module], name)
web_1  | AttributeError: module '__main__' has no attribute 'split_into_lemmas'
web_1  | unable to load app 0 (mountpoint='') (callable not found or import error)
web_1  | *** no app loaded. going in full dynamic mode ***
web_1  | *** uWSGI is running in multiple interpreter mode ***

My main.py looks like this:

from flask import Flask
from flask import request
from flask import jsonify
from textblob import TextBlob
import sklearn
import numpy as np
from sklearn.externals import joblib

app = Flask(__name__)

from .api.utils import split_into_lemmas as split_into_lemmas

def split_into_lemmas(message):
    message=message.lower()
    words = TextBlob(message).words
    # for each word, take its "base form" = lemma 
    return [word.lemma for word in words]

def tollower(message):
    return message.lower()

filename = '../../data/sms_spam_detector.pkl'
svm_detector_reloaded=joblib.load(filename);

text="Testing"
lowerText=tollower(text)

@app.route('/')
def hello():
    return tollower("Test Test ");

@app.route('/detect/')
def route_detect():
    SMS=request.args.get('SMS')
    if(SMS==None or SMS==''):
        SMS="Test";
    return tollower(SMS);
#    test=[SMS]
#    message=  ( svm_detector_reloaded.predict(test)[0])
#    return SMS+"    "+message;

if __name__ == "__main__":
    # Only for debugging while developing
    app.run(host='0.0.0.0')

Basically, I downloaded example-flask-package-python3.6.zip from tiangolo/uwsgi-nginx-flask, added a data directory, and modified the Dockerfile and main.py. main.py is pasted above, and the Dockerfile looks like this:

FROM tiangolo/uwsgi-nginx-flask:python3.6
ENV LISTEN_PORT 8080

EXPOSE 8080 
RUN pip3 install numpy TextBlob scikit-learn scipy

COPY ./app /app
COPY ./data /data

Then I copied the prebuilt model (stored via joblib) into the newly created data directory. The code works perfectly if I run it directly with python main.py, but when I issue docker-compose up I get the error above. If I comment out the line svm_detector_reloaded=joblib.load(filename);, the container comes up and everything works except the machine-learning part.

Basically, the function split_into_lemmas is not accessible when the model is unpickled.

What am I doing wrong here? The model was built by following the steps at http://radimrehurek.com/data_science_python (the actual model is built at step 6).
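For context (not from the original post): pickle serializes a function by reference — the defining module's name plus the attribute name — not by its code. A model pickled while split_into_lemmas lived in __main__ (the training script) can therefore only be unpickled by a process whose __main__ also defines that name, which the uWSGI worker's does not. A minimal sketch of the behavior:

```python
import pickle

def split_into_lemmas(message):
    # simplified stand-in for the real lemmatizer
    return message.lower().split()

# pickle records only a reference ("__main__", "split_into_lemmas"),
# not the function body, so the name appears literally in the payload
payload = pickle.dumps(split_into_lemmas)
print(b'split_into_lemmas' in payload)  # True

# unpickling re-resolves that name in the *current* process; in a process
# whose __main__ lacks the function, this raises AttributeError instead
print(pickle.loads(payload) is split_into_lemmas)  # True
```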


Solution

  • Ok, I was able to resolve it. I got a clue from 3614379. Instead of keeping the function in the main file, I created split_into_lemmas in a separate module (a .py file) and imported that module while training. Then in my Docker instance I imported the same module. That resolved the issue.
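The fix above can be sketched as follows. The module and file names here are illustrative (not from the original post), a temporary directory stands in for a checked-in shared module such as api/utils.py, and a plain .split() replaces the TextBlob lemmatizer to keep the sketch self-contained:

```python
import pathlib
import pickle
import sys
import tempfile

# Stand-in for a shared module that both the training script and the
# Docker image can import (in the real project, a committed .py file).
shared_dir = tempfile.mkdtemp()
pathlib.Path(shared_dir, 'text_utils.py').write_text(
    'def split_into_lemmas(message):\n'
    '    return message.lower().split()\n'
)
sys.path.insert(0, shared_dir)

from text_utils import split_into_lemmas

# Training side: the pickle now references text_utils.split_into_lemmas
# instead of __main__.split_into_lemmas.
payload = pickle.dumps(split_into_lemmas)
print(b'text_utils' in payload)  # True

# Serving side (e.g. inside the container): any process that can import
# text_utils can resolve the reference and use the function.
restored = pickle.loads(payload)
print(restored('Hello World'))  # ['hello', 'world']
```

The same reasoning applies to joblib, since it delegates function serialization to pickle.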