
NLTK + TextBlob in Flask/nginx/gunicorn on Ubuntu: 500 error


I am trying to run noun phrase analysis in a Flask app on Ubuntu, served through gunicorn and nginx. I get a 500 error, and nothing (apparently) is logged in the nginx, supervisor, or gunicorn error logs. Nor does 'supervisorctl tail app' shed any light.

My nginx site config (in sites-available):

server {
    listen 80;
    server_name [domain redacted];
    charset utf-8;
    client_max_body_size 75M;

    access_log /var/log/nginx/nginx_access.log;
    error_log /var/log/nginx/nginx_error.log;

    location / { try_files $uri @app; }

    location @app {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

My supervisor app.conf:

[program:app]
command = gunicorn app:app -b localhost:8000
directory = /home/www/app
user = admin

My app.py sets up the app as follows (the issue occurs with both DEBUG = False and DEBUG = True in config.py):

import logging

from flask import Flask, render_template

app = Flask(__name__, static_folder='static', static_url_path='/static')
app.config.from_pyfile('config.py')

if __name__ == '__main__':
    app.run()
    if not app.debug:
        stream_handler = logging.StreamHandler()
        stream_handler.setLevel(logging.INFO)
        app.logger.addHandler(stream_handler)
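A likely reason nothing appeared in the logs: gunicorn imports app.py rather than running it, so __name__ is "app" and the if __name__ == '__main__': block, handler setup included, never executes. (Even under python app.py, the handler is only added after app.run() returns.) The sketch below is a minimal, hypothetical helper, configure_logging, meant to be called at module level in app.py, for example configure_logging(app.logger) right after app.config.from_pyfile('config.py'), so it runs however the module is loaded. It assumes you want tracebacks on stderr, which supervisor captures for the program; note that supervisorctl tail app shows stdout by default, while supervisorctl tail app stderr shows stderr.

```python
import logging
import sys


def configure_logging(logger, level=logging.INFO, stream=None):
    """Attach one StreamHandler to `logger` (hypothetical helper, not a Flask API).

    Call it at module level, e.g. configure_logging(app.logger), so it also
    runs when gunicorn imports the module instead of executing it.
    """
    stream = stream if stream is not None else sys.stderr
    # Reuse a handler this helper already added instead of stacking duplicates.
    for handler in logger.handlers:
        if getattr(handler, "_configured_by_app", False):
            return handler
    handler = logging.StreamHandler(stream)
    handler.setLevel(level)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    handler._configured_by_app = True  # marker used by the duplicate check above
    logger.addHandler(handler)
    logger.setLevel(level)
    return handler
```

Because the helper looks for its own marker attribute, calling it twice (for example if the module is imported again) does not attach a second handler. Unhandled exceptions that Flask logs for a 500 response, and anything you log yourself with app.logger.exception(...), should then include the full traceback.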

config.py is simply:

DEBUG = False
ALLOWED_HOSTS=['*']

The noun-phrase function I am calling:

from textblob import TextBlob

def generateNounPhrases(input):
    blob = TextBlob(input)
    np = blob.noun_phrases

    return np

The Flask route in app.py that passes the output of generateNounPhrases() to the template:

@app.route('/thread', methods=['GET'])
def thread():
    ...
    nounphrases = generateNounPhrases(text_to_analyze)   
    ...

    return render_template("Thread.html", nounphrases=nounphrases)

I am completely lost and an absolute novice at this. Any guidance would be greatly appreciated!


Solution

  • The sudo user admin (declared as user = admin in the supervisor app.conf and created to run this app) could not read the root user's home directory. The NLTK corpora had been downloaded to /root/nltk_data, and because admin could not access them, they caused my original 500 error.

    I discovered this after reconfiguring gunicorn logging: supervisorctl restart app then failed with a fatal supervisor error, because the process did not have permission to write to the newly configured gunicorn log file.

    My updated, working supervisor config, without the user declaration, is as follows:

    [program:app]
    command = gunicorn app:app -b localhost:8000 --log-file /var/log/gunicorn/gunicorn_log.log
    directory = /home/www/app
    stdout_logfile=/var/log/supervisor/supervisor_stdout.log
    stderr_logfile=/var/log/supervisor/supervisor_stderr.log
    

    I am not sure what the full security implications of this configuration are, however, or why the admin user (a member of the sudo group) could not access the directories. Bonus points to anyone with that answer.
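On the bonus question, the likely explanation: on Ubuntu, /root is normally mode 0700, so only root can enter it, and sudo group membership gives admin root privileges only for commands it explicitly runs through sudo, not for a process that supervisor starts as admin. Dropping the user line works because supervisord, when it runs as root (as the packaged Ubuntu service typically does), then starts the program as root, which also means the web app runs with full root privileges. A safer route is to keep user = admin and put the corpora in a directory admin can read. The script below is a minimal diagnostic sketch for checking that, meant to be run as the app user (for example sudo -u admin python3 check_nltk_dirs.py, where the file name is hypothetical). Its candidate list is an approximation of NLTK's default Unix search locations plus /root/nltk_data; if nltk is importable, nltk.data.path is the authoritative list.

```python
import os


def default_candidates():
    """Approximate nltk_data locations to check (assumption, not NLTK's exact list)."""
    paths = []
    env = os.environ.get("NLTK_DATA")
    if env:
        # NLTK_DATA may hold several directories separated by os.pathsep.
        paths.extend(p for p in env.split(os.pathsep) if p)
    paths.append(os.path.expanduser("~/nltk_data"))
    paths += [
        "/usr/share/nltk_data",
        "/usr/local/share/nltk_data",
        "/usr/lib/nltk_data",
        "/usr/local/lib/nltk_data",
        "/root/nltk_data",  # where the corpora originally ended up
    ]
    return paths


def check_nltk_dirs(candidates):
    """Map each candidate directory to a status for the current user.

    os.path.isdir() returns False when a parent such as /root cannot be
    traversed, so an unreadable directory looks the same as a missing one.
    """
    report = {}
    for path in candidates:
        if not os.path.isdir(path):
            report[path] = "missing or not reachable"
        elif not os.access(path, os.R_OK | os.X_OK):
            report[path] = "not readable"
        else:
            report[path] = "ok"
    return report


if __name__ == "__main__":
    for path, status in check_nltk_dirs(default_candidates()).items():
        print(f"{status:26} {path}")
```

If /root/nltk_data shows up as "missing or not reachable" when run as admin but as "ok" when run with sudo, that matches the permission problem described above.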