
Flask - caching result of data load


I'm writing a server-side application in flask / python and have an issue with some data that has to be loaded for calculation. Loading the data (about 40 MB) takes much longer than processing the server response, and the data never changes, so I want it to be loaded only once, effectively when apache starts up. But no matter what I try, it keeps reloading each time a request comes in, massively slowing things down. I can tell by the print statement shown below, which writes to the apache logs for each request. I want to load the data, and so write that line to the logs, only once on startup.

Interestingly, this only happens when the script is run via apache with mod_wsgi - if I run it locally using python from the command line, the data load only happens once, and server responses are much faster.

Any thoughts?

My most recent attempt, using flask_cache, is like this:

@cache.cached(key_prefix='my_key')
def load_huge_file():
    # Do some things and assign data from a large file to loaded_data
    print "Huge data set loaded!"
    return loaded_data

shared_data = load_huge_file()

@app.route("/user_input")
def user_response():
 global shared_data
 return fairly_quick_function(args, shared_data)

Edit - Thanks - using before_first_request and adding "WSGIDaemonProcess myApp processes=1" to my WSGI config did the trick. Now it keeps the process running and just spins new requests off of it, instead of re-running the init each time.


Solution

  • You'll have to load this once per process; how many times that is depends on how you configured WSGI.

    Do not use Flask-Cache here; it cannot guarantee that the data remains loaded (it'll promise that the data is loaded for a maximum amount of time, never a minimum).
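    To make the "maximum, never a minimum" point concrete, here is a minimal, self-contained sketch (no Flask needed; all names are illustrative) of how a timeout-based cache behaves: once the timeout passes, the next request pays the full load cost again.

    ```python
    import time

    # Hypothetical illustration of a timeout-based cache, similar in spirit
    # to Flask-Cache's default behaviour: the cached value is kept for *at
    # most* `timeout` seconds, so expensive reloads still happen.
    class TimeoutCache(object):
        def __init__(self, timeout):
            self.timeout = timeout
            self.value = None
            self.stored_at = None

        def get_or_load(self, loader):
            now = time.time()
            if self.value is None or now - self.stored_at > self.timeout:
                self.value = loader()   # expensive reload after expiry
                self.stored_at = now
            return self.value

    load_count = [0]

    def load_huge_file():
        load_count[0] += 1              # count how often the load runs
        return "40 MB of data"

    cache = TimeoutCache(timeout=0.01)
    cache.get_or_load(load_huge_file)   # first request: loads the data
    time.sleep(0.05)                    # wait past the cache timeout
    cache.get_or_load(load_huge_file)   # reloads, despite the cache
    print(load_count[0])
    ```

    A plain module-level global, by contrast, stays loaded for the whole life of the process with no expiry.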

    You could load the data with an app.before_first_request() handler:

    @app.before_first_request
    def load_huge_file():
        #Do some things and assign data from a large file to loaded_data
        print "Huge data set loaded!"
        global shared_data
        shared_data = loaded_data
    

    but loading it when the module is imported should be fine too unless you are running this with the Flask development server in reload mode. You were already doing this at import time, but the @cache.cached() decorator is not going to help here as it'll kill your other cached data.
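    The "load at import time" approach can be sketched without Flask at all (the names below are stand-ins for the question's functions): the loader runs exactly once when the module is imported, and every subsequent request reuses the global, as long as the process itself is kept alive between requests.

    ```python
    # Minimal sketch: module-level loading runs once per process, no matter
    # how many requests follow. This is what the asker's code already did --
    # the per-request reloads came from Apache spawning a new process per
    # request, not from this pattern.
    load_count = [0]

    def load_huge_file():
        load_count[0] += 1              # stand-in for reading the 40 MB file
        return {"rows": 1000000}

    shared_data = load_huge_file()      # runs once, at import time

    def fairly_quick_function(shared_data):
        # every "request" reuses the already-loaded data
        return len(shared_data)

    for _ in range(100):                # simulate 100 requests in one process
        fairly_quick_function(shared_data)

    print(load_count[0])                # still loaded only once
    ```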

    If you see the data loaded on each request then your WSGI configuration is wrong; it is Apache that creates a new process for each request. Adjust your WSGI setup to use daemon mode (with WSGIDaemonProcess); that way you disconnect creating processes from Apache's process and can keep this data around for (much) longer.
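    A daemon-mode setup might look something like this (a sketch only; the process group name, paths, and thread counts are placeholders to adapt to your own deployment):

    ```apache
    # Sketch of a mod_wsgi daemon-mode configuration. One persistent daemon
    # process handles all requests, so module-level data is loaded once and
    # reused instead of being reloaded for every request.
    WSGIDaemonProcess myapp processes=1 threads=5
    WSGIScriptAlias / /var/www/myapp/app.wsgi

    <Directory /var/www/myapp>
        WSGIProcessGroup myapp
        WSGIApplicationGroup %{GLOBAL}
        Require all granted
    </Directory>
    ```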

    Also see the Flask deployment on mod_wsgi documentation.