Gunicorn with gevent: maintaining per-request global data

I have a Python application (built on the MVC pattern) served by Gunicorn using an asynchronous worker class (gevent). That means a single worker process serves multiple client requests concurrently. Every HTTP request carries some data specific to that request, such as 'user_id'. Say an error occurs in a model and I want to log it along with the user_id. I don't want to keep passing the user_id (and other request-specific values) to every class or method; I want these values to be globally available to any code executed for that particular request. The controller would set these values on receiving the request, and any code executed for that request could then read them. Code running for multiple simultaneous requests should see only its own request's values. Is this possible?

Solution

The general idea is to associate your per-request data with something that is unique to each request. For instance, keep a dict with this unique identifier as the key and the per-request data as the value.

Since you say you are using gevent workers, each in-flight request runs in its own greenlet, so we can use greenlet.getcurrent() as the unique identifier.

This is pretty much what Flask + Werkzeug do, but they do so in a much more performant, memory-efficient, thread-compatible and end-user-friendly way than my example below.

Here is a simple WSGI app to serve as an example. Here a is stored in, and read back from, a per-request dict obtained via the 'globally available' function get_per_greenlet_dict, whereas b is passed around as a parameter to verify that a is correct.

# wsgi.py
import collections
import logging
import time

import greenlet

logging.basicConfig()
log = logging.getLogger(__name__)
log.setLevel(logging.DEBUG)

# used to store per-request data
# keys are greenlets, values are dicts
storage = collections.defaultdict(dict)

# return a dict for this request
# TODO: remove the per-request dict at the end of the request
def get_per_greenlet_dict():
    return storage[greenlet.getcurrent()]

def application(env, start_response):

    # extract query vars
    query_vars = env['QUERY_STRING'].split("&")
    a = query_vars[0].split("=")[1]
    b = query_vars[1].split("=")[1]

    # store 'a' in our per-request dict
    get_per_greenlet_dict()['a'] = a

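    log_a_and_b("Before sleep", b)
    # gunicorn's gevent worker monkey-patches time.sleep, so this yields
    # to other greenlets and lets the requests interleave
    time.sleep(1)
    log_a_and_b("After sleep", b)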

    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b"OK: "]


def log_a_and_b(prefix, b):
    # log both a and b,
    # where a is sourced from our per-request dict
    # and b is passed as a parameter as a means of verifying a
    a = get_per_greenlet_dict()['a']
    log.debug(prefix + "; a:%s b:%s", a, b)
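
The TODO in wsgi.py (dropping the per-request dict once the request is done) could be handled with a small WSGI wrapper. This is only a sketch and is separate from wsgi.py: with_cleanup, demo_app and demo_key are made-up names, and the storage dict here is a stand-in for the one above. In wsgi.py you would pass greenlet.getcurrent as get_key.

```python
import collections

# Stand-in for the `storage` dict in wsgi.py (keys identify requests).
storage = collections.defaultdict(dict)


def with_cleanup(app, get_key):
    """Wrap a WSGI app so its per-request dict is dropped afterwards.

    get_key returns the identifier of the current request; in wsgi.py
    that would be greenlet.getcurrent (an assumption of this sketch).
    """
    def wrapped(environ, start_response):
        try:
            # Delegate to the real app and stream its body through.
            yield from app(environ, start_response)
        finally:
            # Runs once the body is exhausted or the iterable is closed.
            storage.pop(get_key(), None)
    return wrapped


def demo_key():
    # Hypothetical fixed key so the demo runs without greenlet installed.
    return "request-1"


def demo_app(environ, start_response):
    storage[demo_key()]["a"] = "1"
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"OK"]


wrapped_app = with_cleanup(demo_app, demo_key)
body = list(wrapped_app({}, lambda status, headers: None))
```

The cleanup runs in whichever greenlet finishes iterating or closes the response. I'd expect that to be the one that handled the request, but check it on your setup before relying on it.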

Save the app as wsgi.py and run the gunicorn server with gevent workers:

$ gunicorn -k gevent wsgi

Run multiple simultaneous requests, say by:

$ for i in `seq 1 5`; do curl "127.0.0.1:8000?a=$i&b=$i" & done

Then you will see output from gunicorn like:

DEBUG:wsgi:Before sleep; a:2 b:2
DEBUG:wsgi:Before sleep; a:5 b:5
DEBUG:wsgi:Before sleep; a:4 b:4
DEBUG:wsgi:Before sleep; a:1 b:1
DEBUG:wsgi:Before sleep; a:3 b:3
DEBUG:wsgi:After sleep; a:2 b:2
DEBUG:wsgi:After sleep; a:5 b:5
DEBUG:wsgi:After sleep; a:4 b:4
DEBUG:wsgi:After sleep; a:1 b:1
DEBUG:wsgi:After sleep; a:3 b:3
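
If you would rather not manage the dict yourself, threading.local from the standard library gives the same "global but per-request" behaviour. gunicorn's gevent worker monkey-patches the standard library at startup, and as far as I know that includes swapping threading.local for gevent's greenlet-local version, so each greenlet should see its own attributes. Treat that as an assumption to verify against your gevent version. The sketch below uses plain threads so it runs without gevent; request_data, current_user_id, handle_request and run_demo are made-up names.

```python
import threading
import time

# One shared object; each thread (or, assuming gevent monkey patching,
# each greenlet) sees its own independent set of attributes on it.
request_data = threading.local()


def current_user_id():
    # Callable from any depth without passing user_id around.
    return request_data.user_id


def handle_request(user_id, results):
    # The "controller" stores the value once at the start of the request.
    request_data.user_id = user_id
    time.sleep(0.05)  # give the other requests a chance to run
    results[user_id] = current_user_id()


def run_demo(user_ids):
    # Simulate concurrent requests, one thread per request.
    results = {}
    threads = [
        threading.Thread(target=handle_request, args=(uid, results))
        for uid in user_ids
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling run_demo([1, 2, 3]) should map every user_id to itself, which is the same check the a/b comparison does in the logs above.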