
Unable to scale Gunicorn/Flask HelloWorld over 125 RPS

I have a Flask app that I have been unable to scale past 125 requests per second (RPS) locally. It is a simple 'hello world', shown below.

I'm using the Locust.io load testing tool. When I point the same load test at a local Golang hello world, I get into the thousands of RPS. IMHO this rules out my Locust and OS configuration as potential bottlenecks.

I'm using 17 workers because my machine has 8 cores, and the Gunicorn docs recommend (2 x CPU cores) + 1.
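For reference, the worker-count rule of thumb is plain arithmetic. The helper name `recommended_workers` below is hypothetical (it is not a Gunicorn API), and note that `multiprocessing.cpu_count()` reports logical CPUs, which can differ from physical cores on a machine with hyperthreading. Gunicorn presents the formula as a starting point to tune from, not a guaranteed optimum.

```python
import multiprocessing


def recommended_workers(cpu_count: int) -> int:
    """Gunicorn's suggested starting point: (2 x CPU cores) + 1."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return 2 * cpu_count + 1


if __name__ == "__main__":
    # cpu_count() counts logical CPUs; on an 8-CPU machine this prints 17.
    print(recommended_workers(multiprocessing.cpu_count()))
```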

From what I've read, using the gevent worker type for Gunicorn should let me reach thousands of RPS, just like with Golang. Is this a correct assumption, or am I missing something critical?

Abbreviated code:

from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello():
    return 'hello world!'
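One way to check whether Flask itself is the limit is to serve the same response from a bare WSGI callable, which Gunicorn runs exactly like the Flask app. This is a minimal sketch assuming a hypothetical module named `bare_app.py`, started with e.g. `gunicorn -k gevent -w 17 bare_app:app`; if it plateaus at the same RPS, the bottleneck is below the framework (worker type, server, or load generator).

```python
def app(environ, start_response):
    """A bare WSGI 'hello world' with no framework in the request path."""
    body = b"hello world!"
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    # WSGI apps return an iterable of bytes chunks.
    return [body]


if __name__ == "__main__":
    # Quick local check using the standard library's WSGI test helper.
    from wsgiref.util import setup_testing_defaults

    environ = {}
    setup_testing_defaults(environ)
    chunks = app(environ, lambda status, headers: print(status))
    print(b"".join(chunks))
```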

Gunicorn configuration:

gunicorn -k gevent -w 17 --worker-connections 100000 app:app

Locust load test results (each simulated 'user' GETs '/' once every 4 s):
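With a fixed wait per user, the load Locust can offer is bounded by the number of users. A rough sketch, assuming each user waits for its response before sleeping the 4 s (so one request per wait-plus-latency cycle); the function names are hypothetical and this is an approximation, not a Locust API:

```python
import math


def offered_rps(users: int, wait_s: float, latency_s: float = 0.0) -> float:
    """Upper bound on requests/second from closed-loop simulated users.

    Each user sends a request, waits latency_s for the response, then
    sleeps wait_s, so it completes at most one request per cycle.
    """
    cycle = wait_s + latency_s
    if cycle <= 0:
        raise ValueError("wait_s + latency_s must be positive")
    return users / cycle


def users_needed(target_rps: float, wait_s: float, latency_s: float = 0.0) -> int:
    """Smallest user count whose offered load reaches target_rps."""
    return math.ceil(target_rps * (wait_s + latency_s))


if __name__ == "__main__":
    print(offered_rps(500, 4.0))     # 500 users at 4 s each
    print(users_needed(1000, 4.0))   # users needed for 1000 RPS
```

Under these assumptions, 125 RPS corresponds to roughly 500 users, and thousands of RPS needs thousands of users. Because latency is part of the cycle, a fixed user count also offers less load as server response times grow.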

Solution

Answer from the Gunicorn authors here: https://github.com/benoitc/gunicorn/issues/305

After another week of debugging, I figured it out! It turns out there is an additional worker type, gevent_pywsgi. Using this worker type increased throughput roughly 10x, to levels I would consider acceptable.
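The same setup with the gevent_pywsgi worker can also be kept in a Gunicorn config file instead of command-line flags. A minimal sketch, assuming a file named `gunicorn.conf.py` started with `gunicorn -c gunicorn.conf.py app:app` (recent Gunicorn releases also read `./gunicorn.conf.py` by default); the values simply mirror the question's command with `-k gevent` swapped for `-k gevent_pywsgi`, and gevent must be installed:

```python
# gunicorn.conf.py: Gunicorn executes this file as Python and reads
# these module-level names as settings.

workers = 17                    # (2 x 8 cores) + 1, as in the question
worker_class = "gevent_pywsgi"  # equivalent to `-k gevent_pywsgi`
worker_connections = 100000     # equivalent to `--worker-connections 100000`
```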

My testing showed no difference in performance between the sync worker and the gevent worker, so I'm still not sure what's going on there, or what the gevent worker type is intended for.