Tags: python, gunicorn, supervisord, uvicorn, starlette

Uvicorn + Gunicorn + Starlette gets stuck when serving; can't restart the service without SIGKILL


I am serving a model on a VM through gunicorn + uvicorn.

It is automatically started by supervisord, running api.sh.

api.sh contains:

source /home/asd/.virtual_envs/myproject/bin/activate

/home/asd/.virtual_envs/myproject/bin/gunicorn --max-requests-jitter 30 -w 6 -b 0.0.0.0:4080 api:app -k uvicorn.workers.UvicornWorker
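
For completeness, the supervisord program entry looks roughly like the following (the program name, paths and options here are an approximation rather than a copy of my real config):

[program:api]
command=/bin/bash /home/asd/api.sh
directory=/home/asd
user=asd
autostart=true
autorestart=true
; stopasgroup/killasgroup make supervisord signal the whole process group
; on stop, which can also help clean up stray child processes
stopasgroup=true
killasgroup=true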

Without getting too much into api.py, it contains these main parts:

from starlette.applications import Starlette
from starlette.responses import UJSONResponse
from models import SomeModelClass


app = Starlette(debug=False)
model = SomeModelClass()


@app.route('/do_things', methods=['GET', 'POST', 'HEAD'])
async def add_styles(request):
    if request.method == 'GET':
        params = request.query_params
    elif request.method == 'POST':
        params = await request.json()
    elif request.method == 'HEAD':
        return UJSONResponse([])

    # Doing things
    result = model(params)
    return UJSONResponse(result)
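
The endpoint is hit with plain GET or POST requests; a typical call looks something like this (the payload is only illustrative, the real parameters depend on SomeModelClass):

curl -X POST http://localhost:4080/do_things \
     -H 'Content-Type: application/json' \
     -d '{"text": "example input"}'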

What happens is that after the API has been up for a few days, I start getting these errors:

[INFO] Starting gunicorn 20.0.3
[ERROR] Connection in use: ('0.0.0.0', 4080)
[ERROR] Retrying in 1 second.
[ERROR] Connection in use: ('0.0.0.0', 4080)
[ERROR] Retrying in 1 second.
[ERROR] Connection in use: ('0.0.0.0', 4080)
[ERROR] Retrying in 1 second.
[ERROR] Connection in use: ('0.0.0.0', 4080)
[ERROR] Retrying in 1 second.
...

Restarting the api in supervisord does nothing; I get the same messages as above. The only workaround I have found is:

  1. Stop the api in supervisord
  2. See which PID is still listening on port 4080 (a python3.8 process): sudo netstat -tulpn | grep LISTEN
  3. Kill it by running kill -9 [PID]
  4. Repeat steps 2-3 once or twice until nothing holds port 4080 anymore (see the snippet after this list)
  5. Start the api in supervisord again
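
Condensed, the manual cleanup looks like this (the PID shown by netstat differs each time):

# find whatever is still listening on port 4080
sudo netstat -tulpn | grep 4080
# the last column shows something like "<PID>/python3.8"; kill that PID
sudo kill -9 <PID>
# repeat until grep prints nothing, then start the api in supervisord again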

Do you have any ideas on how to solve this?


Solution

  • The code actually used Pool from multiprocessing, and that is most likely what caused the issue: the Pool is created inside the request handler on every call and never closed or joined, so the forked pool processes can outlive the gunicorn workers and keep port 4080 bound.

    Example:

    from starlette.applications import Starlette
    from starlette.responses import UJSONResponse
    from models import SomeModelClass
    from multiprocessing import Pool
    from utils import myfun
    
    
    app = Starlette(debug=False)
    model = SomeModelClass()
    
    
    @app.route('/do_things', methods=['GET', 'POST', 'HEAD'])
    async def add_styles(request):
        if request.method == 'GET':
            params = request.query_params
        elif request.method == 'POST':
            params = await request.json()
        elif request.method == 'HEAD':
            return UJSONResponse([])
    
        # Doing things
        result = model(params)
        # Start of the offending code: a new Pool is forked on every request and never closed
        pool = Pool(4)
        result = pool.map(myfun, result, chunksize=1)
        # End of the offending code
        return UJSONResponse(result)
    

    The fix is to replace the multiprocessing Pool with a thread pool from concurrent.futures:

    from starlette.applications import Starlette
    from starlette.responses import UJSONResponse
    from models import SomeModelClass
    import concurrent.futures
    from utils import myfun
    
    
    app = Starlette(debug=False)
    model = SomeModelClass()
    
    
    @app.route('/do_things', methods=['GET', 'POST', 'HEAD'])
    async def add_styles(request):
        if request.method == 'GET':
            params = request.query_params
        elif request.method == 'POST':
            params = await request.json()
        elif request.method == 'HEAD':
            return UJSONResponse([])
    
        # Doing things
        result = model(params)
        # Start of the fix: use a thread pool instead of forking processes per request
        with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
            result = executor.map(myfun, result)
        result = list(result)
        # End of the fix
        return UJSONResponse(result)
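
    Because the executor uses threads inside the same worker process, nothing is forked per request, so there should be no stray python3.8 processes left holding port 4080 when gunicorn stops, and supervisord can restart the service normally again. Two small notes: executor.map submits all the work up front and the with block waits for it to finish, so converting the result to a list after the block is safe; and since threads are subject to the GIL, this change only speeds things up if myfun is I/O-bound or releases the GIL, but it removes the leaked processes either way.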