Tags: python, gunicorn, supervisord, uvicorn, starlette

Uvicorn + Gunicorn + Starlette gets stuck when serving; can't restart the service without SIGKILL


I am serving a model on a VM through gunicorn + uvicorn.

It is automatically started by supervisord, running api.sh.

api.sh contains:

source /home/asd/.virtual_envs/myproject/bin/activate

/home/asd/.virtual_envs/myproject/bin/gunicorn --max-requests-jitter 30 -w 6 -b 0.0.0.0:4080 api:app -k uvicorn.workers.UvicornWorker
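
For completeness, the supervisord program entry looks roughly like the following (the program name, paths and options here are an approximation rather than a copy of my real config):

[program:api]
command=/bin/bash /home/asd/api.sh
directory=/home/asd
user=asd
autostart=true
autorestart=true
; stopasgroup/killasgroup make supervisord signal the whole process group
; on stop, which can also help clean up stray child processes
stopasgroup=true
killasgroup=true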

Without getting too much into api.py, it contains these main parts:

from starlette.applications import Starlette
from starlette.responses import UJSONResponse
from models import SomeModelClass


app = Starlette(debug=False)
model = SomeModelClass()


@app.route('/do_things', methods=['GET', 'POST', 'HEAD'])
async def add_styles(request):
    if request.method == 'GET':
        params = request.query_params
    elif request.method == 'POST':
        params = await request.json()
    elif request.method == 'HEAD':
        return UJSONResponse([])

    # Doing things
    result = model(params)
    return UJSONResponse(result)
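
The endpoint is hit with plain GET or POST requests; a typical call looks something like this (the payload is only illustrative, the real parameters depend on SomeModelClass):

curl -X POST http://localhost:4080/do_things \
     -H 'Content-Type: application/json' \
     -d '{"text": "example input"}'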

What happens is that after the API has been up for a few days, I start getting these errors:

[INFO] Starting gunicorn 20.0.3
[ERROR] Connection in use: ('0.0.0.0', 4080)
[ERROR] Retrying in 1 second.
[ERROR] Connection in use: ('0.0.0.0', 4080)
[ERROR] Retrying in 1 second.
[ERROR] Connection in use: ('0.0.0.0', 4080)
[ERROR] Retrying in 1 second.
[ERROR] Connection in use: ('0.0.0.0', 4080)
[ERROR] Retrying in 1 second.
...

Restarting the api in supervisord does nothing; I get the same messages as above. The only workaround I have found is:

  1. Stop the api in supervisord
  2. See which PID is still listening on port 4080 (a python3.8 process): sudo netstat -tulpn | grep LISTEN
  3. Kill it by running kill -9 [PID]
  4. Repeat steps 2-3 once or twice until nothing holds port 4080 anymore (see the snippet after this list)
  5. Start the api in supervisord again
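
Condensed, the manual cleanup looks like this (the PID shown by netstat differs each time):

# find whatever is still listening on port 4080
sudo netstat -tulpn | grep 4080
# the last column shows something like "<PID>/python3.8"; kill that PID
sudo kill -9 <PID>
# repeat until grep prints nothing, then start the api in supervisord again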

Do you have any ideas on how to solve this?


Solution

  • The code actually used Pool from multiprocessing, and that is most likely what caused the issue: the Pool is created inside the request handler on every call and never closed or joined, so the forked pool processes can outlive the gunicorn workers and keep port 4080 bound.

    Example:

    from starlette.applications import Starlette
    from starlette.responses import UJSONResponse
    from models import SomeModelClass
    from multiprocessing import Pool
    from utils import myfun
    
    
    app = Starlette(debug=False)
    model = SomeModelClass()
    
    
    @app.route('/do_things', methods=['GET', 'POST', 'HEAD'])
    async def add_styles(request):
        if request.method == 'GET':
            params = request.query_params
        elif request.method == 'POST':
            params = await request.json()
        elif request.method == 'HEAD':
            return UJSONResponse([])
    
        # Doing things
        result = model(params)
        # Start of the offending code: a new Pool is forked on every request and never closed
        pool = Pool(4)
        result = pool.map(myfun, result, chunksize=1)
        # End of the offending code
        return UJSONResponse(result)
    

    The fix is to replace the multiprocessing Pool with a thread pool from concurrent.futures:

    from starlette.applications import Starlette
    from starlette.responses import UJSONResponse
    from models import SomeModelClass
    import concurrent.futures
    from utils import myfun
    
    
    app = Starlette(debug=False)
    model = SomeModelClass()
    
    
    @app.route('/do_things', methods=['GET', 'POST', 'HEAD'])
    async def add_styles(request):
        if request.method == 'GET':
            params = request.query_params
        elif request.method == 'POST':
            params = await request.json()
        elif request.method == 'HEAD':
            return UJSONResponse([])
    
        # Doing things
        result = model(params)
        # Start of the fix: use a thread pool instead of forking processes per request
        with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
            result = executor.map(myfun, result)
        result = list(result)
        # End of the fix
        return UJSONResponse(result)
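
    Because the executor uses threads inside the same worker process, nothing is forked per request, so there should be no stray python3.8 processes left holding port 4080 when gunicorn stops, and supervisord can restart the service normally again. Two small notes: executor.map submits all the work up front and the with block waits for it to finish, so converting the result to a list after the block is safe; and since threads are subject to the GIL, this change only speeds things up if myfun is I/O-bound or releases the GIL, but it removes the leaked processes either way.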