I developed a Flask-based API that uses YOLOv5, served with Nginx and Gunicorn. Everything works fine for a single request, but no matter whether I give it a 10-core or a 50-core CPU, only one request is answered at a time.
The weights are loaded once, outside the request handler, and only the already-loaded model is used when a request arrives:
import torch

# YOLOv5 helpers (from the yolov5 repository)
from models.experimental import attempt_load
from utils.general import check_img_size, set_logging
from utils.torch_utils import select_device, load_classifier

weights = "./x.pt"
imgsz = 640
device = ""

set_logging()
device = select_device(device)

# Load model once at startup
model = attempt_load(weights, map_location=device)  # load FP32 model
stride = int(model.stride.max())  # model stride
imgsz = check_img_size(imgsz, s=stride)  # check img_size
# Second-stage classifier (disabled)
classify = False
if classify:
    modelc = load_classifier(name='resnet101', n=2)  # initialize
    # load_state_dict() returns a result object, not the model,
    # so move/eval the model itself rather than chaining:
    modelc.load_state_dict(torch.load('resnet101.pt', map_location=device)['model'])
    modelc.to(device).eval()
@application.route("URL", methods=['POST'])
def XXX():
    ...
I'd be very grateful for any suggestions. Thanks, and sorry about my English.
The problem is solved. I had always set the number of workers (in the Gunicorn settings) equal to the number of CPU cores, and that was the problem: when I set the number of workers to 1, the issue went away.
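With a single worker there is only one in-process copy of the model, and it can still serve concurrent requests if Gunicorn runs that worker with multiple threads; access to the shared, non-thread-safe model then has to be serialized. A minimal stdlib-only sketch of that pattern (the `fake_model` function is a hypothetical stand-in for the YOLOv5 inference call):

```python
import threading
import time

# One shared, non-thread-safe "model" per process, loaded once at startup.
_inference_lock = threading.Lock()
results = []

def fake_model(image_id):
    # Placeholder for model(img); real inference occupies the device for a while.
    time.sleep(0.01)
    return f"detections for {image_id}"

def handle_request(image_id):
    # Serialize every request's access to the single shared model.
    with _inference_lock:
        results.append(fake_model(image_id))

threads = [threading.Thread(target=handle_request, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))  # prints 4: all requests served, one inference at a time
```

The lock trades parallel inference for correctness, which matches the single-model setup above; true parallelism would require one model copy per worker process.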
Service file location (CentOS):
/etc/systemd/system/X.service
Changed to:
WorkingDirectory=X
Environment="PATH=X"
ExecStart=X/venv/bin/gunicorn --workers 1 --timeout 200 --bind unix:X.sock -m 007 run
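If concurrent requests are still needed with a single worker, Gunicorn can also run that worker with several threads (the `--threads` option switches it to the threaded worker type). A sketch of the same `ExecStart` line with that change, keeping the paths elided as in the unit file above; the thread count of 4 is an assumption to tune for your hardware:

ExecStart=X/venv/bin/gunicorn --workers 1 --threads 4 --timeout 200 --bind unix:X.sock -m 007 run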