python event-handling openshift event-listener gradio

Gradio app triggers event listener second time after 60 minutes

I have a Gradio app where you can upload a CSV file, make predictions for all entities in the CSV file and download the results after the process has finished.

If I run predictions for small and medium-sized CSV files that take less than 60 minutes to process, everything works fine. If processing takes longer, though, the function that is normally triggered by the upload button (make_predictions()) seems to be triggered automatically a second time after almost exactly 60 minutes. As a result, the file is processed a second time while the first run has not even finished. Soon after that, the whole process is killed (probably due to memory issues, which wouldn't be surprising).

The logs look like this:

22. Jan. 2024, 12:00:53.212 Started make_predictions() for /tmp/gradio/a97ef283367a497e8647575689f49df7ed48a8cb/400000_template.csv
22. Jan. 2024, 12:00:55.269 Reading done
22. Jan. 2024, 12:01:15.997 Cleaning done
22. Jan. 2024, 12:02:16.098 Prediction started
22. Jan. 2024, 12:02:16.098 Predicting batch number 1/49 ...
22. Jan. 2024, 12:04:09.573 Predicting batch number 2/49 ...
22. Jan. 2024, 12:05:52.201 Predicting batch number 3/49 ...
22. Jan. 2024, 12:07:16.456 Predicting batch number 4/49 ...
22. Jan. 2024, 12:08:39.085 Predicting batch number 5/49 ...
(...)
22. Jan. 2024, 12:55:16.467 Predicting batch number 36/49 ...
22. Jan. 2024, 12:56:48.801 Predicting batch number 37/49 ...
22. Jan. 2024, 12:58:23.105 Predicting batch number 38/49 ...
22. Jan. 2024, 12:59:53.941 Predicting batch number 39/49 ...
22. Jan. 2024, 13:00:54.516 Started make_predictions() for /tmp/gradio/a97ef283367a497e8647575689f49df7ed48a8cb/400000_template.csv
22. Jan. 2024, 13:01:01.219 Reading done
22. Jan. 2024, 13:01:32.600 Cleaning done
22. Jan. 2024, 13:01:39.905 Predicting batch number 40/49 ...
22. Jan. 2024, 13:03:13.334 Prediction started
22. Jan. 2024, 13:03:13.334 Predicting batch number 1/49 ...
22. Jan. 2024, 13:03:23.797 INFO: Started server process [1]
22. Jan. 2024, 13:03:23.797 INFO: Waiting for application startup.
22. Jan. 2024, 13:03:23.797 INFO: Application startup complete.
22. Jan. 2024, 13:03:23.797 INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
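
The gap between the two "Started make_predictions()" lines can be checked with a few lines of Python. This is only a small sketch: the two timestamps are copied from the log above, and the format string assumes an English locale for the month abbreviation.

```python
from datetime import datetime

# Format of the log timestamps above; %b assumes English month abbreviations.
LOG_FORMAT = "%d. %b. %Y, %H:%M:%S.%f"

# The two lines where make_predictions() was started.
first_start = datetime.strptime("22. Jan. 2024, 12:00:53.212", LOG_FORMAT)
second_start = datetime.strptime("22. Jan. 2024, 13:00:54.516", LOG_FORMAT)

# Time between the first start and the unexpected second start.
gap = second_start - first_start
print(gap.total_seconds())
```

The second start comes roughly one second past the full hour, which is why this looks like a fixed 60-minute timeout or retry rather than a coincidence.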

The Gradio app runs in a Docker container that is deployed on OpenShift via ArgoCD. The following packages are installed:

Package                   Version
------------------------- ----------
aiofiles                  23.2.1
aiohttp                   3.8.5
aiosignal                 1.3.1
altair                    5.1.1
annotated-types           0.5.0
anyio                     3.7.1
async-timeout             4.0.3
attrs                     23.1.0
certifi                   2023.7.22
charset-normalizer        3.2.0
click                     8.1.7
contourpy                 1.1.0
cycler                    0.11.0
exceptiongroup            1.1.3
fastapi                   0.101.0
ffmpy                     0.3.1
filelock                  3.12.3
fonttools                 4.42.1
frozenlist                1.4.0
fsspec                    2023.6.0
gradio                    3.40.1
gradio_client             0.5.0
h11                       0.14.0
httpcore                  0.17.3
httpx                     0.24.1
huggingface-hub           0.16.4
idna                      3.4
importlib-resources       6.0.1
Jinja2                    3.1.2
jsonschema                4.19.0
jsonschema-specifications 2023.7.1
kiwisolver                1.4.5
linkify-it-py             2.0.2
markdown-it-py            2.2.0
MarkupSafe                2.1.3
matplotlib                3.7.2
mdit-py-plugins           0.3.3
mdurl                     0.1.2
multidict                 6.0.4
numpy                     1.25.2
nvidia-cublas-cu11        11.10.3.66
nvidia-cuda-nvrtc-cu11    11.7.99
nvidia-cuda-runtime-cu11  11.7.99
nvidia-cudnn-cu11         8.5.0.96
orjson                    3.9.5
packaging                 23.1
pandas                    1.5.3
Pillow                    10.0.0
pip                       22.2.2
pydantic                  2.3.0
pydantic_core             2.6.3
pydub                     0.25.1
pyparsing                 3.0.9
python-dateutil           2.8.2
python-multipart          0.0.6
pytz                      2023.3
PyYAML                    6.0.1
referencing               0.30.2
regex                     2023.8.8
requests                  2.31.0
rpds-py                   0.10.0
semantic-version          2.10.0
setuptools                53.0.0
six                       1.16.0
sniffio                   1.3.0
starlette                 0.27.0
tokenizers                0.13.3
toolz                     0.12.0
torch                     1.13.1
tqdm                      4.66.1
transformers              4.25.1
typing_extensions         4.7.1
uc-micro-py               1.0.2
urllib3                   2.0.4
uvicorn                   0.23.2
websockets                11.0.3
wheel                     0.42.0
yarl                      1.9.2
zipp                      3.17.0

The cause seems to be some configuration, setting, or behaviour of Gradio: once, during the prediction process, I lost the connection to the Gradio interface due to internet problems, but the process in my container kept running. In that case, the event wasn't triggered again after 60 minutes and the whole process finished as it should.

Unfortunately, I haven't been able to narrow down the exact cause so far. What could be the reason for this behaviour?

Update: Apparently, Gradio is not the (only?) cause of my problem. Running the same code on localhost instead of OpenShift works fine, with no unusual behaviour after 60 minutes.

Solution

I still don’t fully understand why this behaviour occurs, but calling Gradio’s .queue() method (see https://www.gradio.app/docs/blocks#blocks-queue for gradio.Blocks() and https://www.gradio.app/docs/interface#interface-queue for gradio.Interface()) solved the problem.
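
For illustration, here is a minimal sketch of where the .queue() call goes, assuming a gradio.Blocks() app on Gradio 3.x (3.40.1 in the package list above). The make_predictions() body is a hypothetical placeholder, not the real prediction code, and the component layout and the build_app() helper are assumptions made for this example. Only the demo.queue() line reflects the actual fix.

```python
import csv
import os
import tempfile


def make_predictions(csv_file):
    """Hypothetical stand-in for the real, long-running prediction function."""
    # Gradio 3.x passes a temp-file wrapper with a .name attribute;
    # a plain path string is accepted as well.
    path = getattr(csv_file, "name", csv_file)
    with open(path, newline="") as src:
        rows = list(csv.reader(src))

    out_path = os.path.join(tempfile.mkdtemp(), "predictions.csv")
    with open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for i, row in enumerate(rows):
            # The header row gets a new column name, data rows a placeholder value.
            writer.writerow(row + (["prediction"] if i == 0 else ["n/a"]))
    return out_path


def build_app():
    # Imported lazily so make_predictions() can be tried without Gradio installed.
    import gradio as gr

    with gr.Blocks() as demo:
        csv_in = gr.File(label="CSV file")
        run_btn = gr.Button("Make predictions")
        csv_out = gr.File(label="Results")
        run_btn.click(make_predictions, inputs=csv_in, outputs=csv_out)

    # The actual fix: enable the queue before the app is launched.
    demo.queue()
    return demo
```

The app would then be started with build_app().launch(server_name="0.0.0.0", server_port=8000), matching the address in the Uvicorn log line above. For gradio.Interface() the call is the same: create the interface, call .queue() on it, then launch it.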