I have a web app and a background worker service running in Cloud Run.
The main app calls the background worker, which is essentially just an rq worker wrapped in a thin Flask app to adhere to the runtime contract. The rq worker is spawned via subprocess.Popen.
I do not block the main thread with the Popen call, and return a response immediately. However, the instance still seems to die after 15 minutes of processing.
Per the documentation, it appears this workflow should be supported so long as there is some sort of CPU processing going on (it isn't exactly clear):
If you want to support background activities in your Cloud Run service, set your Cloud Run service CPU to be always allocated so you can run background activities outside of requests and still have CPU access.
Another article says:
Note that even if CPU is always allocated, Cloud Run autoscaling is still in effect, and may terminate container instances if they aren't needed to handle incoming traffic. An instance will never stay idle for more than 15 minutes after processing a request unless it is kept active using minimum instances.
This 15-minute limit seems to be what I'm encountering despite the CPU certainly not being "idle" in any sense of the word.
The particular background jobs I am spawning could take 1-2 hours in some extreme cases, so blocking the main thread, not returning a response until completion, and increasing the request timeout would not work, as the request timeout maxes out at 1 hour (not to mention that such long-lived requests are prone to error).
Is there a way to make this work without moving toward GKE or hacky Cloud Build workarounds?
EDIT - Some additional details
Worker service configuration:
Here is the server that spawns the rq worker:
import os
import subprocess
from flask import Flask, Response
from http import HTTPStatus

app = Flask(__name__)

@app.route("/")
def index():
    subprocess.Popen(
        ["rq", "worker", "--burst", "--url", os.getenv("REDIS_URL"), "queue"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return Response(status=HTTPStatus.OK)
The Dockerfile for it just runs the following command after setting things up:
gunicorn -w 1 --timeout 0 -b 0.0.0.0:8080 app:app
The logs do not yield anything particularly useful because, to avoid blocking the main thread, I never use communicate() or check the output of the Popen call. As a result I am left with just the gunicorn logs, which isn't ideal.
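One possible way to get the worker's output into the container logs without calling communicate() is to let the child inherit the parent's stdout and stderr instead of using pipes. The sketch below is only an illustration under that assumption; spawn_worker is a hypothetical helper name, not part of the original service. Note that a pipe that is never read can also stall the child once its buffer fills up.

```python
import subprocess
import sys


def spawn_worker(cmd):
    """Start cmd without waiting for it to finish (hypothetical helper)."""
    # Flush our own buffered output first so parent and child lines
    # do not end up out of order in the shared log stream.
    sys.stdout.flush()
    sys.stderr.flush()
    # With stdout/stderr left at their default (None), the child writes to
    # the same file descriptors as the parent, so nothing needs to be read
    # back and the call returns immediately.
    return subprocess.Popen(cmd)
```

In the Flask view this would be called as spawn_worker(["rq", "worker", "--burst", "--url", os.getenv("REDIS_URL"), "queue"]) before returning the response.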
2023-11 Update:
Google now supports this type of workflow with Cloud Run Jobs without the need to run an HTTP server. You may run a container until it exits for up to 24 hours with custom parameters / arguments.
Jobs docs: https://cloud.google.com/run/docs/execute/jobs
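As a rough illustration of the Jobs approach, the container entrypoint could simply run the same burst worker from the question and finish with its exit status, with no Flask or gunicorn layer. This is a sketch rather than a tested deployment: build_worker_command and main are hypothetical names, and it assumes the rq CLI is installed in the image and REDIS_URL is set on the job.

```python
import os
import subprocess


def build_worker_command(redis_url, queue="queue"):
    """Build the same rq CLI invocation used in the question."""
    return ["rq", "worker", "--burst", "--url", redis_url, queue]


def main():
    """Run the burst worker to completion and return its exit code."""
    # Assumption: REDIS_URL is provided as an environment variable on the job.
    redis_url = os.environ["REDIS_URL"]
    # --burst makes the worker exit once the queue is empty, so the
    # container stops on its own instead of idling.
    completed = subprocess.run(build_worker_command(redis_url))
    return completed.returncode
```

The script's entrypoint would then call sys.exit(main()) so that a failing worker produces a non-zero container exit code.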
Previous answer:
Using gunicorn --timeout 0 and blocking the thread to avoid sending an HTTP response works for up to 60 minutes of processing time (i.e. the highest request timeout allowed by Cloud Run). You must configure this on Cloud Run, as the default is 5 minutes.
This is not an ideal solution, though:
The longer the timeout is, the more likely the connection can be lost due to failures on the client side or the Cloud Run side. When a client re-connects, a new request is initiated and the client isn't guaranteed to connect to the same container instance of the service.
https://cloud.google.com/run/docs/configuring/request-timeout
Otherwise, GKE or Compute Engine work for this type of workload.
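For reference, the blocking variant from the previous answer could look roughly like the sketch below: the request handler waits for the worker process and only then picks a status code. The helper name run_worker_blocking and the mapping to 200/500/504 are assumptions made for illustration; the 3600-second default mirrors the 60-minute request timeout mentioned above.

```python
import subprocess
import sys


def run_worker_blocking(cmd, timeout_seconds=3600):
    """Run cmd to completion and return an HTTP-style status code."""
    try:
        # subprocess.run blocks until the child exits; on timeout it kills
        # the child and raises TimeoutExpired.
        completed = subprocess.run(cmd, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        print("worker timed out", file=sys.stderr)
        return 504
    if completed.returncode != 0:
        print(f"worker exited with {completed.returncode}", file=sys.stderr)
        return 500
    return 200
```

Inside the Flask view, the returned value would be passed as the status of the Response, so the HTTP response is only sent once the worker has finished.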