Tags: kubernetes, load-balancing, google-cloud-run

Invoking CloudRun endpoint from within itself


Assume there is a Flask web server with two routes, deployed as a Cloud Run service on GKE.

from flask import Flask, request

app = Flask(__name__)

@app.route('/cpu_intensive', methods=['POST'], endpoint='cpu_intensive')
def cpu_intensive():
    # TODO: some CPU-intensive work on a single request
    return 'done', 200

@app.route('/batch_request', methods=['POST'], endpoint='batch_request')
def batch_request():
    # TODO: split the batch and invoke cpu_intensive for each item
    return 'accepted', 200

A "batch_request" is a batch of many identically structured requests, each of which is highly CPU intensive and handled by the function "cpu_intensive". No single machine can reasonably handle a large batch, so it must be parallelized across multiple replicas. The service is deployed with a concurrency of 1, so each instance handles only one request at a time, and Cloud Run spins up new instances when multiple requests arrive. I would like a single service with these two endpoints: one that accepts a "batch_request" and only breaks it down into smaller requests, and another that actually handles a single "cpu_intensive" request. What is the best way for "batch_request" to break the batch down into smaller requests and invoke "cpu_intensive" so that Cloud Run scales the number of instances?

  • Make an HTTP request to localhost - doesn't work, since the load balancer never sees these calls and therefore cannot trigger scaling.
  • Keep the deployment URL in a config file and make a network call to it?
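The second option could be sketched roughly as follows: a minimal fan-out helper using only the standard library, assuming the service's public URL is injected through a hypothetical SERVICE_URL environment variable (all names here are illustrative, not part of any Cloud Run API):

```python
import json
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical: the deployed service's own public URL, set at deploy time.
SERVICE_URL = os.environ.get("SERVICE_URL", "https://example-service.run.app")

def split_batch(batch, chunk_size):
    """Split a list of payloads into chunks of at most chunk_size items."""
    return [batch[i:i + chunk_size] for i in range(0, len(batch), chunk_size)]

def post_json(url, payload):
    """POST a JSON payload and return the raw response body."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def fan_out(payloads, max_workers=32):
    """Send each payload to /cpu_intensive through the load balancer,
    so Cloud Run sees concurrent requests and scales out instances."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(post_json, f"{SERVICE_URL}/cpu_intensive", p)
            for p in payloads
        ]
        return [f.result() for f in futures]
```

Because each sub-request goes out through the external load balancer rather than localhost, the autoscaler counts them; with concurrency set to 1, this should produce roughly one instance per in-flight sub-request.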

Other suggestions?


Solution

  • I believe for this use case it's much better to use GKE rather than Cloud Run. You can create two Kubernetes Deployments: one for the batch_request app and one for the cpu_intensive app. The second will act as a worker for the batch_request app and will scale on demand as more requests come in. This is sometimes called a master-worker architecture, in which the front of your app is separated from the intensive work or batch jobs.
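As a rough sketch of that layout (names, images, and numbers are all illustrative), the worker Deployment could be paired with a HorizontalPodAutoscaler so it scales on CPU usage:

```yaml
# Hypothetical manifests; adjust names, image, and limits to your project.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-intensive
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cpu-intensive
  template:
    metadata:
      labels:
        app: cpu-intensive
    spec:
      containers:
        - name: cpu-intensive
          image: gcr.io/my-project/cpu-intensive:latest
          resources:
            requests:
              cpu: "1"
---
# Scale the worker Deployment when average CPU utilization passes 70%.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cpu-intensive
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cpu-intensive
  minReplicas: 1
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

The batch_request Deployment would be a second, ordinary Deployment fronted by a Service; it only splits batches and dispatches work, so it typically does not need an autoscaler of its own.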