google-cloud-platform, gemini

Gemini batch prediction 429 RESOURCE_EXHAUSTED


I'm trying to run Gemini batch prediction on GCP exactly as described in the documentation: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/batch-prediction-gemini

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://us-central1-aiplatform.googleapis.com/v1/projects/<my-project-id>/locations/us-central1/batchPredictionJobs"

My request.json is identical to the one in the documentation and points to a basic JSONL file that I also copied from the docs.

When I run the above command, I get the following error:

{
  "error": {
    "code": 429,
    "message": "The following quota metrics exceed quota limits: aiplatform.googleapis.com/gemini_pro_concurrent_batch_prediction_jobs",
    "status": "RESOURCE_EXHAUSTED"
  }
}
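
To know exactly which metric name to look for on the quota page, it can be pulled out of the error body programmatically. This is a minimal sketch, assuming the response has the shape shown above; `quota_metrics` is a hypothetical helper name, and the regular expression is only a guess at how metric names are formatted (`<service>.googleapis.com/<metric>`).

```python
import json
import re


def quota_metrics(error_body: str) -> list[str]:
    """Return the quota metric names mentioned in a RESOURCE_EXHAUSTED error body."""
    error = json.loads(error_body).get("error", {})
    if error.get("status") != "RESOURCE_EXHAUSTED":
        return []
    # Assumption: metric names look like "<service>.googleapis.com/<metric_name>".
    return re.findall(r"[\w.-]+\.googleapis\.com/\w+", error.get("message", ""))


body = """
{
  "error": {
    "code": 429,
    "message": "The following quota metrics exceed quota limits: aiplatform.googleapis.com/gemini_pro_concurrent_batch_prediction_jobs",
    "status": "RESOURCE_EXHAUSTED"
  }
}
"""
print(quota_metrics(body))
```

For the response above, this yields the single metric `aiplatform.googleapis.com/gemini_pro_concurrent_batch_prediction_jobs`.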

I get this error in every one of my GCP projects. I've also tried different regions in the request URL, with the same result.

I would like to request a quota increase, but I cannot find this metric (or anything else related to Gemini batch prediction) on the Quotas & System Limits page: https://console.cloud.google.com/iam-admin/quotas

What is the best way to move forward?


Solution

  • This was fixed by Google today and we no longer face the error.

    It was not an actual resource exhaustion issue; it appears to have been a bug on their side.
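
Since the 429 here went away without any change on the caller's side, a submission script may want to retry later rather than give up on the first response. The following is a rough sketch, not anything from Google's client libraries: `submit_with_backoff` and its parameters are illustrative names, and `submit` is a hypothetical callable that sends the request and returns a `(status_code, body)` pair.

```python
import time


def submit_with_backoff(submit, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call submit() until it returns a non-429 status or attempts run out.

    `submit` is a hypothetical callable returning (status_code, body).
    Returns the last (status_code, body) pair seen.
    """
    for attempt in range(max_attempts):
        status, body = submit()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return status, body
```

Retrying only helps once the server-side condition clears, so the attempt count and delays would need to be chosen to match how long one is willing to wait.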