I'm trying to run Gemini batch prediction on GCP exactly as described here: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/batch-prediction-gemini
me@YSJs-MacBook-Pro ~ % curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/<my-project-id>/locations/us-central1/batchPredictionJobs"
My request.json is identical to what is suggested in the documentation, and points to a basic jsonl file that I've copied over from the docs as well.
When I run the above command, I get the following error:
{
  "error": {
    "code": 429,
    "message": "The following quota metrics exceed quota limits: aiplatform.googleapis.com/gemini_pro_concurrent_batch_prediction_jobs",
    "status": "RESOURCE_EXHAUSTED"
  }
}
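For anyone scripting around this response, here is a minimal sketch of pulling the quota metric name out of the error body, so it can be searched for on the quotas page or quoted in a support ticket. The helper name `exceeded_quota_metrics` is hypothetical, and the parsing assumes the message always lists metrics after the first colon, comma-separated; I have only seen the single-metric form shown above, so treat the multi-metric handling as a guess.

```python
import json

# Error body copied from the response above.
RESPONSE_BODY = """
{
  "error": {
    "code": 429,
    "message": "The following quota metrics exceed quota limits: aiplatform.googleapis.com/gemini_pro_concurrent_batch_prediction_jobs",
    "status": "RESOURCE_EXHAUSTED"
  }
}
"""


def exceeded_quota_metrics(body: str) -> list:
    """Return the quota metric names named in a 429 RESOURCE_EXHAUSTED body.

    Hypothetical helper: returns an empty list for any other kind of error.
    """
    error = json.loads(body).get("error", {})
    if error.get("code") != 429 or error.get("status") != "RESOURCE_EXHAUSTED":
        return []
    # Assumption: metric names follow the first colon, separated by commas.
    _, _, tail = error.get("message", "").partition(":")
    return [name.strip() for name in tail.split(",") if name.strip()]


for metric in exceeded_quota_metrics(RESPONSE_BODY):
    # The part after the service prefix is the name to search for.
    service, _, short_name = metric.partition("/")
    print(service, short_name)
```

This only identifies which metric the API is complaining about. It cannot tell a real quota limit apart from a server-side problem that reports itself as one.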
I have several GCP projects, and I get this error in all of them. I've also tried different regions in the request URL, with the same result.
I would like to request a quota increase, but I cannot find this particular metric (or indeed anything related to Gemini batch prediction) on the Quotas & System Limits page: https://console.cloud.google.com/iam-admin/quotas
What is the best way to move forward?
This was fixed by Google today, and we no longer see the error.
It was not an actual resource exhaustion issue; it appears to have been a bug on their side.