Azure OpenAI service rate limit 429 error

When calling Azure OpenAI through the SDK and the REST API, I occasionally get throttled and receive HTTP 429 errors. How does Azure OpenAI's rate limiting work, and how can I avoid or handle these throttling cases?

Solution

Azure OpenAI Service quotas and limits are documented here: https://learn.microsoft.com/en-us/azure/ai-services/openai/quotas-limits

Rate limits vary by model. Each limit has two components:

Tokens per minute (TPM): the token quota
Requests per minute (RPM): the total number of requests
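Because both limits apply at the same time, a client can pace itself by checking both budgets before each call. Below is a minimal client-side sliding-window limiter, assuming you plug in the TPM and RPM values of your own deployment and supply your own per-request token estimate. `DualRateLimiter` and `try_acquire` are hypothetical names, not part of any Azure SDK. The service remains the authority, so staying under these budgets locally reduces 429 responses but does not rule them out.

```python
import time
from collections import deque
from typing import Optional


class DualRateLimiter:
    """Hypothetical client-side limiter that enforces both a tokens-per-minute
    (TPM) budget and a requests-per-minute (RPM) budget over a sliding
    60-second window. Limit values come from the caller; nothing is read from Azure."""

    def __init__(self, tpm_limit: int, rpm_limit: int, window: float = 60.0):
        self.tpm_limit = tpm_limit
        self.rpm_limit = rpm_limit
        self.window = window
        self.events = deque()  # (timestamp, tokens) for each admitted request
        self.used_tokens = 0   # running token total of requests still in the window

    def _evict(self, now: float) -> None:
        # Drop requests that have slid out of the window and release their tokens.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.used_tokens -= tokens

    def try_acquire(self, tokens: int, now: Optional[float] = None) -> bool:
        """Admit a request costing `tokens` (a caller-side estimate) if both the
        RPM and TPM budgets still have room; otherwise return False."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        if len(self.events) >= self.rpm_limit:
            return False  # one more request would exceed requests per minute
        if self.used_tokens + tokens > self.tpm_limit:
            return False  # this request would exceed tokens per minute
        self.events.append((now, tokens))
        self.used_tokens += tokens
        return True
```

A caller would check `try_acquire` before each request and, on `False`, wait briefly and try again rather than sending a call that is likely to be throttled. Keeping a running token total avoids re-summing the window on every call.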

The rate limits for the GPT-4 and GPT-4o models are shown below:

[Figure: rate limits for the GPT-4 and GPT-4o models]
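When a 429 does come back, a common way to handle it is to wait and retry with exponential backoff, honoring the `Retry-After` header when the response includes one. The sketch below assumes the caller wraps its own SDK or REST call in `send` and converts a 429 response into the hypothetical `ThrottledError`; both names, and `call_with_backoff`, are illustrative only. The official `openai` Python SDK also retries some failures on its own (configurable through the client's `max_retries` option), so check its defaults before layering another retry loop on top.

```python
import random
import time
from typing import Callable, Optional


class ThrottledError(Exception):
    """Hypothetical exception the caller raises when the service returns HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("HTTP 429: rate limit exceeded")
        self.retry_after = retry_after  # seconds from the Retry-After header, if present


def call_with_backoff(
    send: Callable[[], str],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `send`, retrying on ThrottledError up to `max_retries` times."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except ThrottledError as err:
            if attempt == max_retries:
                raise  # out of retries: surface the 429 to the caller
            if err.retry_after is not None:
                # Prefer the server's own hint about when to retry.
                delay = err.retry_after
            else:
                # Exponential backoff with "full jitter": a random delay in [0, cap].
                cap = min(max_delay, base_delay * (2 ** attempt))
                delay = random.uniform(0, cap)
            sleep(delay)
    raise ValueError("max_retries must be >= 0")
```

The random jitter spreads retries from many concurrent clients apart, so they do not all hit the endpoint again at the same instant. Injecting `sleep` keeps the function easy to test without real waiting.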