
Azure OpenAI service rate limit 429 error


When calling Azure OpenAI through the SDK and the REST API, I am occasionally throttled and receive HTTP 429 errors. How does Azure OpenAI's rate limiting work, and how can I avoid or handle throttling?


Solution

  • The Azure OpenAI Service quotas and limits are documented here: https://learn.microsoft.com/en-us/azure/ai-services/openai/quotas-limits

    The rate limits vary slightly from model to model. Each limit has two components:

    • Token quota, in tokens per minute (TPM)
    • Request count, in requests per minute (RPM)
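
To make the two components concrete, here is a minimal client-side sketch that admits a request only while both a per-minute request budget and a per-minute token budget have room. It is an illustration under assumptions, not how the service enforces its limits: the class name `MinuteBudget`, the method `try_acquire`, the 60-second sliding window, and the example limit values are all hypothetical, and the token count is whatever estimate the caller supplies.

```python
import time
from collections import deque


class MinuteBudget:
    """Hypothetical client-side check of a requests-per-minute (RPM)
    and tokens-per-minute (TPM) budget over a sliding window."""

    def __init__(self, rpm_limit, tpm_limit, window_s=60.0, clock=time.monotonic):
        self.rpm_limit = rpm_limit
        self.tpm_limit = tpm_limit
        self.window_s = window_s
        self.clock = clock  # injectable so the class can be tested without sleeping
        self._events = deque()  # (timestamp, tokens) for each admitted request
        self._tokens_in_window = 0

    def _evict_expired(self, now):
        # Drop admitted requests that have aged out of the window.
        while self._events and now - self._events[0][0] >= self.window_s:
            _, tokens = self._events.popleft()
            self._tokens_in_window -= tokens

    def try_acquire(self, estimated_tokens):
        """Return True and record the request if both budgets have room."""
        now = self.clock()
        self._evict_expired(now)
        if len(self._events) + 1 > self.rpm_limit:
            return False  # request-count component exhausted
        if self._tokens_in_window + estimated_tokens > self.tpm_limit:
            return False  # token component exhausted
        self._events.append((now, estimated_tokens))
        self._tokens_in_window += estimated_tokens
        return True


if __name__ == "__main__":
    # Placeholder limits; use the values configured for your own deployment.
    budget = MinuteBudget(rpm_limit=60, tpm_limit=10_000)
    print(budget.try_acquire(estimated_tokens=1_500))
```

When `try_acquire` returns False, the caller can wait and retry later instead of sending a request that is likely to be rejected with a 429. Either component can be the one that runs out first: many small requests exhaust the request count, while a few large prompts exhaust the token quota.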

    The rate limits for the GPT-4 and GPT-4o models are shown below:

    [Image: rate limits for the GPT-4 and GPT-4o models]