When calling Azure OpenAI through the SDK and the REST API, I occasionally get throttled and receive HTTP 429 errors. How does Azure OpenAI's rate limiting mechanism work, and how can I avoid or handle these throttling scenarios?
Azure OpenAI Service quotas and limits are documented here: https://learn.microsoft.com/en-us/azure/ai-services/openai/quotas-limits
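On the handling side of the question, a common pattern for 429 responses is to wait and retry. The following is only a minimal sketch of that idea, not an official client: `send` is a hypothetical callable standing in for whatever SDK or REST call you make, and the `(status, headers, body)` tuple it returns is an assumed shape for illustration. It waits for the number of seconds given in a `Retry-After` header when one is present, and otherwise falls back to exponential backoff with jitter.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# Assumed response shape for this sketch: (status_code, headers, body).
Response = Tuple[int, Mapping[str, str], str]


def retry_delay(headers: Mapping[str, str], attempt: int,
                base_delay: float = 1.0, max_delay: float = 60.0) -> float:
    """Return how many seconds to wait before the next attempt."""
    # Header names are case-insensitive in HTTP, so normalize the keys first.
    lowered = {key.lower(): value for key, value in headers.items()}
    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        try:
            # Honor the server's hint when it is a plain number of seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Not numeric (for example an HTTP date); use backoff instead.
    # Exponential backoff with full jitter: random wait in [0, base * 2^attempt],
    # capped at max_delay.
    return random.uniform(0.0, min(max_delay, base_delay * 2 ** attempt))


def call_with_backoff(send: Callable[[], Response], max_retries: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() and retry while it reports HTTP 429, up to max_retries times."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt < max_retries:
            sleep(retry_delay(headers, attempt))
    raise RuntimeError(f"still throttled after {max_retries} retries")
```

If you use the official `openai` Python package, its client also accepts a `max_retries` setting, so a hand-written loop like this is mainly relevant for raw REST calls.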
The rate limits vary slightly between models. The limits consist of two components: tokens per minute (TPM) and requests per minute (RPM).
Below are the rate limits for the GPT-4 and GPT-4o models: