
Implementing a large-scale API usage quota system


From a high-level perspective, how can I implement an API usage quota system?

In particular, it must fulfill the following requirements:

  • real-time
  • fast enough not to slow down the API significantly
  • if it uses in-memory caches, it must recover after a sudden shutdown (a small loss of quota precision in the API client's favor is acceptable)
  • rate limiting (DoS protection)
  • scaling well

Are there any generally accepted architectural patterns / algorithms for implementing such systems?


Solution

  • Do you have a database available to your API? If so, simply store a counter in it for each registered account that you want to measure or throttle.

    When someone logs on, use a technique such as aspect-oriented programming (AOP) to ensure that every API call runs through your throttling algorithm, which can be simple. Pseudo-code for a 24-hour throttling system:

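    access_count = increment access_count in DB and return the new value   (one atomic operation, so concurrent calls cannot overwrite each other's updates)
    if access_count > limit then
       respond with something like 429 Too Many Requests
    else
       process the API call
    end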
    

    The above assumes that you have a batch job that walks the DB nightly and clears all the access counters back to 0 for the next day's traffic.

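    How well this scales depends on your choice of database, because every API call now costs one write to the counter. The logic itself is simple enough for any database; at high request rates, stores with fast atomic increments, especially many of the newer NoSQL/NewSQL systems, are a good fit.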
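A minimal Python sketch of the counter approach, assuming an SQLite database stands in for "your DB" and that every API handler receives the caller's account ID as its first argument. A decorator plays the role of the AOP aspect, the increment and the read happen inside one transaction so concurrent calls cannot lose updates, and `reset_all_counters` is the nightly batch job. The table `quota` and the names `record_call`, `throttled`, `QuotaExceeded`, `reset_all_counters`, and `DAILY_LIMIT` are all hypothetical.

```python
import functools
import sqlite3

DAILY_LIMIT = 1000  # hypothetical quota per account per 24-hour window


class QuotaExceeded(Exception):
    """Raised when an account has used up its quota (map this to HTTP 429)."""


def create_schema(conn):
    # Hypothetical table: one counter row per account.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS quota ("
        " account TEXT PRIMARY KEY,"
        " access_count INTEGER NOT NULL DEFAULT 0)"
    )


def record_call(conn, account):
    """Increment the account's counter and return the new value, atomically."""
    with conn:  # one transaction: the write lock is held until commit
        conn.execute("INSERT OR IGNORE INTO quota (account) VALUES (?)", (account,))
        conn.execute(
            "UPDATE quota SET access_count = access_count + 1 WHERE account = ?",
            (account,),
        )
        row = conn.execute(
            "SELECT access_count FROM quota WHERE account = ?", (account,)
        ).fetchone()
    return row[0]


def throttled(conn, limit=DAILY_LIMIT):
    """Decorator standing in for the AOP aspect: count every call before running it."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(account, *args, **kwargs):
            if record_call(conn, account) > limit:
                raise QuotaExceeded(account)
            return handler(account, *args, **kwargs)
        return wrapper
    return decorator


def reset_all_counters(conn):
    """Nightly batch job: clear every counter for the next day's traffic."""
    with conn:
        conn.execute("UPDATE quota SET access_count = 0")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)

    @throttled(conn, limit=2)
    def get_report(account):
        return f"report for {account}"

    print(get_report("alice"))  # allowed (count 1)
    print(get_report("alice"))  # allowed (count 2)
    try:
        get_report("alice")     # count 3 exceeds the limit of 2
    except QuotaExceeded:
        print("429 Too Many Requests")
    reset_all_counters(conn)
    print(get_report("alice"))  # allowed again after the reset
```

Turning `QuotaExceeded` into an actual 429 response is left to your web framework. Note that a Python `sqlite3` connection is not meant to be shared across threads by default, so a multi-threaded server would open one connection per thread or point the same logic at a server database.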