api design-patterns architecture rate-limiting quota

Implementing a large-scale API usage quota system

From a high-level perspective, how can I implement an API usage quota system?

In particular, it must fulfill the following requirements:

real-time
fast, so as not to slow down the API significantly
if in-memory caches are used, it needs to recover after a sudden shutdown (a small loss of quota precision in the API client's favor is OK)
rate limiting (DoS protection)
scaling well

Are there any generally accepted architectural patterns / algorithms for implementing such systems?

Solution

Do you have a database available to your API? If so, simply store a counter in there for each registered account that you want to measure or throttle.

Once a client has logged on, use a technique like AOP (aspect-oriented programming) to ensure that every API call runs through your throttling algorithm, which should be simple. Pseudo-code for a 24-hour throttling system:

read access_count from DB
access_count++       
if access_count > limit then
   respond with something like 429 - Too Many Requests
else
   store access_count in DB
end
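
For illustration, here is a minimal Python sketch of the same idea. It is an assumption-laden example rather than a definitive implementation: an in-memory SQLite database stands in for whatever DB your API uses, a decorator plays the role of the AOP interception, and the names `quota`, `consume_quota`, `throttled`, `QuotaExceeded`, `get_report` and the limit of 3 are all hypothetical. One deliberate difference from the pseudo-code: the check and the increment are folded into a single UPDATE statement, so two concurrent requests cannot both read the same count and both pass.

```python
import functools
import sqlite3

DAILY_LIMIT = 3  # hypothetical limit, kept tiny for the demo

# Assumption: in-memory SQLite stands in for the real database.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE quota ("
    "account_id TEXT PRIMARY KEY, "
    "access_count INTEGER NOT NULL)"
)


class QuotaExceeded(Exception):
    """Raised when an account is over its limit; map this to HTTP 429."""


def consume_quota(account_id, limit=DAILY_LIMIT):
    """Count one call for account_id; return False if the limit is reached."""
    with db:  # commits on success, rolls back on error
        # Make sure the account has a row to increment.
        db.execute(
            "INSERT OR IGNORE INTO quota (account_id, access_count) "
            "VALUES (?, 0)",
            (account_id,),
        )
        # Increment only while under the limit, in one statement, so the
        # check and the write cannot be interleaved with another request.
        cur = db.execute(
            "UPDATE quota SET access_count = access_count + 1 "
            "WHERE account_id = ? AND access_count < ?",
            (account_id, limit),
        )
        return cur.rowcount == 1


def throttled(func):
    """Decorator that runs the quota check before every wrapped API call."""
    @functools.wraps(func)
    def wrapper(account_id, *args, **kwargs):
        if not consume_quota(account_id):
            raise QuotaExceeded(account_id)
        return func(account_id, *args, **kwargs)
    return wrapper


@throttled
def get_report(account_id):
    # Hypothetical API endpoint used only to demonstrate the decorator.
    return f"report for {account_id}"
```

With a limit of 3, the fourth call to `get_report("alice")` raises `QuotaExceeded`, while other accounts are unaffected. In a real web framework the exception would be translated into a 429 response.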

The above assumes a batch job that walks the DB nightly and resets every access counter to 0 for the next day's traffic.
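
As a rough sketch of such a reset job, assuming the counters live in a table named `quota` with an `access_count` column (both names are hypothetical, as is the `reset_counters` helper), the nightly step can be a single statement. How it gets scheduled (cron, a task queue, the DB's own scheduler) is left open here.

```python
import sqlite3


def reset_counters(db):
    """Set every access counter back to 0 for the next day's traffic."""
    with db:  # commits on success, rolls back on error
        db.execute("UPDATE quota SET access_count = 0")


# Demo setup. Assumption: in-memory SQLite stands in for the real database.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE quota ("
    "account_id TEXT PRIMARY KEY, "
    "access_count INTEGER NOT NULL)"
)
db.executemany(
    "INSERT INTO quota (account_id, access_count) VALUES (?, ?)",
    [("alice", 120), ("bob", 7)],
)
db.commit()

reset_counters(db)
counts = dict(db.execute("SELECT account_id, access_count FROM quota"))
```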

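The scalability of this approach will depend on your choice of DB, because every API call costs one read and, while the account is under its limit, one write. Any DB should be able to handle this access pattern, especially one of the newer NoSQL/NewSQL ones.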