From a high-level perspective, how can I implement an API usage quota system?
In particular, it must fulfill the following requirements:
Are there any generally accepted architectural patterns / algorithms for implementing such systems?
Do you have a database available to your API? If so, simply store a counter in there for each registered account that you want to measure or throttle.
Once a caller is authenticated, use a technique like AOP (aspect-oriented programming) to ensure that every API call runs through your throttling check, which can be very simple. Pseudo-code for a 24-hour throttling system:
    read access_count from DB
    access_count++
    if access_count > limit then
        respond with something like 429 - Too Many Requests
    else
        store access_count in DB
    end
The above assumes that you have a batch job that walks the DB nightly and clears all the access counters back to 0 for the next day's traffic.
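To make the pseudo-code and the nightly reset concrete, here is a minimal Python sketch using the standard sqlite3 module. The table name quota, the functions init_db, allow_request and reset_all_counters, and the DAILY_LIMIT value are all hypothetical names chosen for illustration; your schema, limit and DB driver will differ.

```python
import sqlite3

DAILY_LIMIT = 1000  # hypothetical limit, chosen only for illustration


def init_db(conn: sqlite3.Connection) -> None:
    # One row per registered account; access_count holds today's usage.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS quota ("
        "account_id TEXT PRIMARY KEY, "
        "access_count INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()


def allow_request(conn: sqlite3.Connection, account_id: str,
                  limit: int = DAILY_LIMIT) -> bool:
    """Return True if the call may proceed, False if it should get a 429."""
    # read access_count from DB (an unknown account starts at 0)
    row = conn.execute(
        "SELECT access_count FROM quota WHERE account_id = ?", (account_id,)
    ).fetchone()
    access_count = (row[0] if row else 0) + 1
    if access_count > limit:
        return False  # over quota: nothing is stored, caller responds 429
    # store access_count in DB
    conn.execute(
        "INSERT OR REPLACE INTO quota (account_id, access_count) VALUES (?, ?)",
        (account_id, access_count),
    )
    conn.commit()
    return True


def reset_all_counters(conn: sqlite3.Connection) -> None:
    # Body of the nightly batch job: clear every counter back to 0.
    conn.execute("UPDATE quota SET access_count = 0")
    conn.commit()
```

Your interceptor (the AOP advice, a filter, a middleware) would call allow_request before the real handler and return 429 when it gets False. A scheduler such as cron would call reset_all_counters once a day.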
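The scalability of this will depend on your DB choice, since every API call costs one read and one write on that account's counter. Most databases can handle that load at moderate traffic, and the newer NoSQL/NewSQL ones tend to be a good fit for it.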
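One detail that matters once requests arrive concurrently: reading the counter and storing it as two separate steps lets two requests see the same value, so the limit can be overshot. A possible variation, not part of the pseudo-code above, is to let the database do the comparison and the increment in a single statement. The sketch below again uses sqlite3; open_quota_db, consume_atomically and the quota table are hypothetical names.

```python
import sqlite3


def open_quota_db(path: str) -> sqlite3.Connection:
    # Hypothetical helper: open the DB and make sure the counter table exists.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS quota ("
        "account_id TEXT PRIMARY KEY, "
        "access_count INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def consume_atomically(conn: sqlite3.Connection, account_id: str,
                       limit: int) -> bool:
    """Increment the counter only while it is still below the limit."""
    # Make sure the account has a row (does nothing if it already exists).
    conn.execute(
        "INSERT OR IGNORE INTO quota (account_id, access_count) VALUES (?, 0)",
        (account_id,),
    )
    # The check and the increment are one UPDATE, so there is no window
    # between reading the count and writing it back.
    cur = conn.execute(
        "UPDATE quota SET access_count = access_count + 1 "
        "WHERE account_id = ? AND access_count < ?",
        (account_id, limit),
    )
    conn.commit()
    # One modified row means the request fit in the quota; zero means 429.
    return cur.rowcount == 1
```

Many databases offer an equivalent conditional or atomic increment, so the same idea should carry over, but check what your particular DB guarantees.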