I'm developing a webserver in Go to put into practice what I'm learning about the language and its concurrency model.
I have a simple raw-socket-based webserver that answers a base path with a sample HTML response. Internally, the webserver listens on port 8000 and, in a loop, accepts incoming connections, forwarding them to a buffered channel with a maximum capacity of 1000 pending connections. Meanwhile, a pool of 20 workers handles the requests in the buffered channel and writes back the responses.
Now, initially my webserver's main Serve method went like this:
func (h HttpServer) Serve(host, path string) {
	connection, err := net.Listen("tcp", "0.0.0.0:8000")
	if err != nil { /* ... */ }

	for {
		clientConnection, err := connection.Accept()
		if err != nil { /* ... */ }

		select {
		case h.connectionsQueue <- clientConnection:
		default:
			errors.TooManyRequestsHandler{}.Handle(clientConnection)
		}
	}
}
With the webserver being this struct:
var maxWorkers = 20
var maxPendingConnections = 1000

type HttpServer struct {
	connectionsQueue chan net.Conn
}

func BuildHttpServer() HttpServer {
	routeDispatcher := routing.MakeRegisterRoute()
	routeDispatcher.RegisterRoute("/", ui.RouteUIIndex{})

	server := HttpServer{
		connectionsQueue: make(chan net.Conn, maxPendingConnections),
	}

	for i := 0; i < maxWorkers; i++ {
		go handler.SpawnHandler(server.connectionsQueue, routeDispatcher)
	}

	return server
}
In practice, this already achieved the desired behaviour of accepting all incoming connections until maxPendingConnections is reached and the channel is full. If the workers are overwhelmed, 429 Too Many Requests starts being returned to the client through the TooManyRequestsHandler, which writes that response back.
But what if I want to set an absolute upper bound on the rate at which requests are handled by this webserver? The objective here is to guarantee predictable performance and avoid degradation. I've thus changed my Serve function to:
func (h HttpServer) Serve(host, path string) {
	acceptRequestRateTicker := time.NewTicker(200 * time.Microsecond)

	connection, err := net.Listen("tcp", "0.0.0.0:8000")
	if err != nil { /* ... */ }

	for {
		select {
		case <-acceptRequestRateTicker.C:
			clientConnection, err := connection.Accept()
			if err != nil { /* ... */ }

			select {
			case h.connectionsQueue <- clientConnection:
			default:
				errors.TooManyRequestsHandler{}.Handle(clientConnection)
			}
		}
	}
}
The point here being that the main goroutine does not accept requests at a rate higher than 5000 req/s, by accepting one connection every 200 microseconds, after which clients will experience request timeouts in obtaining a connection to the server. Is this a good strategy for guaranteeing predictable service performance and expectations?
So, after a while I achieved what I wanted, and the simple solution is to implement a token-based rate limiter.
The basic idea is simple: you have a bucket of depth N containing tokens. Each time a request needs to be processed, you retrieve one of the available tokens, if any, reducing the number of available tokens by 1.
If none are available, you have two choices: either respond immediately with 429 Too Many Requests, or queue the incoming request for processing only once tokens become available.
The two choices reflect different reasons for implementing a rate limiter: A) you have it in place to control the performance bounds within which your application runs at a steady state; B) you have it in place because of a contract on the requests per second that clients can hit your API with.
Not queueing requests and answering immediately with 429 Too Many Requests is suitable for enforcing B). For A), clients will probably prefer their request to be served with a delay rather than receive no response at all, so queueing rate-limited requests is the right choice, up to a certain point given by the memory constraints of your application.
In any case, the trick of the token bucket algorithm lies in controlling the rate at which tokens become available again. To achieve a rate limit of 300 req/s, we want a goroutine to replenish a single token on a non-full bucket every 3.33 ms (1000 ms / 300 req/s). That is, regardless of how incoming requests consume the bucket, replenishment occurs at fixed intervals: 300 times a second, or every 3.33 ms. The purpose of the bucket size is to allow bursts of requests to be accepted while still enforcing the overall rate.
I have achieved this with the following logic:
http_server.go:
const (
	MAX_WORKERS int = 1
)

type HttpServer struct {
	rateLimiter *limiter.Limiter
}

func BuildHttpServer() HttpServer {
	server := HttpServer{
		rateLimiter: limiter.MakeRateLimiter(),
	}

	for i := 0; i < MAX_WORKERS; i++ {
		go handler.SpawnHandler(server.rateLimiter.AcceptedConnectionsQueue)
	}

	return server
}

func (h HttpServer) Serve(host, path string) {
	connection, err := net.Listen("tcp", "0.0.0.0:8000")
	if err != nil { /* ... */ }

	for {
		clientConnection, err := connection.Accept()
		if err != nil { /* ... */ }

		if proceed, err := h.rateLimiter.ProceedOrBufferConnection(clientConnection); err != nil {
			// err != nil means the connection was rate limited
			// but could not be buffered
			consumer.Consumer{}.ConsumeAndRespond(clientConnection, responses.TooManyRequestsResponse{})
			continue
		} else if !proceed {
			// proceed == false means the connection was rate
			// limited and buffered for later processing
			continue
		}

		select {
		case h.rateLimiter.AcceptedConnectionsQueue <- clientConnection:
		default:
			// reaching this case means our workers cannot keep up
			// with the rate at which we accept connections; you
			// should detect this scenario and increase the number
			// of workers or the accepted-connections buffer size
			consumer.Consumer{}.ConsumeAndRespond(clientConnection, responses.TooManyRequestsResponse{})
		}
	}
}
rate_limiter.go:
const (
	TOKENS_DEPTH_SIZE                int = 1
	ACCEPTED_CONNECTIONS_BUFFER_SIZE int = 20
	PENDING_CONNECTIONS_BUFFER_SIZE  int = 2000
)

type Limiter struct {
	tokensBucketDepth        int
	pendingConnectionsQueue  chan net.Conn
	AcceptedConnectionsQueue chan net.Conn
	tokensMutex              sync.Mutex
}

func MakeRateLimiter() *Limiter {
	limiter := Limiter{
		tokensBucketDepth:        TOKENS_DEPTH_SIZE,
		pendingConnectionsQueue:  make(chan net.Conn, PENDING_CONNECTIONS_BUFFER_SIZE),
		AcceptedConnectionsQueue: make(chan net.Conn, ACCEPTED_CONNECTIONS_BUFFER_SIZE),
		tokensMutex:              sync.Mutex{},
	}

	go Refill(&limiter)

	return &limiter
}
func (l *Limiter) ProceedOrBufferConnection(conn net.Conn) (bool, error) {
	l.tokensMutex.Lock()
	if l.tokensBucketDepth > 0 {
		// we have a token, proceed
		l.tokensBucketDepth--
		l.tokensMutex.Unlock()
		return true, nil
	}
	l.tokensMutex.Unlock()

	// we did not have a token, try to queue
	// the connection in the pending buffer
	select {
	case l.pendingConnectionsQueue <- conn:
	default:
		// our pending buffer is full, there's nothing
		// we can do here; we should return Too Many Requests
		return false, errors.New("buffer is full, message should be discarded")
	}

	return false, nil
}
func Refill(l *Limiter) {
	// ranging over the ticker channel blocks between ticks; wrapping
	// it in a select with an empty default would busy-spin a CPU core
	ticker := time.NewTicker(3333 * time.Microsecond)
	defer ticker.Stop()

	for range ticker.C {
		l.tokensMutex.Lock()
		if l.tokensBucketDepth < TOKENS_DEPTH_SIZE {
			select {
			case conn := <-l.pendingConnectionsQueue:
				// spend this tick's token on a queued connection
				select {
				case l.AcceptedConnectionsQueue <- conn:
				default:
					// accepted queue is full; put the connection
					// back and keep the unused token
					select {
					case l.pendingConnectionsQueue <- conn:
						l.tokensBucketDepth++
					default:
						consumer.Consumer{}.ConsumeAndRespond(conn, responses.TooManyRequestsResponse{})
					}
				}
			default:
				l.tokensBucketDepth++
			}
		}
		l.tokensMutex.Unlock()
	}
}
Notice how the limiter starts with a single token in this scenario. This means we enforce the rate from the very first request and queue immediately in case of bursts. You might want to play around with this property.
Running this, here are the results with hey:
hey -n 2000 -c 4 -q 1000 -m GET http://localhost:8000/ -t 1
This sends 2000 requests through 4 concurrent workers, each rate-limited at 1000 req/s (hey's -q flag is per worker).
The results are:
Summary:
Total: 6.6374 secs
Slowest: 0.0376 secs
Fastest: 0.0001 secs
Average: 0.0132 secs
Requests/sec: 301.3217
Total data: 58000 bytes
Size/request: 29 bytes
Response time histogram:
0.000 [1] |
0.004 [23] |
0.008 [5] |
0.011 [9] |
0.015 [1941] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.019 [7] |
0.023 [9] |
0.026 [2] |
0.030 [1] |
0.034 [0] |
0.038 [2] |
Latency distribution:
10% in 0.0131 secs
25% in 0.0132 secs
50% in 0.0133 secs
75% in 0.0134 secs
90% in 0.0136 secs
95% in 0.0136 secs
99% in 0.0176 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0004 secs, 0.0001 secs, 0.0376 secs
DNS-lookup: 0.0002 secs, 0.0000 secs, 0.0071 secs
req write: 0.0000 secs, 0.0000 secs, 0.0004 secs
resp wait: 0.0128 secs, 0.0001 secs, 0.0375 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0007 secs
Status code distribution:
[200] 2000 responses
As you can see, we have thus achieved an upper bound on request processing of 300 req/s.
But if we now halve the refill window to 1.667 ms per token, we get:
Summary:
Total: 3.3454 secs
Slowest: 0.0196 secs
Fastest: 0.0015 secs
Average: 0.0067 secs
Requests/sec: 597.8337
Increasing our rate two-fold.
Link to the complete repo: https://github.com/miguelpais/go-http-server-and-cli
Hope this helps and please do criticise my solution.