I'm developing a webserver in Go to put into practice what I'm learning about the language and its concurrency model.
I have a simple raw-socket-based webserver that answers a base path with a sample HTML response. Internally, the webserver listens on port 8000 and, in a loop, accepts incoming connections, forwarding them to a buffered channel with a maximum capacity of 1000 pending connections. Meanwhile, a pool of 20 workers handles the requests in the buffered channel and writes back the responses.
Now, initially my webserver's main Serve method went like this:
func (h HttpServer) Serve(host, path string) {
	connection, err := net.Listen("tcp", "0.0.0.0:8000")
	if err != nil { /* ... */ }

	for {
		clientConnection, err := connection.Accept()
		if err != nil { /* ... */ }

		select {
		case h.connectionsQueue <- clientConnection:
		default:
			errors.TooManyRequestsHandler{}.Handle(clientConnection)
		}
	}
}
With the webserver being this struct:
var maxWorkers = 20
var maxPendingConnections = 1000

type HttpServer struct {
	connectionsQueue chan net.Conn
}

func BuildHttpServer() HttpServer {
	routeDispatcher := routing.MakeRegisterRoute()
	routeDispatcher.RegisterRoute("/", ui.RouteUIIndex{})

	server := HttpServer{
		connectionsQueue: make(chan net.Conn, maxPendingConnections),
	}

	for i := 0; i < maxWorkers; i++ {
		go handler.SpawnHandler(server.connectionsQueue, routeDispatcher)
	}

	return server
}
In practice, this already achieved the desired behaviour of accepting all incoming connections until maxPendingConnections is reached and the channel is full. If the workers are overwhelmed, 429 Too Many Requests starts being returned to the client through the TooManyRequestsHandler, which writes that response back.
But what if I want to set an absolute upper bound on the rate at which requests are handled by this webserver? The objective here is to guarantee predictable performance and avoid degradation. I've thus changed my Serve function to:
func (h HttpServer) Serve(host, path string) {
	acceptRequestRateTicker := time.NewTicker(200 * time.Microsecond)

	connection, err := net.Listen("tcp", "0.0.0.0:8000")
	if err != nil { /* ... */ }

	for {
		select {
		case <-acceptRequestRateTicker.C:
			clientConnection, err := connection.Accept()
			if err != nil { /* ... */ }

			select {
			case h.connectionsQueue <- clientConnection:
			default:
				errors.TooManyRequestsHandler{}.Handle(clientConnection)
			}
		}
	}
}
The point here being that the main goroutine does not accept requests at a rate higher than 5000 req/s, by accepting one connection every 200 microseconds, after which clients will experience request timeouts in obtaining a connection to the server. Is this a good strategy for guaranteeing predictable service performance and expectations?
So, after a while I achieved what I wanted, and the simple solution is to implement a token-based rate limiter.
The basic idea is simple: you have a bucket of depth N containing tokens. Each time a request needs to be processed, you retrieve one of the available tokens, if any, reducing the number of available tokens by 1.
If none are available, you have two choices: either respond immediately with 429 Too Many Requests, or queue the incoming request for processing only once tokens become available.
The two choices reflect different reasons for implementing a rate limiter: A) you have it in place to control the performance bounds within which your application runs at a steady state; B) you have it in place because of a contract on the requests per second that clients can hit your API with.
Not queueing requests and answering immediately with 429 Too Many Requests is suitable for enforcing B). For A), clients will probably prefer their request to be served with a delay rather than receive no response at all, so queueing rate-limited requests is the right choice, up to a certain point given by the memory constraints of your application.
In any case, the trick of the token bucket algorithm lies in controlling the rate at which tokens become available again. To achieve a rate limit of 300 req/s, we want a goroutine to replenish a single token on a non-full bucket every 3.33 ms (1000 ms / 300 req/s). That is, regardless of how incoming requests consume the bucket, replenishment occurs at fixed intervals: 300 times a second, or every 3.33 ms. The purpose of the bucket size is to allow bursts of requests to be accepted while still enforcing the overall rate.
I have achieved this with the following logic:
http_server.go:
const (
	MAX_WORKERS int = 1
)

type HttpServer struct {
	rateLimiter *limiter.Limiter
}

func BuildHttpServer() HttpServer {
	server := HttpServer{
		rateLimiter: limiter.MakeRateLimiter(),
	}

	for i := 0; i < MAX_WORKERS; i++ {
		go handler.SpawnHandler(server.rateLimiter.AcceptedConnectionsQueue)
	}

	return server
}

func (h HttpServer) Serve(host, path string) {
	connection, err := net.Listen("tcp", "0.0.0.0:8000")
	if err != nil { /* ... */ }

	for {
		clientConnection, err := connection.Accept()
		if err != nil { /* ... */ }

		if proceed, err := h.rateLimiter.ProceedOrBufferConnection(clientConnection); err != nil {
			// err != nil means the connection was rate limited
			// but could not be buffered
			consumer.Consumer{}.ConsumeAndRespond(clientConnection, responses.TooManyRequestsResponse{})
			continue
		} else if !proceed {
			// proceed == false means the connection was rate
			// limited and buffered for later processing
			continue
		}

		select {
		case h.rateLimiter.AcceptedConnectionsQueue <- clientConnection:
		default:
			// reaching this case means our workers cannot keep up
			// with the rate at which we accept connections; you
			// should detect this scenario and increase the number
			// of workers or the accepted-connections buffer size
			consumer.Consumer{}.ConsumeAndRespond(clientConnection, responses.TooManyRequestsResponse{})
		}
	}
}
rate_limiter.go:
const (
	TOKENS_DEPTH_SIZE                int = 1
	ACCEPTED_CONNECTIONS_BUFFER_SIZE int = 20
	PENDING_CONNECTIONS_BUFFER_SIZE  int = 2000
)

type Limiter struct {
	tokensBucketDepth        int
	pendingConnectionsQueue  chan net.Conn
	AcceptedConnectionsQueue chan net.Conn
	tokensMutex              sync.Mutex
}

func MakeRateLimiter() *Limiter {
	limiter := Limiter{
		tokensBucketDepth:        TOKENS_DEPTH_SIZE,
		pendingConnectionsQueue:  make(chan net.Conn, PENDING_CONNECTIONS_BUFFER_SIZE),
		AcceptedConnectionsQueue: make(chan net.Conn, ACCEPTED_CONNECTIONS_BUFFER_SIZE),
		tokensMutex:              sync.Mutex{},
	}

	go Refill(&limiter)

	return &limiter
}
func (l *Limiter) ProceedOrBufferConnection(conn net.Conn) (bool, error) {
	l.tokensMutex.Lock()
	if l.tokensBucketDepth > 0 {
		// we have a token, proceed
		l.tokensBucketDepth--
		l.tokensMutex.Unlock()
		return true, nil
	}
	l.tokensMutex.Unlock()

	// we did not have a token, try to queue
	// the connection in the pending buffer
	select {
	case l.pendingConnectionsQueue <- conn:
	default:
		// our pending buffer is full, there's nothing
		// we can do here; we should return Too Many Requests
		return false, errors.New("buffer is full, message should be discarded")
	}

	return false, nil
}
func Refill(l *Limiter) {
	// ranging over the ticker channel blocks between ticks; wrapping
	// it in a select with an empty default would busy-spin a CPU core
	ticker := time.NewTicker(3333 * time.Microsecond)
	defer ticker.Stop()

	for range ticker.C {
		l.tokensMutex.Lock()
		if l.tokensBucketDepth < TOKENS_DEPTH_SIZE {
			select {
			case conn := <-l.pendingConnectionsQueue:
				// spend this tick's token on a queued connection
				select {
				case l.AcceptedConnectionsQueue <- conn:
				default:
					// accepted queue is full; put the connection
					// back and keep the unused token
					select {
					case l.pendingConnectionsQueue <- conn:
						l.tokensBucketDepth++
					default:
						consumer.Consumer{}.ConsumeAndRespond(conn, responses.TooManyRequestsResponse{})
					}
				}
			default:
				l.tokensBucketDepth++
			}
		}
		l.tokensMutex.Unlock()
	}
}
Notice how the limiter starts with a single token in this scenario. This means we enforce the rate from the very first request and queue immediately in case of bursts. You might want to play around with this property.
Running this, here are the results with hey:
hey -n 2000 -c 4 -q 1000 -m GET http://localhost:8000/ -t 1
This sends 2000 requests through 4 concurrent workers, each rate-limited at 1000 req/s (hey's -q flag is per worker).
The results are:
Summary:
Total: 6.6374 secs
Slowest: 0.0376 secs
Fastest: 0.0001 secs
Average: 0.0132 secs
Requests/sec: 301.3217
Total data: 58000 bytes
Size/request: 29 bytes
Response time histogram:
0.000 [1] |
0.004 [23] |
0.008 [5] |
0.011 [9] |
0.015 [1941] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.019 [7] |
0.023 [9] |
0.026 [2] |
0.030 [1] |
0.034 [0] |
0.038 [2] |
Latency distribution:
10% in 0.0131 secs
25% in 0.0132 secs
50% in 0.0133 secs
75% in 0.0134 secs
90% in 0.0136 secs
95% in 0.0136 secs
99% in 0.0176 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0004 secs, 0.0001 secs, 0.0376 secs
DNS-lookup: 0.0002 secs, 0.0000 secs, 0.0071 secs
req write: 0.0000 secs, 0.0000 secs, 0.0004 secs
resp wait: 0.0128 secs, 0.0001 secs, 0.0375 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0007 secs
Status code distribution:
[200] 2000 responses
As you can see, we have thus achieved an upper bound on request processing of 300 req/s.
But if we now halve the refill window to 1.667 ms per token, we get:
Summary:
Total: 3.3454 secs
Slowest: 0.0196 secs
Fastest: 0.0015 secs
Average: 0.0067 secs
Requests/sec: 597.8337
Increasing our rate two-fold.
Link to the complete repo: https://github.com/miguelpais/go-http-server-and-cli
Hope this helps and please do criticise my solution.