Search code examples
gomemory-leaksout-of-memoryrssgo-gin

Memory leak in Gin Gonic webserver?


Description

I've encountered a potential memory leak in the gin-gonic/gin web server. I have tried replicating the issue by creating a simple /health_check endpoint. The endpoint /health_check is hit every second. The issue results in an out-of-memory (OOM) condition when when the available pod memory is exhausted. There is no other container running in the pod.

I have also exposed pprof and prometheus metrics to understand the issue, but could not find anything. Don't see any other listed issues reporting the same, so I was hoping someone could help me isolate the issue.

I don't see any increase in heap or stack memory, but the process memory keeps increasing. I can see the increase using RSS and corresponding memory block using pmap, but I am not able to trace why the memory doesn't clear or what that allocated memory is used for.

How to reproduce

Simple code example with relevant endpoints:

package server

import (
    "fmt"
    "net/http"
    _ "net/http/pprof"

    "github.com/gin-gonic/gin"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func setupRouter() *gin.Engine {
    router := gin.New()
    router.GET("/health_check", func(c *gin.Context) {
        c.String(http.StatusOK, "Hello World!")
    })
    router.GET("/debug/pprof", gin.WrapF(http.DefaultServeMux.ServeHTTP))
    router.GET("/debug/pprof/:pprofType", gin.WrapF(http.DefaultServeMux.ServeHTTP))
    router.GET("/metrics", func(c *gin.Context) {
        handler := promhttp.Handler()
        handler.ServeHTTP(c.Writer, c.Request)
    })

    return router
}

func Start() {
    router := setupRouter()
    err := router.Run(":8080")
    if err != nil {
        fmt.Printf("Error starting server: %v\n", err)
    }
}
package main

import (
    "example.com/health-check/server"
)

func main() {
    server.Start()
}

Build Command:

go build -ldflags="-s -w" -race -o health-check main.go

Pod Resource Limits:

  1. 64MiB Memory
  2. 1 CPU

Expectations

I expected a consistent memory usage. Some fluctuations are fine, but I expect the memory usage to mostly be consistent and not run Out Of Memory.

Actual result

The Pod crashed predictably due to continuous memory increase in roughly 90 mins by OOM. Each spike in the below image indicates a pod restart because of OOM.

Gin Server Pod Restarts

Go Metrics Resident Memory

Pod Restart Counter

Heap and Stack Usage

Goroutines

goroutine 8334 [running]:
runtime/pprof.writeGoroutineStacks({0x7f0e9e220b20, 0xc000506200})
    /usr/local/go/src/runtime/pprof/pprof.go:703 +0x75
runtime/pprof.writeGoroutine({0x7f0e9e220b20, 0xc000506200}, 0x2)
    /usr/local/go/src/runtime/pprof/pprof.go:692 +0x45
runtime/pprof.(*Profile).WriteTo(0x1509f00, {0x7f0e9e220b20, 0xc000506200}, 0xc?)
    /usr/local/go/src/runtime/pprof/pprof.go:329 +0x1b1
net/http/pprof.handler.ServeHTTP({0xc000051601, 0x9}, {0x7f0e5462fd18, 0xc000506200}, 0x0?)
    /usr/local/go/src/net/http/pprof/pprof.go:267 +0x58a
net/http/pprof.Index({0x7f0e5462fd18?, 0xc000506200}, 0xc000506600)
    /usr/local/go/src/net/http/pprof/pprof.go:384 +0x129
net/http.HandlerFunc.ServeHTTP(0xf53ac8, {0x7f0e5462fd18, 0xc000506200}, 0xc0000b0500?)
    /usr/local/go/src/net/http/server.go:2136 +0x48
net/http.(*ServeMux).ServeHTTP(0xc0004a3740?, {0x7f0e5462fd18, 0xc000506200}, 0xc000506600)
    /usr/local/go/src/net/http/server.go:2514 +0xbd
example.com/health-check/server.setupRouter.WrapF.func4(0xc000506200)
    /go/pkg/mod/github.com/gin-gonic/[email protected]/utils.go:42 +0x97
github.com/gin-gonic/gin.(*Context).Next(...)
    /go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:174
github.com/gin-gonic/gin.(*Engine).handleHTTPRequest(0xc0001871e0, 0xc000506200)
    /go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:620 +0xb91
github.com/gin-gonic/gin.(*Engine).ServeHTTP(0xc0001871e0, {0x105b2e0?, 0xc00027e540}, 0xc000506600)
    /go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:576 +0x425
net/http.serverHandler.ServeHTTP({0xc000600db0?}, {0x105b2e0, 0xc00027e540}, 0xc000506600)
    /usr/local/go/src/net/http/server.go:2938 +0x2a2
net/http.(*conn).serve(0xc000240900, {0x105c478, 0xc0001ad5c0})
    /usr/local/go/src/net/http/server.go:2009 +0xc25
created by net/http.(*Server).Serve in goroutine 1
    /usr/local/go/src/net/http/server.go:3086 +0x80d

goroutine 1 [IO wait]:
internal/poll.runtime_pollWait(0x7f0e9ea9d410, 0x72)
    /usr/local/go/src/runtime/netpoll.go:343 +0x85
internal/poll.(*pollDesc).wait(0xc0003a62a0, 0x4ae001?, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0xb1
internal/poll.(*pollDesc).waitRead(...)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc0003a6280)
    /usr/local/go/src/internal/poll/fd_unix.go:611 +0x405
net.(*netFD).accept(0xc0003a6280)
    /usr/local/go/src/net/fd_unix.go:172 +0x3e
net.(*TCPListener).accept(0xc00024d480)
    /usr/local/go/src/net/tcpsock_posix.go:152 +0x3e
net.(*TCPListener).Accept(0xc00024d480)
    /usr/local/go/src/net/tcpsock.go:315 +0x65
net/http.(*Server).Serve(0xc00054c000, {0x105b520, 0xc00024d480})
    /usr/local/go/src/net/http/server.go:3056 +0x57f
net/http.(*Server).ListenAndServe(0xc00054c000)
    /usr/local/go/src/net/http/server.go:2985 +0xbd
net/http.ListenAndServe(...)
    /usr/local/go/src/net/http/server.go:3239
github.com/gin-gonic/gin.(*Engine).Run(0xc0001871e0, {0xc0003bbef8, 0x1, 0x1})
    /go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:386 +0x38d
example.com/health-check/server.Start()
    /app/server/server.go:49 +0x52
main.main()
    /app/main.go:8 +0x1d

goroutine 82 [IO wait]:
internal/poll.runtime_pollWait(0x7f0e9ea9d318, 0x72)
    /usr/local/go/src/runtime/netpoll.go:343 +0x85
internal/poll.(*pollDesc).wait(0xc0002c60a0, 0xc000568000?, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0xb1
internal/poll.(*pollDesc).waitRead(...)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0002c6080, {0xc000568000, 0x1000, 0x1000})
    /usr/local/go/src/internal/poll/fd_unix.go:164 +0x3e5
net.(*netFD).Read(0xc0002c6080, {0xc000568000, 0x1000, 0x1000})
    /usr/local/go/src/net/fd_posix.go:55 +0x4b
net.(*conn).Read(0xc000514010, {0xc000568000, 0x1000, 0x1000})
    /usr/local/go/src/net/net.go:179 +0xad
net/http.(*connReader).Read(0xc0002c4450, {0xc000568000, 0x1000, 0x1000})
    /usr/local/go/src/net/http/server.go:791 +0x2b5
bufio.(*Reader).fill(0xc000536d20)
    /usr/local/go/src/bufio/bufio.go:113 +0x29a
bufio.(*Reader).Peek(0xc000536d20, 0x4)
    /usr/local/go/src/bufio/bufio.go:151 +0xc7
net/http.(*conn).serve(0xc0002e21b0, {0x105c478, 0xc0001ad5c0})
    /usr/local/go/src/net/http/server.go:2044 +0xe7c
created by net/http.(*Server).Serve in goroutine 1
    /usr/local/go/src/net/http/server.go:3086 +0x80d

goroutine 8335 [IO wait]:
internal/poll.runtime_pollWait(0x7f0e9ea9d128, 0x72)
    /usr/local/go/src/runtime/netpoll.go:343 +0x85
internal/poll.(*pollDesc).wait(0xc0002c62a0, 0xc000600dc1?, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0xb1
internal/poll.(*pollDesc).waitRead(...)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0002c6280, {0xc000600dc1, 0x1, 0x1})
    /usr/local/go/src/internal/poll/fd_unix.go:164 +0x3e5
net.(*netFD).Read(0xc0002c6280, {0xc000600dc1, 0x1, 0x1})
    /usr/local/go/src/net/fd_posix.go:55 +0x4b
net.(*conn).Read(0xc0000b04d0, {0xc000600dc1, 0x1, 0x1})
    /usr/local/go/src/net/net.go:179 +0xad
net/http.(*connReader).backgroundRead(0xc000600db0)
    /usr/local/go/src/net/http/server.go:683 +0x83
created by net/http.(*connReader).startBackgroundRead in goroutine 8334
    /usr/local/go/src/net/http/server.go:679 +0x246

Environment

  • go version: go1.21.1 linux/amd64
  • docker image: 1.21.1-bullseye
  • gin version (or commit ref): github.com/gin-gonic/gin v1.9.1
  • operating system: Debian GNU/Linux 11 (bullseye)
  • GOGC=10;GOMEMLIMIT=64MiB (tried with default values as well, but same result)

Any help or guidance is appreciated, thanks a lot.


Solution

  • So I was running some experiments to figure out from where the memory was increasing.

    I tried testing without the -race build flag, and the memory seems to be fine now. I don't see it increasing continuously on an idle server (serving only liveness, readiness probes and metrics endpoint).

    I am not entirely sure why this is happening or if this is expected, but I will look more into it, and have removed the flag for now from our production deployments.

    Sharing the memory trends after removing this flag (build command being go build), the spikes are load tests that I was running, firing 1M requests (100 in parallel at a time):

    Following is the pod memory usage: Pod Memory Usage

    Following is the RSS: RSS

    Following is the Heap and Stack memory usage: Heap and Stack

    PS: I am marking this as the answer for now, but please feel free to correct me, I am still pretty new at golang, thanks a lot to everyone for contributing. If I find anything contrary to this result down the line I will update/remove this answer, thanks.