Is it OK to use golang pprof in production without affecting performance?


I'm fairly new to the pprof tool and am wondering whether it's OK to keep it running in production. From the articles I have seen, this seems to be OK and standard practice. However, I'm confused about how it avoids affecting performance: it samples N times every second, so why doesn't that degrade performance?


Solution

  • Jaana Dogan does say in her article "Continuous Profiling of Go programs":

    Profiling in production

    pprof is safe to use in production.
    We target an additional 5% overhead for CPU and heap allocation profiling.

    The collection is happening for 10 seconds for every minute from a single instance. If you have multiple replicas of a Kubernetes pod, we make sure we do amortized collection.
    For example, if you have 10 replicas of a pod, the overhead will be 0.5%. This makes it possible for users to keep the profiling always on.

    We currently support CPU, heap, mutex and thread profiles for Go programs.

    Why?

    Before explaining how you can use the profiler in production, it would be helpful to explain why you would ever want to profile in production. Some very common cases are:

    • Debug performance problems only visible in production.
    • Understand the CPU usage to reduce billing.
    • Understand where the contention cumulates and optimize.
    • Understand the impact of new releases, e.g. seeing the difference between canary and production.
    • Enrich your distributed traces by correlating them with profiling samples to understand the root cause of latency.

    So if you are using pprof for one of these reasons, yes, you can leave it enabled in production.
    But for basic monitoring, as noted in the comments, system-level metrics are enough.

    As noted in "Continuous Profiling and Go" by Vladimir Varankin:

    Depending on the state of the infrastructure in the company, an “unexpected” HTTP server inside the application’s process can raise questions from your systems operations department ;)

    At the same time, depending on the peculiar nature of a company, the very ability to access something inside a production application, that doesn’t directly relate to application’s business logic, can raise questions from the security department ;))

    So overhead is not the only criterion to consider when leaving such a feature active.