google-perftools gperftools pprof

How does gperftools work under the hood?


I am looking for a simple explanation of how gperftools works. So far, this is what I have learned:

  • It runs a stop-the-world sampler. In other words, it periodically stops the program being profiled to collect information.
  • Golang's pprof library uses gperftools underneath.

Besides a general overview, here are some specific questions I would like answered:

  • Is gperftools an "event-based profiler" or an "instrumentation profiler"? From what I understand, these profilers modify the way a program runs and collect samples via those modifications.
  • At what "level" in the OS does gperftools profile? Does it profile the kernel like SystemTap or perf?
  • Is gperftools safe to run on a high-traffic production server?

I am asking this question to reason about the overhead introduced by using pprof on a Go server.


Solution

  • It is a sampling profiler.

    Basically, there are two types of profiling: either you keep track of everything the program does (keeping count of every call, wrapping every function in a timer, in other words, permeating the code with your instruments) or else you let it run itself but just briefly check up on it every now and then (taking samples).

    The problem with instrumentation is that it changes the way the program performs. It slows the program down, and it does so unevenly, which distorts the results. (For example, the production code may be spending too much time waiting for IO, but the instrumented code might not exhibit this, because the added bookkeeping inflates the CPU-bound parts and makes the IO wait look proportionally smaller.) It also collects far more data than is statistically necessary, if ultimately all you care about is identifying where most of the time is spent.

    By running strace, you can see that Google-perftools works using SIGPROF signals (as do HPCToolkit and Open|SpeedShop). Presumably it just sets up a signal handler and then lingers in memory, consuming no CPU cycles, until the hardware/OS interrupts your program (which can be as infrequent as you like). At that point it presumably just saves a copy of the call stack (and schedules the next interrupt) before letting control return to your program. The call stack lists which function your program was executing, which parent function invoked that one, and so on up the chain; this is the same information that makes "return" statements work.
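To make the mechanism concrete, here is a minimal sketch of a SIGPROF-driven sampler in Python. It is not how gperftools is implemented (gperftools is native code and unwinds native stacks); it only illustrates the general pattern described above: arm a CPU-time timer, and on each SIGPROF record the interrupted call stack. The names `samples`, `start_sampling`, `stop_sampling`, and `busy_work` are hypothetical, chosen for this illustration, and the sketch assumes a Unix-like system where `signal.ITIMER_PROF` is available.

```python
import collections
import signal
import time

# Hypothetical names for illustration only; none of this is gperftools API.
# Maps a call stack (tuple of function names, outermost first) to a hit count.
samples = collections.Counter()


def _on_sigprof(signum, frame):
    """Record the call stack that was running when the timer fired."""
    stack = []
    while frame is not None:
        stack.append(frame.f_code.co_name)
        frame = frame.f_back  # walk up to the calling frame
    samples[tuple(reversed(stack))] += 1


def start_sampling(interval_s=0.005):
    """Install the handler and arm a repeating CPU-time timer."""
    signal.signal(signal.SIGPROF, _on_sigprof)
    # ITIMER_PROF counts CPU time consumed by the process and delivers
    # SIGPROF when it expires; the third argument re-arms it each time.
    signal.setitimer(signal.ITIMER_PROF, interval_s, interval_s)


def stop_sampling():
    """Disarm the timer; the handler stays installed so a late signal is harmless."""
    signal.setitimer(signal.ITIMER_PROF, 0, 0)


def busy_work(duration_s=0.3):
    """Burn roughly duration_s seconds of CPU time so there is something to sample."""
    end = time.process_time() + duration_s
    total = 0
    while time.process_time() < end:
        total += 1
    return total


if __name__ == "__main__":
    start_sampling()
    busy_work()
    stop_sampling()
    # Show the most frequently observed stacks.
    for stack, count in samples.most_common(3):
        print(count, " -> ".join(stack))
```

Between signals the sampler does no work at all, which is why the overhead scales with the sampling rate rather than with how many functions the program calls. Because the timer here counts CPU time, a program that is mostly blocked on IO would collect few samples.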