I am looking for a simple explanation of how gperftools works. So far, this is what I have learned:
Besides a general overview, here are some specific questions I would like answered:
I am asking this question to reason about the overhead introduced by using pprof on a Go server.
It is a sampling profiler.
Basically, there are two types of profiling. With instrumentation, you keep track of everything the program does (keeping count of every call, wrapping every function in a timer; in other words, permeating the code with your instruments). With sampling, you let the program run on its own and just briefly check up on it every now and then.
The problem with instrumentation is that it changes the way the program performs. It slows down the program, in a way which also distorts the results. (For example, the production code may be spending too much time waiting for IO, but the instrumented code might not exhibit this.) It also collects far more data than is statistically necessary (if ultimately all you care about is identifying where most time is spent).
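To make the instrumentation approach concrete, here is a minimal Go sketch. The names `instrument`, `callStats` and `stats` are hypothetical, invented for this illustration, and are not part of gperftools or pprof. Every single call pays for two clock reads and a map update, which is where the overhead and the distortion come from.

```go
package main

import (
	"fmt"
	"time"
)

// callStats is a hypothetical record kept per instrumented function.
type callStats struct {
	calls int
	total time.Duration
}

// stats maps a function label to its record. Not safe for concurrent use;
// a real tool would need locking, which adds even more overhead.
var stats = map[string]*callStats{}

// instrument wraps fn so that every call is counted and timed.
func instrument(name string, fn func()) func() {
	return func() {
		start := time.Now()
		fn()
		s, ok := stats[name]
		if !ok {
			s = &callStats{}
			stats[name] = s
		}
		s.calls++
		s.total += time.Since(start)
	}
}

func main() {
	work := instrument("work", func() {
		sum := 0
		for i := 0; i < 1000; i++ {
			sum += i
		}
		_ = sum
	})
	for i := 0; i < 10000; i++ {
		work()
	}
	fmt.Printf("work: %d calls, %v total\n", stats["work"].calls, stats["work"].total)
}
```

The shorter the wrapped function, the larger the share of the measured time that is really the cost of the measurement itself.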
By running strace, you can see that Google-perftools works using SIGPROF signals (as do HPCToolkit and Open|SpeedShop). Presumably it just installs a signal handler and then sits in memory, consuming no CPU cycles, until the OS timer interrupts your program (which can be as infrequent as you like). At that point it presumably just saves a copy of the call stack and schedules the next interrupt before letting control return to your program. The call stack lists which function your program was executing, which parent function had invoked it, and so on up the chain; this is the same information that makes "return" statements work.
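Since the question is about a Go server: Go's own CPU profiler in `runtime/pprof` follows the same sampling pattern. On Unix-like systems it relies on SIGPROF, and `StartCPUProfile` samples at 100 Hz by default. The sketch below is only an illustration; the helper names `busy` and `profileBusy` are made up for this example, and the profile is written to an in-memory buffer rather than a file.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime/pprof"
	"time"
)

// busy burns CPU for roughly d so the sampler has something to observe.
func busy(d time.Duration) int {
	deadline := time.Now().Add(d)
	n := 0
	for time.Now().Before(deadline) {
		n++
	}
	return n
}

// profileBusy runs busy under the CPU profiler and returns the size of the
// resulting profile in bytes.
func profileBusy(d time.Duration) (int, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return 0, err
	}
	busy(d)
	// StopCPUProfile returns only after the profile has been fully written.
	pprof.StopCPUProfile()
	return buf.Len(), nil
}

func main() {
	size, err := profileBusy(300 * time.Millisecond)
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not start CPU profile:", err)
		os.Exit(1)
	}
	fmt.Printf("collected %d bytes of profile data\n", size)
}
```

At 100 Hz, 300 ms of CPU time on one thread yields about 30 samples, so the work done by the profiler is bounded by the sampling rate and not by how many function calls the program makes.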