Tags: linux, go, performance-testing, cpu-usage, analysis

Program to collect CPU usage data for performance analysis


I wrote a complex program in Go (which uses many concurrency constructs). I would like to make an accurate analysis of my program's CPU usage but I don't know where to start. In particular, I would like to obtain useful information on:

  • Maximum number of goroutines (i.e. concurrent threads) running at the same time;

  • How much does CPU usage change if I run multiple instances of the same program at the same time?

  • Stack utilization (i.e. whether I use a lot (or a little) of the stack, depending on how deeply my function calls are nested).

I work on Ubuntu 18.04.1 LTS. What should I do to get this information? Is there any program (maybe specific to Go) that allows me to obtain it?


Solution

  • Well, that's a complex topic, so there cannot be a single definitive answer.

    What you are describing is called "metrics collection" or "telemetry" in production settings.

    In most cases, metrics collection uses a sampling approach: that is, a snapshot of the system state of interest is taken periodically and sent somewhere. "Somewhere" is usually a system which persists the values of the metrics and also provides various ways to analyze them.

    In the simplest case, the analysis is done by looking at graphs drawn from the collected data in some sort of UI. More complex cases include alerting when the value of some metric rises above (or drops below) some threshold.

    A single metric is some named value of a particular type.


    The metrics can be produced from different sources of data. The sources typical of a reasonably common setup in which a Go program runs include:

    • The Go runtime itself.

      This includes things like the number of goroutines and garbage-collection stats—measurements which, for obvious reasons, are impossible to get from outside the running Go program (see the sketch after this list).

    • The measurements provided by the OS about the running process which executes your program.

      This includes things like the CPU time spent in user and system (kernel) context, memory consumption as seen by the OS, the number of open file (and socket) descriptors, the number of CPU context switches, disk I/O stats, and so on.

    • The measurements provided by the containerization software running the container in which the program executes.

      On Linux this is usually provided by the cgroup subsystem, which is chiefly responsible for enforcing the resource limits imposed on a process hierarchy.
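
    To make the first source concrete, here is a minimal sketch (not tied to any particular metrics system) which reads a couple of the runtime-level numbers relevant to your questions—the current goroutine count and the stack memory in use—directly from the standard runtime package:

        package main

        import (
            "fmt"
            "runtime"
        )

        func main() {
            // Number of goroutines that currently exist; sampled periodically,
            // the maximum over the samples approximates your peak concurrency.
            fmt.Println("goroutines:", runtime.NumGoroutine())

            // Memory statistics maintained by the Go runtime; StackInuse gives
            // a rough idea of how much stack memory the goroutines occupy.
            var m runtime.MemStats
            runtime.ReadMemStats(&m)
            fmt.Println("stack in use (bytes):", m.StackInuse)
            fmt.Println("heap alloc (bytes):", m.HeapAlloc)
            fmt.Println("completed GC cycles:", m.NumGC)
        }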


    How exactly to turn the data from these sources into metrics is an open question (and that's why it's unfit for the SO format).

    For instance, to collect Go runtime stats you may use the expvar mechanism—as suggested by @Adrian—and periodically poll the HTTP endpoint it provides for data.
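
    A minimal sketch of that approach, assuming the default HTTP mux and an arbitrary port (8080): importing expvar registers a /debug/vars handler which already exposes the runtime's memstats, and you can publish extra variables next to it:

        package main

        import (
            "expvar"
            "log"
            "net/http"
            "runtime"
        )

        func main() {
            // Importing expvar registers /debug/vars on the default mux and
            // already exposes "memstats" and "cmdline"; add a goroutine count.
            expvar.Publish("goroutines", expvar.Func(func() interface{} {
                return runtime.NumGoroutine()
            }))

            // Poll http://localhost:8080/debug/vars from your collector.
            log.Fatal(http.ListenAndServe(":8080", nil))
        }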

    Or you may run an internal goroutine in your program which periodically grabs this data from the runtime and pushes it somewhere.
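
    A sketch of that push variant, where push stands in for whatever transport you end up choosing (HTTP, UDP, a message queue, …)—here it merely logs the snapshot, and the 10-second interval is arbitrary:

        package main

        import (
            "log"
            "runtime"
            "time"
        )

        // sample is one snapshot of the runtime metrics of interest.
        type sample struct {
            When       time.Time
            Goroutines int
            StackInuse uint64
        }

        // push stands in for the real transport; here it just logs.
        func push(s sample) { log.Printf("%+v", s) }

        func main() {
            ticker := time.NewTicker(10 * time.Second)
            defer ticker.Stop()

            var m runtime.MemStats
            for range ticker.C {
                runtime.ReadMemStats(&m)
                push(sample{
                    When:       time.Now(),
                    Goroutines: runtime.NumGoroutine(),
                    StackInuse: m.StackInuse,
                })
            }
        }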

    Sampling the OS-level, process-related data, again, can be done in different ways. Say, you may collect it from within your program using something like github.com/shirou/gopsutil/process and push it along with the metrics gathered from the runtime stats, or you may use one of the myriad tools which collect this data externally.
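
    For the in-process route, a sketch along these lines (assuming the v3 import path of gopsutil; adjust to whatever major version you actually pull in) reads a few OS-level numbers for the current process:

        package main

        import (
            "fmt"
            "log"
            "os"

            "github.com/shirou/gopsutil/v3/process"
        )

        func main() {
            // Inspect our own process by PID.
            p, err := process.NewProcess(int32(os.Getpid()))
            if err != nil {
                log.Fatal(err)
            }

            // CPU time split into user/system seconds, as reported by the OS.
            if t, err := p.Times(); err == nil {
                fmt.Printf("cpu: user=%.2fs system=%.2fs\n", t.User, t.System)
            }

            // Resident memory size as seen by the OS.
            if mi, err := p.MemoryInfo(); err == nil {
                fmt.Println("rss bytes:", mi.RSS)
            }

            // Number of open file descriptors.
            if n, err := p.NumFDs(); err == nil {
                fmt.Println("open fds:", n)
            }
        }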

    (The most low-tech but accessible way of gathering OS-level performance data I know of is using tools like pidstat, iotop, atop, cpustat, etc.)


    The question of persisting and analyzing the collected data is, again, open.

    For a start, it may be as simple as dumping everything into a structured file—maybe with a timestamp on each record—and processing it with anything you like: for instance, pyplot, RRDtool, R, or…whatever.
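
    As an illustration of that low-tech end, a sketch which appends timestamped JSON records to a file (the file name and the sampled fields are just my picks):

        package main

        import (
            "encoding/json"
            "log"
            "os"
            "runtime"
            "time"
        )

        func main() {
            // metrics.jsonl is an arbitrary name: one JSON object per line,
            // easy to feed to pyplot, R, or a quick ad-hoc script later.
            f, err := os.OpenFile("metrics.jsonl",
                os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0o644)
            if err != nil {
                log.Fatal(err)
            }
            defer f.Close()

            enc := json.NewEncoder(f)
            var m runtime.MemStats
            for i := 0; i < 5; i++ { // a handful of samples for illustration
                runtime.ReadMemStats(&m)
                enc.Encode(map[string]interface{}{
                    "ts":          time.Now().Format(time.RFC3339),
                    "goroutines":  runtime.NumGoroutine(),
                    "stack_inuse": m.StackInuse,
                })
                time.Sleep(time.Second)
            }
        }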

    Or you may reach for a big gun right from the start and send your metrics to Graphite, Grafana, Zabbix, Icinga, or whatever is currently at the top of its hype curve.
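
    For example, Graphite accepts metrics over its plaintext protocol—one "name value timestamp" line per sample sent over TCP (port 2003 by default). A sketch, with the host name and metric name being placeholders for your own Carbon/Graphite setup:

        package main

        import (
            "fmt"
            "log"
            "net"
            "runtime"
            "time"
        )

        func main() {
            // graphite.example.com is a placeholder for your Carbon endpoint.
            conn, err := net.Dial("tcp", "graphite.example.com:2003")
            if err != nil {
                log.Fatal(err)
            }
            defer conn.Close()

            // Plaintext protocol: "<metric path> <value> <unix timestamp>\n".
            fmt.Fprintf(conn, "myapp.goroutines %d %d\n",
                runtime.NumGoroutine(), time.Now().Unix())
        }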