While I'm aware of commands such as `top` (with a number of equivalents listed here), I'm not clear on how to capture the CPU usage of a short-lived process. For example, if I wanted to see the performance of the `ls` command, what could I do that would sample the load measurement frequently enough and fast enough to be of use?
Most existing answers I've seen on this topic use a loop that repeats something like `top` every n seconds, which isn't applicable to quick, short-lived commands, especially since I won't see the PID in time to feed it to such techniques. I might be able to use something from this answer, since it seems to sample at pretty low timescales, but I suspect there's a more direct, less intensive approach.
If you can instrument every run of your short-lived command, you can measure wall-clock, user-CPU, and system-CPU time with `time ls`.
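For instance, a minimal sketch using the shell's `time` builtin (the target command and paths here are just placeholders):

```shell
# The shell builtin prints three numbers on stderr after the command exits:
#   real = wall-clock time, user = CPU time in user mode,
#   sys  = CPU time in the kernel on this process's behalf.
time ls /tmp > /dev/null

# The standalone GNU time binary (commonly /usr/bin/time) reports extra
# details such as max resident set size; it may be absent on some systems.
if [ -x /usr/bin/time ]; then
    /usr/bin/time ls /tmp > /dev/null
fi
```

Because the timing comes from the wait4()/getrusage() accounting the kernel already keeps, there is no sampling interval to worry about, even for a command that exits in a few milliseconds.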
Or for more details, there's `perf stat ls`. Depending on your `sysctl kernel.perf_event_paranoid = 0` setting, you can use HW perf counters to measure CPU cycles, cache misses, etc. in kernel code as well as user space. It also counts software events like page faults. But for very short-lived commands like `ls`, `perf` itself will have significant startup overhead. On Intel CPUs, `ocperf.py` is a wrapper for `perf` with more events. See Can x86's MOV really be "free"? Why can't I reproduce this at all? for an example of using `ocperf.py` for an asm microbenchmark.
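A sketch of such a run, restricted to software events so it works even where hardware counters are locked down by `perf_event_paranoid` (the event names are standard `perf` software events, but availability varies by kernel and distro):

```shell
# perf stat runs the command once and prints a counter summary on stderr.
# task-clock and page-faults are software events, so they usually work
# even when access to hardware PMU counters is restricted.
if command -v perf >/dev/null 2>&1; then
    perf stat -e task-clock,page-faults ls /tmp > /dev/null \
        || echo "perf stat failed (check kernel.perf_event_paranoid)"
else
    echo "perf not installed (often packaged as linux-tools or linux-perf)"
fi
```

Dropping the `-e` list gives the default set, which includes cycles and instructions when hardware counters are permitted.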
`strace -c ls` will count time spent in system calls.
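Concretely, something like the following (a sketch; the summary table lands on stderr):

```shell
# -c suppresses the per-call trace and instead prints an aggregate table:
# per-syscall call counts, error counts, and time spent in each syscall.
if command -v strace >/dev/null 2>&1; then
    # ptrace can be blocked inside some containers, hence the fallback.
    strace -c ls /tmp > /dev/null || echo "strace failed (ptrace blocked?)"
else
    echo "strace not installed"
fi
```

Note that `strace` itself slows the traced command considerably, so treat the per-call times as relative weights rather than true durations.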
If you can't run each short command under a measurement wrapper, a system-wide `perf record -a` might work.