How to benchmark (track CPU usage of) a short-lived command?

While I'm aware of commands such as top (with a number of equivalents listed here), I'm not clear on how to capture the CPU usage of a "short-lived" process. For example, if I wanted to see the performance of the ls command, what could I do that would sample the load measurement frequently enough and fast enough to be of use?

Most existing answers I've seen on this topic use a loop that repeats something like top every n seconds, which isn't applicable for quick / short-lived commands, especially given that I won't have time to see the PID in time to feed it to said techniques. I might be able to use something from this answer since it seems to be sampling at pretty low timescales, but I suspect there's a more direct / less intense approach.

Solution

If you can instrument every run of your short-lived command, you can measure wall-clock, user-CPU, and system-CPU time with time ls.
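If you'd rather collect those three numbers from a script than read them off the terminal, here's a minimal sketch of the same idea in Python. It assumes a Unix-like system (the resource module isn't available on Windows). The helper name measure and its runs parameter are made up for illustration. CPU times come from getrusage(RUSAGE_CHILDREN), which only covers children that have been waited for, and the wall-clock figure includes Python's own process-spawn overhead, so treat it as an upper bound for very short commands.

```python
import resource
import subprocess
import time


def measure(cmd, runs=1):
    """Run cmd `runs` times; return average (wall, user_cpu, sys_cpu) in seconds.

    Hypothetical helper: roughly what the shell's `time` reports, averaged.
    """
    # CPU time already accumulated by earlier waited-for children of this process.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    for _ in range(runs):
        # Discard the command's output so terminal I/O doesn't skew the timing.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return wall / runs, user / runs, system / runs
```

For example, measure(["ls"], runs=100) averages over 100 runs, which helps when a single run is too short to measure reliably.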

Or for more details, there's perf stat ls. Depending on your kernel.perf_event_paranoid sysctl setting (0, for example, is permissive enough), you can use HW perf counters to measure CPU cycles / cache misses / etc. in kernel code as well as user-space. It also counts software events like page faults. But for very short-lived commands like ls, perf's own startup overhead will be significant relative to the command itself. On Intel CPUs, ocperf.py is a wrapper for perf that adds symbolic names for more events. See Can x86's MOV really be "free"? Why can't I reproduce this at all? for an example of using ocperf.py for an asm microbenchmark.

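strace -c ls prints a per-syscall summary of call counts and time spent in system calls. Tracing itself slows the command down, so use it to see where the kernel time goes rather than for absolute timings.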

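If you can't run each short command under a measurement wrapper, a system-wide perf record -a might work: it samples all CPUs regardless of which process is running, so you don't need to catch the PID, and perf report can then break the samples down by command.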