Search code examples
linuxbenchmarkingperf

Measure the time to reach the main function using perf?


I would like to measure my program initialization time by measuring the time to reach the main function, so that I get the time of "run-time initializations" (eg. bss sections set to 0 and global constructors called).

How could I use perf to do so?


Solution

  • First, you have to consider that perf doesn't really measure time - it records events. Now you can do some profiling and look at call stacks and derive some information about initialization, but in order to measure a specific time, we need to record the beginning and end timestamp.

    In case of the time to reach the main function, we can use

    1) A dynamic tracepoint on main:

    $ sudo perf probe -x ./gctor main Added new event:   probe_gctor:main  (on main in ./gctor)
    

    You can now use it in all perf tools, such as:

    perf record -e probe_gctor:main -aR sleep
    

    This does require pretty high privileges, I'll just use root in the example.

    2) A sensible point for the "start" of your binary.

    I suggest the tracepoint syscalls:sys_exit_execve. This is basically right after perf record started to execute your binary. This works in my version (5.3.7) - if it doesn't for you, you may need to tinker around. You could of course just use -e cycles, but then you get spammed later on with events you don't want.

    Putting it together:

    sudo perf record -e probe_gctor:main -e syscalls:sys_exit_execve ./gctor
                      ^ this is what perf probe told you earlier
    

    And then look at it with perf script --header

    # time of first sample : 77582.919313
    # time of last sample : 77585.150377
    # sample duration :   2231.064 ms
    [....]
    # ========
    #
               gctor 238828 [007] 77582.919313: syscalls:sys_exit_execve: 0x0
               gctor 238828 [001] 77585.150377:         probe_gctor:main: (5600ea33414d)
    

    You can either compute it from these two samples, or use the sample duration if there are really only the two samples in your trace.

    For completeness: Here's a way to do it with gdb:

    gdb ./gctor -ex 'b main' -ex 'python import time' -ex 'python ts=time.time()' -ex 'run' -ex 'python print(time.time()-ts)'
    

    This is much less accurate, has about 100 ms overhead on my system, but it doesn't require higher privileges. You could of course improve on this by just building your own runner with fork/ptrace/exec in C.