I used Perf
to extract call graphs in an evince
benchmark. The command used is as follows:
sudo perf record -d --call-graph dwarf -c 10000 -e mem_load_uops_retired.l3_miss:uppp /opt/evince-3.28.4/bin/evince
I extracted the backtraces using Perf script
and found out that there are many corrupted backtrace instances. Some contained repeated redundant function calls, such as this:
EvJobScheduler 10021 8653.926478: 100 mem_load_uops_retired.l3_miss:uppp: 7fffd1062a00 5080022 N/A|SNP N/A|TLB N/A|LCK N/A
7ffff4b07207 tcache_get+0x197 (inlined)
7ffff4b07207 __GI___libc_malloc+0x197 (inlined)
7fffd9872fb9 gmalloc+0x59 (inlined)
7fffd9872fb9 gmallocn+0x59 (/usr/lib/x86_64-linux-gnu/libpoppler.so.73.0.0)
7fffd9872fb9 gmallocn+0x59 (/usr/lib/x86_64-linux-gnu/libpoppler.so.73.0.0)
7fffd9951e6f _ZN8TextLine8coalesceEP10UnicodeMap+0xff (/usr/lib/x86_64-linux-gnu/libpoppler.so.73.0.0)
7fffd9952f82 _ZN9TextBlock8coalesceEP10UnicodeMapd+0x752 (/usr/lib/x86_64-linux-gnu/libpoppler.so.73.0.0)
7fffd995bc37 _ZN8TextPage8coalesceEbdb+0x1507 (/usr/lib/x86_64-linux-gnu/libpoppler.so.73.0.0)
7fffd995cb71 _ZN13TextOutputDev7endPageEv+0x31 (/usr/lib/x86_64-linux-gnu/libpoppler.so.73.0.0)
7fffe803c6d2 _ZL26poppler_page_get_text_pageP12_PopplerPage+0x92 (/usr/lib/x86_64-linux-gnu/libpoppler-glib.so.8.9.0)
7fffe803deb3 poppler_page_get_selection_region+0x63 (/usr/lib/x86_64-linux-gnu/libpoppler-glib.so.8.9.0)
7fffe82ab650 [unknown] (/opt/evince-3.28.4/lib/evince/4/backends/libpdfdocument.so)
7ffff795f165 ev_job_page_data_run+0x2f5 (/opt/evince-3.28.4/lib/libevview3.so.3.0.0)
7ffff7961309 ev_job_thread+0xe9 (inlined)
7ffff7961309 ev_job_thread_proxy+0xe9 (/opt/evince-3.28.4/lib/libevview3.so.3.0.0)
7ffff5492194 g_thread_proxy+0x54 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
7ffff4e686da start_thread+0xda (/lib/x86_64-linux-gnu/libpthread-2.27.so)
7ffff4b9188e __GI___clone+0x3e (inlined)
There are two consecutive gmallocs()
which is not correct. Some were function calls were callers and callees did not match, such as this:
evince 10015 8640.962182: 100 mem_load_uops_retired.l3_miss:uppp: 7fffdc005030 5080022 N/A|SNP N/A|TLB N/A|LCK N/A
7ffff5a3275f g_action_get_enabled+0x3f (/usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.5600.4)
7ffff5a2ffcc g_simple_action_group_query_action+0x3c (/usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.5600.4)
7ffff7130d8c gtk_action_muxer_query_action+0xac (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
7ffff7130d4c gtk_action_muxer_query_action+0x6c (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
7ffff7130d4c gtk_action_muxer_query_action+0x6c (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
7ffff7130d4c gtk_action_muxer_query_action+0x6c (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
7ffff7130d4c gtk_action_muxer_query_action+0x6c (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
7ffff7130994 gtk_action_muxer_action_added+0x64 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
7ffff574510c g_closure_invoke+0x19c (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff575805d signal_emit_unlocked_R+0xf4d (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff5760714 g_signal_emit_valist+0xa74 (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff576112e g_signal_emit+0x8e (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff71309e2 gtk_action_muxer_action_added+0xb2 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
7ffff574510c g_closure_invoke+0x19c (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff575805d signal_emit_unlocked_R+0xf4d (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff5760714 g_signal_emit_valist+0xa74 (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff576112e g_signal_emit+0x8e (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff71309e2 gtk_action_muxer_action_added+0xb2 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
7ffff574510c g_closure_invoke+0x19c (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff575805d signal_emit_unlocked_R+0xf4d (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff5760714 g_signal_emit_valist+0xa74 (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff576112e g_signal_emit+0x8e (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff71309e2 gtk_action_muxer_action_added+0xb2 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
7ffff574510c g_closure_invoke+0x19c (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff575805d signal_emit_unlocked_R+0xf4d (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff5760714 g_signal_emit_valist+0xa74 (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff576112e g_signal_emit+0x8e (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff71309e2 gtk_action_muxer_action_added+0xb2 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
7ffff7131482 gtk_action_muxer_set_parent+0x1d2 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
7ffff73c46f8 gtk_widget_set_parent+0x198 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
7ffff73d0228 gtk_window_set_titlebar+0xb8 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
55555558d391 ev_window_init+0x2e1 (/opt/evince-3.28.4/bin/evince)
7ffff57699c4 g_type_create_instance+0x1e4 (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff574a747 g_object_new_internal+0x2e7 (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff574c5bf g_object_new_valist+0x3cf (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff574c938 g_object_new+0x98 (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
55555558f192 ev_window_new+0x42 (/opt/evince-3.28.4/bin/evince)
5555555778f4 ev_application_open_recent_view+0x14 (/opt/evince-3.28.4/bin/evince)
555555573733 load_files+0x473 (inlined)
555555573733 main+0x473 (/opt/evince-3.28.4/bin/evince)
7ffff4a91b96 __libc_start_main+0xe6 (/lib/x86_64-linux-gnu/libc-2.27.so)
555555573899 _start+0x29 (/opt/evince-3.28.4/bin/evince)
This is wrong because:
1) gtk_action_muxer_action_added()
calls g_action_group_query_action()
at offset 0x64
(i.e., 100
in decimal):
0x00000000000f898f <+95>: push %rax
0x00000000000f8990 <+96>: callq 0x84e20 <g_action_group_query_action@plt>
0x00000000000f8995 <+101>: test %eax,%eax
2) gtk_action_muxer_query_action()
calls g_action_group_query_action()
at offset 0x6c
(i.e., 108
in decimal):
0x00000000000f8d45 <+101>: mov %rbp,%rsi
0x00000000000f8d48 <+104>: callq 0x84e20 <g_action_group_query_action@plt>
0x00000000000f8d4d <+109>: pop %rdx
3) gtk_action_muxer_query_action()
calls g_action_group_query_action()
at offset 0xac
(i.e., 172
in decimal):
0x00000000000f8d85 <+165>: mov %r12,%rdx
0x00000000000f8d88 <+168>: callq 0x84e20 <g_action_group_query_action@plt>
0x00000000000f8d8d <+173>: pop %rsi
As it can be seen, none of these match the reported backtrace. Does the problem occur when Perf
tries to match function names to addresses, or the addresses are wrong, by themselves? How can I revise the problem?
Inlining and optimized tail calls are 2 separate ways for calls in the C source to not appear in the runtime stack backtrace.
An inlined function will result in the calls to its children happening directly from the parent it inlined into, with this function not appearing at all ever.
Multiple levels of inlining are of course possible, and common with small functions.
A function that ends with an optimized jmp
tailcall makes it look like the callee was called directly from its parent. Instead of creating a new stack frame below its own, it tears down its own stack frame and jumps to the target function, with the return address still there. So that new function's stack frame will use the same memory that was previously used by the function that made the tailcall.
To put it another way, any call func; ret
sequence can be replaced by jmp func
. This works even if the jmp target is a (plt stub for) a library function, or an indirect call/jmp *printf@GOTPCREL(%rip)
target for gcc -fno-plt
style codegen.
See also https://en.wikipedia.org/wiki/Tail_call
e.g.
void leaf(); // some other function
void foo() { stuff; ...; leaf(); } // ends with a tailcall
void bar() { stuff; foo(); stuff; }
An event that fires while foo
is doing "stuff" will have a backtrace of bar->foo.
With no optimization, an event that fires in "leaf" would have a backtrace like bar->foo->leaf.
With an optimized tailcall, it would be bar->leaf, no foo because foo just jumped to leaf, forwarding its return address to leaf
so when leaf
eventually returns it will be directly to bar
.
This works with args and return values, especially register args. Not always with stack args, for example it's impossible if leaf
had more stack args than foo
.