Search code examples
node.jsdebianperf

Perf record reporting buffer overflow with no backtrace


I am using perf to profile a node server. The issue is that after I stop the server, perf records a buffer overflow and doesn't write any of the data.

The command

perf record -e cycles:u -g -- npm run start

The result

[ perf record: Woken up 96 times to write data ]
*** buffer overflow detected ***: terminated    

I have tried using valgrind's memcheck to look for buffer overflows, but it doesn't report anything, which means that it may be a static buffer overflow. Is there any way for perf to report where the buffer overflow is occuring?

Edit

Result of running perf under gdb with breakpoint at __stack_chk_fail

Thread 1 "perf" received signal SIGABRT, Aborted.                                                                 
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0)              
    at ./nptl/pthread_kill.c:44                                                                                   
44      ./nptl/pthread_kill.c: No such file or directory.                                                         
(gdb) bt                                                                                                          
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0)          
    at ./nptl/pthread_kill.c:44                                                                                   
#1  0x00007ffff72d4d2f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78 
#2  0x00007ffff7285ef2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26                             
#3  0x00007ffff7270472 in __GI_abort () at ./stdlib/abort.c:79                                                    
#4  0x00007ffff72c92d0 in __libc_message (action=action@entry=do_abort,                                           
    fmt=fmt@entry=0x7ffff73e3210 "*** %s ***: terminated\n") at ../sysdeps/posix/libc_fatal.c:155                 
#5  0x00007ffff7361e82 in __GI___fortify_fail (msg=msg@entry=0x7ffff73e31b6 "buffer overflow detected")           
    at ./debug/fortify_fail.c:26                                                                                  
#6  0x00007ffff7360990 in __GI___chk_fail () at ./debug/chk_fail.c:28                                             
#7  0x00005555557e7ddd in memcpy (__len=40, __src=0x555556a28b38, __dest=0x7fffffff843c)                          
    at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:29                                                   
#8  write_buildid (fd=0x7fffffff8590, misc=<optimized out>, pid=-1, bid=0x555556a28b38,                           
    name_len=<optimized out>, name=0x555556a28c0c "/opt/pylon/lib/libpylonbase-6.1.1.so") at util/build-id.c:312  
#9  machine__write_buildid_table (machine=machine@entry=0x555555d9bef0, fd=fd@entry=0x7fffffff8590)               
    at util/build-id.c:361                                                                                        
#10 0x00005555557e865e in perf_session__write_buildid_table (session=session@entry=0x555555d9bd00,                
    fd=fd@entry=0x7fffffff8590) at util/build-id.c:374                                                            
#11 0x000055555581c4b9 in write_build_id (ff=ff@entry=0x7fffffff8590, evlist=evlist@entry=0x555555d96d60)         
    at util/header.c:320                                                                                          
#12 0x0000555555824fa3 in do_write_feat (evlist=0x555555d96d60, p=<synthetic pointer>, type=2, ff=0x7fffffff8590) 
    at util/header.c:3224                                                                                         
#13 perf_header__adds_write (fd=3, evlist=0x555555d96d60, header=<optimized out>) at util/header.c:3269           
#14 perf_session__write_header (session=<optimized out>, evlist=0x555555d96d60, fd=3, at_exit=at_exit@entry=true) 
    at util/header.c:3353                                                                                         
#15 0x0000555555760777 in record__finish_output (rec=0x555555b9bb40 <record>) at builtin-record.c:1236            
#16 0x0000555555763560 in __cmd_record (rec=0x555555b9bb40 <record>, argv=<optimized out>, argc=<optimized out>)  
    at builtin-record.c:2026                                                                                      
#17 cmd_record (argc=<optimized out>, argv=<optimized out>) at builtin-record.c:2835                              
#18 0x00005555557dc8a3 in run_builtin (p=p@entry=0x555555ba6cb8 <commands+216>, argc=argc@entry=8,                
    argv=argv@entry=0x7fffffffdb90) at perf.c:312                                                                 
#19 0x000055555574af48 in handle_internal_command (argv=0x7fffffffdb90, argc=8) at perf.c:364                     
#20 run_argv (argv=<synthetic pointer>, argcp=<synthetic pointer>) at perf.c:408                                  
#21 main (argc=8, argv=0x7fffffffdb90) at perf.c:538                                                              

                                                                              

One thing that is interesting is I only get the buffer overflow error when I include a specific C++ add-on that loads a couple of additional shared libraries. If I run the server without that C++ add-on then perf does not report a buffer overflow.


Solution

  • This is a bug in perf. Perf only records the first 20 characters of a build id, with a stack allocated array of 20 characters. One of the libraries I was loading had a buildid that was longer than 20 characters. A bug in the kernel meant that the size would be recorded as longer than 20 characters, and when we tried copying the build id we smash the stack.

    https://lkml.org/lkml/2023/4/26/1064